PostgreSQL Server Configuration
See the PostgreSQL documentation for more information about how to change the PostgreSQL configuration in general.
Performance tuning
The default configuration of PostgreSQL uses system resource very sparingly. To improve the performance of the ANNIS service it is necessary to change a few settings in the PostgreSQL configuration file postgresql.conf
as shown in the excerpt below.
Most of the options below are commented out in the postgresql.conf
file. This means that PostgreSQL will use the default value, i.e. the value as it appears in the postgresql.conf
file, for this option. To make your changes take effect you have to uncomment it.
The values below are for machine with 2 GB RAM that is exclusively dedicated to running PostgreSQL. If you're running ANNIS on your local machine and don't have large corpora, you should use lower values as explained in the comments.
Changes in postgresql.conf
:
max_connections = 100 # expected maximum number of connections (users) at peak load
shared_buffers = 512MB # RAM cache shared across all sessions
# 25% of available RAM
# use lower value on a desktop system, i.e. 128MB or 256MB
work_mem = 128MB # RAM for *one* sort, aggregate or hash operation inside a query plan
# RAM / (2 x max_connections)
# (many operations can run in parallel!)
# increase for large corpora, i.e. 256MB, 512MB
# decrease if you have many users
maintenance_work_mem = 256MB # RAM for maintenance operations during corpus import
# CREATE INDEX, VACUUM etc.
# increase for large corpora
effective_cache_size = 1536MB # estimated size of disk cached used by the OS
# 75% of available RAM
# use lower value on a desktop system, where other applications are running
# e.g. 512MB for a desktop with 2 GB RAM
default_statistcs_target = 100 # size of value histogram for each table column
# use maximum value
checkpoint_segments = 20 # affects how quickly buffers are written
# to disk inside a transaction
autovacuum = off # VACUUM is done automatically during corpus import
More information on these settings can be found in the PostgreSQL manual:
Logging
If you want to log the duration of SQL statements you should also set the following options in postgresql.conf
:
log_min_duration_statement = 0
Remote access
If the PostgreSQL server runs on a separate machine, remote access has to be enabled.
Changes in postgresql.conf
:
listen_adresses = 'localhost,192.168.1.1'
Where 192.168.1.1 is the IP address of the machine running PostgreSQL.
Changes in pg_hba.conf
:
host annis_db annis_user 192.168.1.2/0 md5
Where 192.168.1.2 is the machine running the ANNIS service that is connecting to the remote PostgreSQL server.
Configuration of System Resources
PostgreSQL needs to access large areas of continuous RAM which can easily exceed the maximum size allowed by the operating system. PostgreSQL will check the OS resource settings during startup and exit with an error if they are not adequate.
Reproduced below are the commands to change the resource settings on Linux and OS X. More information can be found in the PostgreSQL manual: [Managing Kernel Resources](http://www.postgresql.org/docs/8.3/interactive/kernel-resources.html 17.4.).
Linux
sysctl -w kernel.shmmax=536870912 # bytes; corresponds to 512MB
This command takes effect immediately. To make the change permanent across system reboots, add it to the file /etc/sysctl.conf
.
Mac OS X
sysctl -w kern.sysv.shmmax=536870912 # bytes; corresponds to 512MB
sysctl -w kern.sysv.shmall=131072 # measured in 4 kB pages
OS X has to be rebooted for the command to take effect. To make the change permanent across system reboots, add it to the file /etc/sysctl.conf
.