This documentation is for Dovecot v2.x, see wiki1 for v1.x documentation.
Differences between revisions 21 and 39 (spanning 18 versions)
Revision 21 as of 2013-11-03 20:24:09
Size: 4869
Editor: TimoSirainen
Comment:
Revision 39 as of 2020-08-04 13:29:50
Size: 14466
Editor: 89
Comment:
Deletions are marked like this. Additions are marked like this.
Line 2: Line 2:
Line 7: Line 6:
Make sure that user listing is configured for your userdb, this is required by replication: Replication works only between server pairs. If you have a large cluster, you need multiple independently functioning Dovecot backend pairs. This means that [[Director|director]] isn't currently supported with replication. The replication in general is a bit resource intensive, so it's not recommended to be used in multi-million user installations.

'''WARNING''': Shared folder replication doesn't work correctly right now – mainly it can generate a lot of duplicate emails. This is because there's currently a per-user lock that prevents multiple dsyncs from working simultaneously on the same user. But with shared folders multiple users can be syncing the same folder. So this would need additional locks (e.g. shared folders would likely need to lock the owner user, and public folders would likely need a per-folder lock or a maybe a global public folder lock). Fixing this is currently low priority for Dovecot developers.

== Configuration ==
Since v2.3.1 you can disable replication for a user by providing 'noreplicate' [[UserDatabase/ExtraFields|user database field]]. Another way to disable replication for some users is to return mail_replica field from userdb for users you want to replicate.

Make sure that user listing is configured for your userdb, this is required by replication to find the list of users that are periodically replicated:
Line 11: Line 18:
Line 17: Line 23:
Line 25: Line 30:
Line 34: Line 38:
Line 47: Line 50:
Line 56: Line 58:
  }
}
}}}
    user = vmail
  }
}
}}}

You can configure how many dsyncs can be run in parallel (10 by default):

{{{
replication_max_conns = 10
}}}
Normally all replication is asynchronous. You can also optionally configure new mail saving to be synchronous, with a timeout to avoid waiting too long. This can be configured with:

{{{
plugin {
  # When saving a new mail via IMAP or delivering a mail via LDA/LMTP,
  # wait for the mail to be synced to the remote site. If it doesn't finish
  # in 2 seconds, return success anyway.
  replication_sync_timeout = 2s
}
}}}
Line 61: Line 79:
Line 71: Line 88:
Line 73: Line 89:
Line 76: Line 93:
Line 82: Line 98:
Line 91: Line 106:
Line 93: Line 107:
Line 105: Line 118:
The client must be able to verify that the SSL certificate is valid, so you need to specify the directory containing valid SSL CA roots: The doveadm listener will use the SSL certificate that is configured globally for all SSL listeners, i.e. via the following settings at the top level of the configuration file:

{{{
ssl_cert = </etc/ssl/dovecot.pem
ssl_key = </etc/ssl/dovecot.pem
}}}

{{{ssl_cert}}} is not a valid setting inside {{{service}}} or {{{inet_listener}}} blocks, so you can't use a separate SSL certificate for the doveadm listener.
You can, however, use separate SSL certificates for the ''other'' protocols, like so:
{{{
protocol imap {
  ssl_cert = </etc/ssl/certs/imap.pem
  ssl_key = </etc/ssl/private/imap.pem
}
protocol pop3 {
  ssl_cert = </etc/ssl/certs/pop3.pem
  ssl_key = </etc/ssl/private/pop3.pem
}
}}}

When one Dovecot instance connects to the other one in the replication pair, it has to verify that the partner's SSL certificate is valid, so you need to specify a directory or file containing valid SSL CA roots:
Line 111: Line 145:
Line 115: Line 150:
You could point {{{ssl_client_ca_file}}} to your private CA, but keep in mind that {{{ssl_client_ca_file}}} and {{{ssl_client_ca_dir}}} also affect other services where Dovecot acts as an SSL client (e.g. the imapc feature), so be careful not to break SSL for those services.
Line 116: Line 153:
Line 120: Line 156:
Line 126: Line 163:
Line 128: Line 164:
Line 131: Line 168:
Line 133: Line 169:
Line 140: Line 177:
== dsync parameters ==
With v2.2.9+ you can configure what parameters replicator uses for the {{{doveadm sync}}} command:

{{{
replication_dsync_parameters = -d -N -l 30 -U
}}}
The {{{-f}}} and {{{-s}}} parameters are added automatically when needed.

Usually the only change you may want to do is replace {{{-N}}} (= sync all namespaces) with {{{-n <namespace>}}} or maybe just add {{{-x <exclude>}}} parameter(s).

== Administration ==
{{{doveadm replicator status}}} provides a summary. For example:

{{{
Queued 'sync' requests 0
Queued 'high' requests 0
Queued 'low' requests 0
Queued 'failed' requests 0
Queued 'full resync' requests 90
Waiting 'failed' requests 10
Total number of known users 100
}}}
The first 3 fields describe users who have a replication pending with a specific priority. The same user can only be in one (or none) of these queues:

 * Queued 'sync' requests: This priority is used only for mail saves if {{{replication_sync_timeout}}} setting is used.
 * Queued 'high' requests: This priority is used only for mail saves if {{{replication_sync_timeout}}} setting is not used, or if the sync request timed out.
 * Queued 'low' requests: This priority is used for everything else except mail saves.

The following fields are:

 * Queued 'failed' requests: Number of users who have a replication pending and where the last sync attempt failed. These users are retried as soon as higher priority users' replication has finished.
 * Queued 'full resync' requests: Number of users who don't specifically have any replication pending, but who are currently waiting for a periodic "full sync". This is controlled by the {{{replication_full_sync_interval}}} setting.
 * Waiting 'failed' requests: Number of users whose last replication attempt failed, and we're now waiting for the retry interval (5 mins) to pass before another attempt.
 * Total number of known users: Number of users that replicator knows about. The users can be listed with: {{{doveadm replicator status '*'}}}

The per-user replication status can be shown with {{{doveadm replicator status <username pattern>}}}. The username pattern can contain '*' and '?' wildcards. The response contains for example:

{{{
username priority fast sync full sync success sync failed
test100 none 02:03:52 02:08:52 02:03:52 -
test1 none 00:00:01 00:43:33 03:20:46 y
test2 none 02:03:51 02:03:51 02:03:51 -
}}}
These fields mean:

 * priority: none, low, high, sync
 * fast sync: How long time ago the last "fast sync" (non-full sync) attempt was performed. Ideally this is close to the time when the user was last modified. This doesn't mean that the sync succeeded necessarily.
 * full sync: How long time ago the last "full sync" attempt was performed. This should happen once per {{{replication_full_sync_interval}}}. This doesn't mean that the sync succeeded necessarily.
 * success sync: Time when the last successful sync was performed. If the last sync succeeded, this is the same as the "fast sync" or the "full sync" timestap.
 * failed: "y" if the last sync failed, "-" if not.

The current dsync replication status can be looked up with {{{doveadm replicator dsync-status}}}. This shows the dsync replicator status for each potential dsync connection, as configured by {{{replication_max_conns}}}. An example output is:

{{{
username type status
test100 full Waiting for dsync to finish
test1 normal Waiting for handshake
                           - Not connected
                           - Not connected
}}}
Here there are 4 lines, meaning {{{replication_max_conns=4}}}. Only two of the dsync-connections are being used currently.

The fields mean:

 * username: User currently being replicated.
 * type: incremental, normal or full. Most of the replications are "incremental", while full syncs are "full". A "normal" sync is done when incremental syncing state isn't available currently. The "incremental" matches doveadm sync's -s parameter, "full" is -f parameter and "normal" is the default.
 * status: Human-readable status of the connection. These are the current values:
  * Not connected
  * Failed to connect to '%s' - last attempt %ld secs ago
  * Idle
  * Waiting for handshake
  * Waiting for dsync to finish

Failed replication attempts are always automatically retried, so any temporary problems should get fixed automatically. In case of bugs it may be necessary to fix something manually. These should be visible in the error logs. So if a user is marked as failed, try to find any errors logged for the user and see if the same error keeps repeating in the logs. If you want to debug the dsync, you can manually trigger it with: {{{doveadm -D sync -u user@domain -d -N -l 30 -U}}} (the parameters after "sync" should be the same as in {{{replication_dsync_parameters}}} setting).
Line 142: Line 253:
Line 150: Line 260:

== Issues ==

=== Deleted Messages ===
If you are providing a pop3 server, then message deletions do not cause a replication notify to occur, and so the two servers will be unsycnchronised until something else causes that mailbox to be replicated.
This can be worked around by ensuring the the delete operation changes the mailbox in some other way - such as moving the message into another folder.

==== Workaround 1 - Change pop3 deletion to set a flag ====

If the setting
{{{
    pop3_deleted_flag = '$POP3Deleted'
}}}

is added, then rather than the mail being deleted, it is flagged, and this change causes a notification and replication to the other server.

However since the mail is not actually deleted, you additionally need another (cron?) task to periodically remove flagged mail, for example:-

{{{
    doveadm expunge -A MAILBOX inbox KEYWORD '$POP3Deleted'
}}}

==== Workaround 2 - Use Lazy Expunge ====

You can use the lazy_expunge plugin - this has the same effect, and automates the removal of the messages after a period of time. https://doc.dovecot.org/configuration_manual/lazy_expunge_plugin/

For example, you can set the following changed config fragments:-

{{{
    mail_plugins = $mail_plugins notify replication lazy_expunge

    namespace inbox {
        separator = /
        prefix =
        inbox = yes

        # automatic lazy expunge mailbox
        mailbox .EXPUNGED {
            autoexpunge = 3 days
            autoexpunge_max_mails = 1000
        }
    }

    plugin {
        # Move messages to an .EXPUNGED mailbox
        lazy_expunge = .EXPUNGED
    }
}}}

Replication with dsync

Dovecot supports master/master replication using dsync. It's recommended that the same user always gets redirected to the same replica, but no changes get lost even if the same user modifies mails simultaneously on both replicas, some mails just might have to be redownloaded. The replication is done asynchronously, so high latency between the replicas isn't a problem. The replication is done by looking at Dovecot index files (not what exists in filesystem), so no mails get lost due to filesystem corruption or an accidental rm -rf, they will simply be replicated back.

NOTE: v2.2 is highly recommended for this. Earlier versions can't do incremental metadata syncing. This means that the more mails a mailbox has, the slower it is to sync it.

Replication works only between server pairs. If you have a large cluster, you need multiple independently functioning Dovecot backend pairs. This means that director isn't currently supported with replication. The replication in general is a bit resource intensive, so it's not recommended to be used in multi-million user installations.

WARNING: Shared folder replication doesn't work correctly right now – mainly it can generate a lot of duplicate emails. This is because there's currently a per-user lock that prevents multiple dsyncs from working simultaneously on the same user. But with shared folders multiple users can be syncing the same folder. So this would need additional locks (e.g. shared folders would likely need to lock the owner user, and public folders would likely need a per-folder lock or a maybe a global public folder lock). Fixing this is currently low priority for Dovecot developers.

Configuration

Since v2.3.1 you can disable replication for a user by providing 'noreplicate' user database field. Another way to disable replication for some users is to return mail_replica field from userdb for users you want to replicate.

Make sure that user listing is configured for your userdb, this is required by replication to find the list of users that are periodically replicated:

doveadm user '*'

Enable the replication plugin globally (most likely you'll need to do this in 10-mail.conf):

mail_plugins = $mail_plugins notify replication

Replicator process should be started at startup, so it can start replicating users immediately:

service replicator {
  process_min_avail = 1
}

You need to configure how and where to replicate. Using SSH for example:

dsync_remote_cmd = ssh -l%{login} %{host} doveadm dsync-server -u%u
plugin {
  mail_replica = remote:vmail@anotherhost.example.com
}

The mail processes need to have access to the replication-notify fifo and socket. If you have a single vmail UID, you can do:

service aggregator {
  fifo_listener replication-notify-fifo {
    user = vmail
  }
  unix_listener replication-notify {
    user = vmail
  }
}

The replication-notify only notifies the replicator processes that there is work to be done, so it's not terribly insecure either to just set mode=0666.

Enable doveadm replicator commands by setting:

service replicator {
  unix_listener replicator-doveadm {
    mode = 0600
    user = vmail 
  }
}

You can configure how many dsyncs can be run in parallel (10 by default):

replication_max_conns = 10

Normally all replication is asynchronous. You can also optionally configure new mail saving to be synchronous, with a timeout to avoid waiting too long. This can be configured with:

plugin {
  # When saving a new mail via IMAP or delivering a mail via LDA/LMTP,
  # wait for the mail to be synced to the remote site. If it doesn't finish
  # in 2 seconds, return success anyway.
  replication_sync_timeout = 2s
}

dsync over TCP connections (v2.2+)

Create a listener for doveadm-server:

service doveadm {
  inet_listener {
    port = 12345
  }
}

And tell doveadm client to use this port by default:

doveadm_port = 12345

Both the client and the server also need to have a shared secret:

doveadm_password = secret

Now you can use tcp:hostname as the dsync target. You can also override the port with tcp:hostname:port.

plugin {
  mail_replica = tcp:anotherhost.example.com # use doveadm_port
  #mail_replica = tcp:anotherhost.example.com:12345 # use port 12345 explicitly
}

SSL

You can also use SSL for the connection:

service doveadm {
  inet_listener {
    port = 12345
    ssl = yes
  }
}

The doveadm listener will use the SSL certificate that is configured globally for all SSL listeners, i.e. via the following settings at the top level of the configuration file:

ssl_cert = </etc/ssl/dovecot.pem
ssl_key = </etc/ssl/dovecot.pem

ssl_cert is not a valid setting inside service or inet_listener blocks, so you can't use a separate SSL certificate for the doveadm listener. You can, however, use separate SSL certificates for the other protocols, like so:

protocol imap {
  ssl_cert = </etc/ssl/certs/imap.pem
  ssl_key = </etc/ssl/private/imap.pem
}
protocol pop3 {
  ssl_cert = </etc/ssl/certs/pop3.pem
  ssl_key = </etc/ssl/private/pop3.pem
}

When one Dovecot instance connects to the other one in the replication pair, it has to verify that the partner's SSL certificate is valid, so you need to specify a directory or file containing valid SSL CA roots:

ssl_client_ca_dir = /etc/ssl/certs # Debian/Ubuntu
ssl_client_ca_file = /etc/pki/tls/cert.pem # RedHat

Now you can use tcps:hostname or tcps:hostname:port as the dsync target.

Note that the SSL certificate must be signed by one of the CAs in the ssl_client_ca_dir or ssl_client_ca_file. You can't use a self-signed certificate or a private CA, unless you correctly set them up into the CA file/directory (see openssl documentation for details).

You could point ssl_client_ca_file to your private CA, but keep in mind that ssl_client_ca_file and ssl_client_ca_dir also affect other services where Dovecot acts as an SSL client (e.g. the imapc feature), so be careful not to break SSL for those services.

dsync wrapper script for root SSH login (v2.2+)

If you're using multiple UIDs, dsync needs to be started as root, which means you need to log in as root with ssh (or use sudo). Another possibility is to allow root to run only a wrapper script. There is some built-in support for this in v2.2+ to make it easier:

dovecot.conf:

dsync_remote_cmd = /usr/bin/ssh -i /root/.ssh/id_dsa.dsync %{host} /usr/local/bin/dsync-in-wrapper.sh
plugin {
  mail_replica = remoteprefix:vmail@anotherhost.example.com
}

/root/.ssh/authorized_keys:

command="/usr/local/bin/dsync-in-wrapper.sh",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty <ssh key>

/usr/local/bin/dsync-in-wrapper.sh:

read username
ulimit -c unlimited # for debugging any crashes
/usr/local/bin/doveadm dsync-server -u $username

dsync parameters

With v2.2.9+ you can configure what parameters replicator uses for the doveadm sync command:

replication_dsync_parameters = -d -N -l 30 -U

The -f and -s parameters are added automatically when needed.

Usually the only change you may want to do is replace -N (= sync all namespaces) with -n <namespace> or maybe just add -x <exclude> parameter(s).

Administration

doveadm replicator status provides a summary. For example:

Queued 'sync' requests        0
Queued 'high' requests        0
Queued 'low' requests         0
Queued 'failed' requests      0
Queued 'full resync' requests 90
Waiting 'failed' requests     10
Total number of known users   100

The first 3 fields describe users who have a replication pending with a specific priority. The same user can only be in one (or none) of these queues:

  • Queued 'sync' requests: This priority is used only for mail saves if replication_sync_timeout setting is used.

  • Queued 'high' requests: This priority is used only for mail saves if replication_sync_timeout setting is not used, or if the sync request timed out.

  • Queued 'low' requests: This priority is used for everything else except mail saves.

The following fields are:

  • Queued 'failed' requests: Number of users who have a replication pending and where the last sync attempt failed. These users are retried as soon as higher priority users' replication has finished.
  • Queued 'full resync' requests: Number of users who don't specifically have any replication pending, but who are currently waiting for a periodic "full sync". This is controlled by the replication_full_sync_interval setting.

  • Waiting 'failed' requests: Number of users whose last replication attempt failed, and we're now waiting for the retry interval (5 mins) to pass before another attempt.
  • Total number of known users: Number of users that replicator knows about. The users can be listed with: doveadm replicator status '*'

The per-user replication status can be shown with doveadm replicator status <username pattern>. The username pattern can contain '*' and '?' wildcards. The response contains for example:

username           priority fast sync  full sync  success sync failed
test100            none     02:03:52   02:08:52   02:03:52     -
test1              none     00:00:01   00:43:33   03:20:46     y
test2              none     02:03:51   02:03:51   02:03:51     -

These fields mean:

  • priority: none, low, high, sync
  • fast sync: How long time ago the last "fast sync" (non-full sync) attempt was performed. Ideally this is close to the time when the user was last modified. This doesn't mean that the sync succeeded necessarily.
  • full sync: How long time ago the last "full sync" attempt was performed. This should happen once per replication_full_sync_interval. This doesn't mean that the sync succeeded necessarily.

  • success sync: Time when the last successful sync was performed. If the last sync succeeded, this is the same as the "fast sync" or the "full sync" timestap.
  • failed: "y" if the last sync failed, "-" if not.

The current dsync replication status can be looked up with doveadm replicator dsync-status. This shows the dsync replicator status for each potential dsync connection, as configured by replication_max_conns. An example output is:

username                   type   status
test100                    full   Waiting for dsync to finish
test1                      normal Waiting for handshake
                           -      Not connected
                           -      Not connected

Here there are 4 lines, meaning replication_max_conns=4. Only two of the dsync-connections are being used currently.

The fields mean:

  • username: User currently being replicated.
  • type: incremental, normal or full. Most of the replications are "incremental", while full syncs are "full". A "normal" sync is done when incremental syncing state isn't available currently. The "incremental" matches doveadm sync's -s parameter, "full" is -f parameter and "normal" is the default.
  • status: Human-readable status of the connection. These are the current values:
    • Not connected
    • Failed to connect to '%s' - last attempt %ld secs ago
    • Idle
    • Waiting for handshake
    • Waiting for dsync to finish

Failed replication attempts are always automatically retried, so any temporary problems should get fixed automatically. In case of bugs it may be necessary to fix something manually. These should be visible in the error logs. So if a user is marked as failed, try to find any errors logged for the user and see if the same error keeps repeating in the logs. If you want to debug the dsync, you can manually trigger it with: doveadm -D sync -u user@domain -d -N -l 30 -U (the parameters after "sync" should be the same as in replication_dsync_parameters setting).

Notes

Random things to remember:

  • The replicas can't share the same quota database, since both will always update it
  • With mdbox format "doveadm purge" won't be replicated
  • "doveadm force-resync", "doveadm quota recalc" and other similar fixing commands don't get replicated
  • The servers must have different hostnames or the locking doesn't work and can cause replication problems.
    • v2.2.6+: If you're having trouble, verify that dovecot --hostdomain returns different values for the servers.

Issues

Deleted Messages

If you are providing a pop3 server, then message deletions do not cause a replication notify to occur, and so the two servers will be unsycnchronised until something else causes that mailbox to be replicated. This can be worked around by ensuring the the delete operation changes the mailbox in some other way - such as moving the message into another folder.

Workaround 1 - Change pop3 deletion to set a flag

If the setting

    pop3_deleted_flag = '$POP3Deleted'

is added, then rather than the mail being deleted, it is flagged, and this change causes a notification and replication to the other server.

However since the mail is not actually deleted, you additionally need another (cron?) task to periodically remove flagged mail, for example:-

    doveadm expunge  -A MAILBOX inbox KEYWORD '$POP3Deleted'

Workaround 2 - Use Lazy Expunge

You can use the lazy_expunge plugin - this has the same effect, and automates the removal of the messages after a period of time. https://doc.dovecot.org/configuration_manual/lazy_expunge_plugin/

For example, you can set the following changed config fragments:-

    mail_plugins = $mail_plugins notify replication lazy_expunge

    namespace inbox {
        separator = /
        prefix =
        inbox = yes

        # automatic lazy expunge mailbox
        mailbox .EXPUNGED {
            autoexpunge = 3 days
            autoexpunge_max_mails = 1000
        }
    }

    plugin {
        # Move messages to an .EXPUNGED mailbox
        lazy_expunge = .EXPUNGED
    }

None: Replication (last edited 2020-10-21 20:42:17 by 108)