This documentation is for Dovecot v2.x, see wiki1 for v1.x documentation.

Solr Full Text Search Indexing

Solr is a Lucene indexing server. Dovecot communicates with it using HTTP/XML queries.

The steps described on this wiki page were tested with Solr 7.7.0. For other versions, these steps may need to be adjusted.

Compiling

Dovecot is not compiled with Solr FTS support by default. To enable it, you need to add the --with-solr parameter to your invocation of the configure script. You will also need to have libexpat installed, including development headers (typically from a separate development package). Configuration will fail if --with-solr is enabled while libexpat headers cannot be found. Older versions of Dovecot also required libcurl for Solr support, but recent versions of Dovecot include a custom HTTP client.
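As a rough illustration of the build step, the sketch below wraps the configure call. The function name configure_with_solr is made up for this example, and the Debian package name mentioned in the comment (libexpat1-dev) is an assumption about your system. Only the --with-solr parameter comes from the text above.

```shell
# Assumption: on Debian/Ubuntu the expat headers come from libexpat1-dev;
# other systems use a different package name.
#   sudo apt-get install libexpat1-dev

# Hypothetical helper: run Dovecot's configure script with Solr FTS enabled.
# Call it from the top of the Dovecot source tree. Extra arguments
# (e.g. --prefix=...) are passed through unchanged.
configure_with_solr() {
    ./configure --with-solr "$@"
}
```

If it succeeds, build and install as usual, for example: configure_with_solr --prefix=/usr && make && sudo make install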

Configuration

Solr Installation

First, the Solr server needs to be installed. Most operating systems have packages for it. The latest version can also be downloaded from the official website. The following instructions install 7.7.0 and are based on the howto "How to Install Apache Solr 7.5 on Debian 9/8":

wget https://www-eu.apache.org/dist/lucene/solr/7.7.0/solr-7.7.0.tgz
tar xzf solr-7.7.0.tgz solr-7.7.0/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-7.7.0.tgz

To use Solr with Dovecot, it needs an instance configured specifically for Dovecot. Create it with:

sudo -u solr /opt/solr/bin/solr create -c dovecot 

The location of the files for the newly created instance on the filesystem varies between operating systems and installation methods. For example, on Arch Linux, the config files are located in /opt/solr/server/solr/dovecot/conf and the data files in /opt/solr/server/solr/dovecot/data. When installed from the tarball, these directories can be found under /var/solr/data/dovecot/.

Once the instance is created, you can start Solr. The means of starting, stopping and querying the status of the Solr service vary between systems. For systemd, the commands are as follows:

sudo systemctl stop solr
sudo systemctl start solr
sudo systemctl status solr

By default, the Solr administration page for the newly created instance is located at http://localhost:8983/solr/#/~cores/dovecot (a stock Solr installation serves plain HTTP). It can be used to check the status of the Solr instance, and it is often the most convenient place to view configuration errors. Solr also writes log files. For a tarball installation, these can be found in /var/solr/logs/.
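The same status information can be fetched from the command line through Solr's CoreAdmin API. The following is a minimal sketch, assuming Solr listens on plain HTTP on localhost:8983 and the core is named dovecot. The SOLR_BASE variable and both function names are made up for this example.

```shell
# Assumption: Solr listens on plain HTTP on localhost:8983 (the stock default).
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Build the CoreAdmin STATUS URL for a core (default core name: dovecot).
solr_core_status_url() {
    printf '%s/admin/cores?action=STATUS&core=%s\n' "$SOLR_BASE" "${1:-dovecot}"
}

# Fetch the status. -f makes curl return non-zero on HTTP errors,
# -sS hides the progress meter but still prints error messages.
solr_core_status() {
    curl -fsS "$(solr_core_status_url "${1:-dovecot}")"
}
```

Calling solr_core_status with no argument queries the dovecot core. A non-zero exit status suggests that Solr is not reachable at that address.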

Solr Configuration

Three primary configuration files need to be changed to accommodate Dovecot's FTS needs: the instance configuration file solrconfig.xml, and the schema files schema.xml and managed-schema used by the instance. All of these files are located in the conf directory of the Solr instance (e.g., /var/solr/data/dovecot/conf/).

Remove default core configuration files

rm -f /var/solr/data/dovecot/conf/schema.xml
rm -f /var/solr/data/dovecot/conf/managed-schema
rm -f /var/solr/data/dovecot/conf/solrconfig.xml

Install schema.xml and solrconfig.xml

Copy doc/solr-config-7.7.0.xml and doc/solr-schema-7.7.0.xml (shipped with Dovecot 2.3.6 and later) to /var/solr/data/dovecot/conf/ as solrconfig.xml and schema.xml, respectively. The managed-schema file is then generated from schema.xml.
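The remove and copy steps can be combined into one small helper. This is only a sketch: the function name install_dovecot_solr_conf is made up, and the default target directory assumes the tarball installation layout described above.

```shell
# Hypothetical helper: replace a core's default config with Dovecot's files.
#   $1 = Dovecot source tree (contains doc/solr-*-7.7.0.xml)
#   $2 = Solr core conf directory (default: /var/solr/data/dovecot/conf)
install_dovecot_solr_conf() {
    src="$1/doc"
    conf="${2:-/var/solr/data/dovecot/conf}"
    # Remove the default core configuration files first.
    rm -f "$conf/schema.xml" "$conf/managed-schema" "$conf/solrconfig.xml"
    # Copy Dovecot's files under the names Solr expects.
    cp "$src/solr-config-7.7.0.xml" "$conf/solrconfig.xml" &&
    cp "$src/solr-schema-7.7.0.xml" "$conf/schema.xml"
}
```

The copied files must be readable by the user Solr runs as, and Solr most likely needs a restart before it picks them up.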

Dovecot Plugin

On Dovecot's side, add the following.

In 10-mail.conf (keep any plugins that are already listed in the setting):

mail_plugins = $mail_plugins fts fts_solr

In 90-plugins.conf:

plugin {
  fts = solr
  fts_solr = url=https://solr.example.org:8983/solr/dovecot/
}
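To confirm that Dovecot actually picked up the plugin setting, the effective value can be inspected with doveconf. The check below is a sketch with a made-up function name. It only looks at the global mail_plugins value, so protocol-specific overrides are not covered.

```shell
# Hypothetical check: succeed if both fts and fts_solr are in mail_plugins.
# "doveconf -h" prints only the value of the named setting.
fts_solr_enabled() {
    plugins=$(doveconf -h mail_plugins) || return 1
    for want in fts fts_solr; do
        # Pad with spaces so only whole plugin names match.
        case " $plugins " in
            *" $want "*) ;;
            *) echo "missing plugin: $want" >&2; return 1 ;;
        esac
    done
}
```

For example: fts_solr_enabled && echo "Solr FTS plugins are loaded"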

Fields listed in fts_solr plugin setting are space separated. They can contain:

Important notes:

Solr commits & optimization

Solr indexes should be optimized once in a while to make searches faster and to reclaim space used by deleted mails. Dovecot never asks Solr to optimize, so you have to do this yourself, for example with a cronjob that sends the optimize command to Solr every n hours.

With v2.2.3+, Dovecot only does soft commits to the Solr index to improve performance. You must run a hard commit once in a while, or Solr will keep increasing the size of its transaction logs. For example, send the commit command to Solr every few minutes.

# Optimize should be run somewhat rarely, e.g. once a day
curl https://<hostname/ip>:<port|default 8983>/solr/dovecot/update?optimize=true
# Commit should be run pretty often, e.g. every minute
curl https://<hostname/ip>:<port|default 8983>/solr/dovecot/update?commit=true

You may not need those if you are using a recent Solr (7+) or SolrCloud. The default configuration of Solr is to auto-commit every once in a while (~15sec) so commit is not necessary. Also, the default TieredMergePolicy in Solr will automatically purge removed documents later, so optimize is not necessary.
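If you do need to schedule these requests, a small wrapper keeps the cronjob entries short. The sketch below assumes the core URL from the plugin example above. The SOLR_CORE_URL variable and the solr_update function are names invented for this illustration.

```shell
# Assumption: base URL of the Dovecot core; adjust scheme, host and port.
SOLR_CORE_URL="${SOLR_CORE_URL:-https://solr.example.org:8983/solr/dovecot}"

# Send a maintenance request to Solr. $1 must be "commit" or "optimize".
solr_update() {
    case "$1" in
        commit|optimize) ;;
        *) echo "usage: solr_update commit|optimize" >&2; return 2 ;;
    esac
    # The URL is quoted so the shell never treats "?" as a glob character.
    curl -fsS "$SOLR_CORE_URL/update?$1=true"
}
```

Saved as a script that ends with solr_update "$1", it could be run from cron every minute with the argument commit and once a day with the argument optimize.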

Re-index mailbox

To force Dovecot to reindex a whole mailbox, run the command below. It does not reindex immediately: the work is done the next time a search is performed, and it applies to the whole mailbox.

doveadm fts rescan -u <username>

To index a single mailbox or all mailboxes right away, run the command below. It starts immediately and blocks until indexing has completed.

doveadm index [-u <user>|-A] [-S <socket_path>] [-q] [-n <max recent>] <mailbox mask>
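The two commands are often used together: first the rescan, then an immediate index run so the user does not have to wait for the next search. A sketch with a made-up function name, using the '*' mailbox mask to cover all of the user's mailboxes:

```shell
# Hypothetical helper: rescan a user's FTS index, then index every mailbox.
#   $1 = username
reindex_user() {
    # Mark the user's index for rescanning...
    doveadm fts rescan -u "$1" &&
    # ...then index all mailboxes right away. The mask '*' is quoted so
    # the shell passes it to doveadm instead of expanding it.
    doveadm index -u "$1" '*'
}
```

For example: reindex_user alice@example.org (the address is a placeholder).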

Sorting by relevancy

Solr/Lucene supports returning a relevancy score for search results. If you want to sort the search results by the score, use Dovecot's non-standard X-SCORE sort key:

1 SORT (X-SCORE) UTF-8 <search parameters>

Indexes

Dovecot creates the following fields:

Lucene does duplicate suppression based on the "id" field, so even if Dovecot sends the same message to Solr multiple times, it gets indexed only once. Currently this can happen if multiple searches are started at the same time.

You might want to build a cronjob to go through the Lucene indexes once in a while to delete indexed messages (or entire mailboxes) that no longer exist on the filesystem. It shouldn't normally find any such messages though.

Testing

# telnet localhost imap
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS MULTIAPPEND UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS STARTTLS AUTH=PLAIN AUTH=LOGIN] I am ready.
1 login username password
2 select Inbox
3 SEARCH text "test"
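A similar check can be run without an interactive IMAP session by using doveadm search, which takes the same kind of search query. This is a sketch under the assumption that doveadm loads the same mail_plugins as the IMAP process. The function name is invented for this example.

```shell
# Hypothetical smoke test: search all of a user's mail for a term.
#   $1 = username, $2 = search term
# doveadm prints one "mailbox-guid uid" pair per matching message.
fts_smoke_test() {
    doveadm search -u "$1" text "$2"
}
```

For example: fts_smoke_test alice@example.org test (the address is a placeholder).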

Sharding

If you have more users than fit into a single Solr box, you can split users off to different servers. A couple of different ways you could do it are:

You can also use SolrCloud, the clustered version of Solr, which lets you scale up and adds failover / high availability to your FTS system. Dovecot-solr works fine with a SolrCloud cluster as long as the Solr schema is the right one.

External Tutorials

External sites with tutorials on using Solr under Dovecot

Tips

Some additional things which might help you configuring Solr search:

Plugins/FTS/Solr (last edited 2019-02-28 19:18:10 by MarttiRannanjarvi)