Getting started with Alfresco & Solr

April 19
blog author

Appno Blogger

Appnovation Coop

Since Alfresco version 4, Solr is the default index and search engine. Lucene is still packaged, though, and it’s easy to switch back to it through configuration.

Why does Alfresco use Solr?

Using Solr instead of Lucene gives you more flexibility in a complex architecture. Lucene is packaged inside the Alfresco webapp, so if you run 2 different Alfresco instances, each instance builds and maintains its own duplicate copy of the indexes, and there is no way to change that. Solr is deployed as an independent webapp, so both Alfresco instances can refer to 1 Solr instance, or you can have 1 Solr instance for each Alfresco instance. In addition, you can choose to deploy your Alfresco and Solr instances either on the same application server or on different ones, depending on your architecture and performance target. In short, you can scale your index engine independently from your repository.

Alfresco Solr implementation

Alfresco doesn’t use a stock Solr distribution directly. It uses a custom webapp based on an old version of Solr (1.4), so it’s not trivial to switch to a newer Solr version.

Alfresco Solr index structure

For each node indexed in the Alfresco repository, 2 nodes are created on the Solr side:

- ‘LEAF’ node: contains main information, like type, aspects and IDs
- ‘AUX’ node: contains extra information, like parents, path, owner, read ACLs and IDs

Sometimes it is useful to look at what the index files contain, to understand what has been indexed. This can be done with Luke, a tool for browsing Lucene index files.

Bi-directional communication

The index engine and the repository query each other.

Search request: Alfresco -> Solr
Alfresco calls Solr whenever it needs to perform a search, such as a full-text or advanced search run by a user.

Async polling for changes: Solr -> Alfresco
Solr calls Alfresco periodically to get the changes that have happened in the repository since the last call.
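
To make the two directions of communication more concrete, here is a minimal Python sketch of the idea. It is a toy model, not Alfresco or Solr code: the `Repository` and `IndexEngine` classes, their methods and the transaction counter are all hypothetical names chosen for illustration. It shows one practical consequence of asynchronous polling: content saved in the repository only becomes searchable after the next poll.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy stand-in for the Alfresco repository (hypothetical, not the real API)."""
    # Append-only change log of (transaction id, node id, text) tuples.
    changes: list = field(default_factory=list)

    def save(self, node_id, text):
        txn_id = len(self.changes) + 1
        self.changes.append((txn_id, node_id, text))
        return txn_id

    def changes_since(self, last_txn_id):
        # What the index engine asks for on each poll.
        return [c for c in self.changes if c[0] > last_txn_id]


@dataclass
class IndexEngine:
    """Toy stand-in for the Solr side (hypothetical)."""
    repo: Repository
    last_txn_id: int = 0
    documents: dict = field(default_factory=dict)

    def poll(self):
        # Solr -> Alfresco: fetch only what changed since the previous poll.
        new_changes = self.repo.changes_since(self.last_txn_id)
        for txn_id, node_id, text in new_changes:
            self.documents[node_id] = text
            self.last_txn_id = txn_id
        return len(new_changes)

    def search(self, term):
        # Alfresco -> Solr: the answer comes from the index alone.
        term = term.lower()
        return sorted(n for n, text in self.documents.items() if term in text.lower())


repo = Repository()
index = IndexEngine(repo)
repo.save("doc-1", "Quarterly report")
print(index.search("report"))  # [] because the index has not polled yet
index.poll()
print(index.search("report"))  # ['doc-1']
```

In a real deployment the poll runs on a schedule rather than being called by hand, so there is a short window during which new content exists in the repository but cannot be found by search.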

Cores

By default, Solr has 2 cores, which index the following 2 Alfresco stores:

- workspace://SpacesStore
- archive://SpacesStore

You might want to set up additional cores if you want to index the contents of other repository stores. For example, if you want to include old versions of your content in your search, you can add a core for the following store:

- workspace://version2Store
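
The relationship between cores and stores can be pictured as a simple mapping from one to the other. The Python sketch below is illustrative only: the store references come from the text above, but the core names used as dictionary keys and the helper functions are assumptions, and this is not how Solr cores are actually configured. It only shows that a store becomes searchable once some core is set up to index it.

```python
from typing import Dict, NamedTuple


class StoreRef(NamedTuple):
    protocol: str    # e.g. "workspace" or "archive"
    identifier: str  # e.g. "SpacesStore"


def parse_store_ref(value: str) -> StoreRef:
    """Split a 'protocol://identifier' store reference into its two parts."""
    protocol, sep, identifier = value.partition("://")
    if not (sep and protocol and identifier):
        raise ValueError(f"not a store reference: {value!r}")
    return StoreRef(protocol, identifier)


# Core name -> indexed store. The core names are assumptions for illustration.
DEFAULT_CORES: Dict[str, str] = {
    "alfresco": "workspace://SpacesStore",
    "archive": "archive://SpacesStore",
}


def is_indexed(store: str, cores: Dict[str, str]) -> bool:
    """Return True if some core in the mapping indexes the given store."""
    wanted = parse_store_ref(store)
    return any(parse_store_ref(s) == wanted for s in cores.values())


cores = dict(DEFAULT_CORES)
print(is_indexed("workspace://version2Store", cores))  # False
cores["version2"] = "workspace://version2Store"        # hypothetical extra core
print(is_indexed("workspace://version2Store", cores))  # True
```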