Search functionality is one of the important aspects when choosing an open source CMS. As you may know, Alfresco uses Apache Lucene, a high-performance, full-featured text search engine.
Alongside Lucene, Alfresco's search capability relies on OpenOffice, which can extract text from many file formats and make it available to Lucene.
Let’s say a user has a PDF file that consists of scanned images of text. The user wants to store it as text in the Alfresco repository and, of course, wants to find it later by providing keywords or metadata. Searching by metadata is relatively easy because most CMSs support custom metadata. The problem is that the PDF consists of images, not text, so unless it is converted to text there is no way to search the file by its content.
As I mentioned in my previous blog post ( http://www.appnovation.com/alfresco-transformation ), Alfresco supports various transformations, and by integrating an OCR engine it can transform the PDF file into a text file. Once that is done, a user can find the file using various types of search, such as wildcard, fuzzy, range, and Boolean queries.
For example, the Alfresco search engine supports single- and multiple-character wildcard searches within single terms: “?” matches exactly one character and “*” matches zero or more characters. To search for “text” or “test” you can use “te?t” or “te*t”. Another example is fuzzy search, which is based on the Levenshtein distance (edit distance) algorithm. To search for a term similar in spelling to “roam” you can use “roam~”. This search will find terms like “foam” and “roams”. In addition, you can specify the required similarity by providing a value between 0 and 1, such as “roam~0.7”.
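To make the two query types more concrete, here is a minimal Python sketch of how wildcard and fuzzy matching behave on a small list of terms. It is not Alfresco or Lucene code: the function names and the term list are made up for illustration, the wildcard part borrows Python's fnmatch module (which also understands bracket patterns that the query syntax above does not mention), and the similarity formula (1 minus the edit distance divided by the shorter term's length) is a simplifying assumption, so real scores may differ.

```python
from fnmatch import fnmatchcase


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity score in (-inf, 1]; 1.0 means identical."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - edit_distance(a, b) / shortest


def wildcard_search(pattern: str, terms: list[str]) -> list[str]:
    """'?' matches one character, '*' matches zero or more."""
    return [t for t in terms if fnmatchcase(t, pattern)]


def fuzzy_search(term: str, terms: list[str], min_similarity: float = 0.5) -> list[str]:
    """Keep terms whose assumed similarity reaches the threshold."""
    return [t for t in terms if similarity(term, t) >= min_similarity]


# Hypothetical index terms, for illustration only.
TERMS = ["text", "test", "tempest", "roam", "foam", "roams", "rim"]

print(wildcard_search("te?t", TERMS))
print(wildcard_search("te*t", TERMS))
print(fuzzy_search("roam", TERMS, 0.7))
```

In this sketch “te?t” matches only the four-letter terms, “te*t” also picks up “tempest”, and “foam” and “roams” are each one edit away from “roam”, so they pass a 0.7 threshold while “rim” does not.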
Finally, another good feature of Alfresco search is that you can search the Alfresco repository directly from the browser. To add Alfresco as a search engine in Firefox or IE:
1. Open the browser and log in to the Alfresco Explorer client. Once you are logged in, “Alfresco keyword search” should appear in the browser's search engine drop-down box. (See the screenshot attached below.)

2. Now you can search the Alfresco repository directly from your browser.


Comments
Thank you for the article about the Alfresco search feature.
Currently, my company is adopting Alfresco to provide CMS services to Korean companies.
I've heard the Lucene engine is quite limited when searching non-English documents. Could you point me to where this limitation is discussed and how I can solve it?
I am not sure if you have actually had any specific issues with searching non-English documents.
In some cases you may need to customize the Lucene analyzer depending on your language. Because Lucene preprocesses the input stream through the analyzer, it is possible to perform language-specific filtering. For example, if we search for “회사” in Korean, Lucene may return nothing, yet it may return results when we search for “주식회사”, depending on whether the indexed text has a space between the two words.
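A small Python sketch may help show why the space matters. It is not Lucene code: the document names, texts, and functions are hypothetical, and the bigram tokenizer is only a rough stand-in for the kind of language-specific analysis a customized analyzer could do. With whitespace tokenization, “주식회사” is indexed as one token, so a search for “회사” cannot match it. With overlapping two-character tokens, it can.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only; each run of characters is one token."""
    return text.split()


def bigram_tokens(text: str) -> list[str]:
    """Split each word into overlapping two-character tokens."""
    tokens = []
    for word in text.split():
        if len(word) < 2:
            tokens.append(word)
        else:
            tokens.extend(word[i:i + 2] for i in range(len(word) - 1))
    return tokens


def build_index(docs: dict[str, str], tokenize) -> dict[str, set[str]]:
    """Map every token to the set of documents that contain it."""
    index: dict[str, set[str]] = {}
    for name, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(name)
    return index


def search(index: dict[str, set[str]], query: str, tokenize) -> set[str]:
    """Return documents that contain every token of the query."""
    hits = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*hits) if hits else set()


# Hypothetical documents: one without a space, one with a space.
DOCS = {
    "a.txt": "주식회사 알프레스코",
    "b.txt": "주식 회사",
}

ws_index = build_index(DOCS, whitespace_tokens)
bg_index = build_index(DOCS, bigram_tokens)

print(search(ws_index, "회사", whitespace_tokens))  # misses a.txt
print(search(bg_index, "회사", bigram_tokens))      # finds both documents
```

The same tokenizer has to be used at index time and at query time, which is why changing the analyzer normally means re-indexing the content.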
If you want to discuss this further, don’t hesitate to post a comment here or send me an email.
I have the same problem with the Russian language.
And non-English search doesn't work in the Share interface either.
Try the "+" operator, for example "+one +market" (type it without the double quotation marks).
Let's say,
test1.txt says "I went one day to the market",
test2.txt says "I went one day to the house",
test3.txt says "I went to the market"
and if you enter "+one +market" then Alfresco will return test1.txt only.
Let me know if it works.
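For anyone curious why this works, here is a rough Python sketch of the matching rule, using the three files above. It is only an illustration under simplifying assumptions (the function names are made up, and ranking, phrases, and other operators are ignored): a term prefixed with "+" must be present, while plain terms match a document if any one of them is present.

```python
import re

DOCS = {
    "test1.txt": "I went one day to the market",
    "test2.txt": "I went one day to the house",
    "test3.txt": "I went to the market",
}


def words(text: str) -> set[str]:
    """Lower-cased set of words in a text."""
    return set(re.findall(r"\w+", text.lower()))


def search(query: str, docs: dict[str, str]) -> list[str]:
    """Terms starting with '+' are required; other terms are optional."""
    required, optional = set(), set()
    for part in query.lower().split():
        if part.startswith("+") and len(part) > 1:
            required.add(part[1:])
        else:
            optional.add(part)

    hits = []
    for name, text in docs.items():
        doc_words = words(text)
        if required:
            # Every required term must appear in the document.
            if required <= doc_words:
                hits.append(name)
        elif optional & doc_words:
            # No required terms: any optional term is enough.
            hits.append(name)
    return hits


print(search("+one +market", DOCS))
print(search("one market", DOCS))
```

With "+one +market" only test1.txt is returned, while the plain query "one market" returns all three files.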
It works !!! Thanks Chunho !
I have installed abbyyocr to convert TIFF documents to readable PDFs with a RuntimeExecutableContentTransformerWorker, as explained here: http://wiki.alfresco.com/wiki/Content_Transformations
This works just fine; however, I now need a way to convert image PDFs to readable PDFs. I don't think I can use the above method to call abbyyocr, as both the source and target extensions are pdf. Is there any other way to achieve this?
Many thanks.
Do you know how to do an AND search?
If I search for one day, Alfresco returns all documents that contain either word. If I search for "one day", it returns all documents that contain the exact phrase "one day".
So if I have a document with the text I went one day to the market, I can search for "one day" to get that document. But what if I want to find a document that contains both one and market? If I enter one market, I get all documents that contain one, market, or both. I want only the documents that contain both one and market, not those that contain just one of them.
Hello
I am new to Alfresco Share customization.
We are a team of 5 developers who are going to work on Alfresco Share customization.
Please guide me on how to set up a development environment so that we can work as a team on this customization project.
Please help.
Thanks!
-Nirvan