Search functionality is one of the most important considerations when choosing an open source CMS. As you may know, Alfresco uses Apache Lucene, a high-performance, full-featured text search engine.
Alongside Lucene, Alfresco's search capability relies on OpenOffice, which can extract text from many file formats and make it available to Lucene for indexing.
Let's say a user has a PDF file that consists of scanned images of text. The user wants to store it as text in the Alfresco repository and, of course, wants to find it later by keywords or metadata. Searching by metadata is relatively easy because most CMSs support custom metadata. The problem is the content: the PDF holds images, not text, so text extraction returns nothing to index. Unless the images are first converted to text, there is no way to search the file by its content.
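To make the problem concrete, here is a minimal sketch of a keyword index in plain Python. It is only a toy stand-in for a full-text index, not Lucene's or Alfresco's actual API, and the file names, sample text, and the `build_index` and `search` helpers are all assumptions made up for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, keyword):
    """Return the sorted names of documents whose text contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical result of text extraction for two stored files.
extracted = {
    "report.txt": "quarterly sales report",
    "scanned.pdf": "",  # image-only PDF: extraction yields no text
}
index = build_index(extracted)

# The same repository after the scanned pages have been converted to text
# (the converted text here is invented sample data).
after_ocr = dict(extracted)
after_ocr["scanned.pdf"] = "signed sales contract"
ocr_index = build_index(after_ocr)
```

With the first index, `search(index, "sales")` finds only `report.txt`, because the scanned PDF contributed no words. Once the images have been converted to text, the same query against `ocr_index` also returns `scanned.pdf`.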