...
Your crawling job is now ready to use Apache Tika directly in ManifoldCF to extract the content of the crawled files. You can also use Tesseract OCR, which is bundled with Datafari, to analyse images and extract data from image and PDF files. To allow the ManifoldCF TikaOCR transformation connector to run the OCR analysis with Tesseract, change the property "OCR" in /opt/datafari/tomcat/conf/datafari.properties from "false" to "true", then restart Datafari.
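If you prefer to script the property change rather than edit the file by hand, the following is a minimal sketch in Python. It assumes the entry is written on a single line as `OCR=false` (or `OCR = false`), which is the usual Java properties layout but is not confirmed here. The helper names `set_property` and `enable_ocr` are hypothetical and are not part of Datafari. The script only edits the file, so Datafari still has to be restarted afterwards.

```python
import re
from pathlib import Path

# Path taken from the paragraph above; adjust it if Datafari is installed elsewhere.
PROPERTIES_FILE = Path("/opt/datafari/tomcat/conf/datafari.properties")


def set_property(text: str, key: str, value: str) -> str:
    """Return `text` with `key` set to `value`; append the key if it is absent."""
    # Assumption: one `key=value` (or `key = value`) entry per line.
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*[=:][ \t]*)[^\r\n]*",
        re.MULTILINE,
    )
    if pattern.search(text):
        # Keep the original key and separator, replace only the first match's value.
        return pattern.sub(lambda m: m.group(1) + value, text, count=1)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return f"{text}{separator}{key}={value}\n"


def enable_ocr(path: Path = PROPERTIES_FILE) -> None:
    """Hypothetical helper: switch the "OCR" property to "true" in place."""
    original = path.read_text(encoding="utf-8")
    path.write_text(set_property(original, "OCR", "true"), encoding="utf-8")


if __name__ == "__main__":
    enable_ocr()
```

Running the script with write access to the file switches the property. Keeping a backup copy of datafari.properties before running it is a sensible precaution.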
...