...
Info | ||
---|---|---|
| ||
Except in particular cases, do not use this documentation with your Datafari Enterprise Edition solution, because it is already equipped and preconfigured with an optimised external Tika Server Connector. |
For users of the Datafari Community Edition who want to use the Tika embedded in ManifoldCF: technically, Datafari uses the Apache Solr Extracting Handler (aka Solr Cell), which leverages Apache Tika to extract the content to be indexed from the crawled files. To limit resource consumption (especially network usage if ManifoldCF is installed on an external server), it is possible to use Tika directly in ManifoldCF. In this case, the content is extracted in ManifoldCF itself, and only the content that should be indexed is sent to Apache Solr. A Tika Transformation Connection is configured by default in Datafari. To use it, you simply have to add it to your crawling job in ManifoldCF:
...