Info |
---|
The documentation below is valid from Datafari v4.0.0 upwards |
By default, the 'DatafariSolr' output connector, which Datafari pre-configures in MCF, sends all documents to Solr's /update/extract handler. This handler uses an embedded Tika to parse each incoming document before indexing it, even if the document has already been parsed by a Tika connector or a Tika service connector configured in the crawl job. This second parsing may alter the content of the document, for example for XML, CSV or JSON files, and it consumes resources and processing time that could be avoided.
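To make the default behaviour more concrete, here is a minimal sketch of the kind of URL a client targets when it posts a raw document to Solr's /update/extract handler. It is an illustration only, not the code of the MCF output connector: the class name, the base URL, the collection name and the document id are all hypothetical placeholders. The only Solr-specific element is the `literal.<field>` parameter convention of the extracting handler.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ExtractUrlSketch {

    // Builds the URL a client would POST a raw document to when targeting
    // Solr's /update/extract handler. The base URL and collection name are
    // placeholders supplied by the caller, not values taken from Datafari.
    static URI buildExtractUri(String solrBaseUrl, String collection, String docId) {
        // literal.<field> passes a field value that is not derived from the
        // parsed file content (here, the document id).
        String query = "literal.id=" + URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + "/" + collection + "/update/extract?" + query);
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        URI uri = buildExtractUri("http://localhost:8983/solr", "my_collection",
                "file:/share/report.pdf");
        System.out.println(uri);
    }
}
```

Whatever body is posted to such a URL goes through the embedded Tika on the Solr side, which is why a document already parsed during the crawl ends up being parsed a second time.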
...
The handler Java classes are:
com.francelabs.datafari.handler.parsed.ParsedContentHandler
com.francelabs.datafari.handler.parsed.ParsedDocumentLoader
com.francelabs.datafari.handler.parsed.ParsedRequestHandler
They are located in the 'datafari-handler' module of the Datafari GitHub project.
...