Fix Tika CVE

For all versions of Datafari before 7.0

This page explains how to patch the different versions of Datafari against the Tika CVEs CVE-2025-54988 and CVE-2025-66516, and why the other Tika libs still present are not considered vulnerabilities in Datafari.
See also : https://gitlab.datafari.com/datafari-community/datafari/-/issues/1110

Process for all versions of Datafari

 

The zip file must be stored on the server at a location with no spaces in its path.
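The no-spaces requirement can be checked before unzipping. The following is a minimal sketch, not part of the patch itself: path_is_safe is a hypothetical helper name, and the example assumes you run it from the folder where you plan to place the zip file.

```shell
# Hypothetical helper (not shipped with the Datafari patch):
# succeeds only if the given path contains no space character.
path_is_safe() {
  case "$1" in
    *" "*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example: check the current directory before unzipping the patch there.
if path_is_safe "$PWD"; then
  echo "OK: no spaces in $PWD"
else
  echo "Move the zip file: '$PWD' contains spaces" >&2
fi
```

If the check fails, move the zip file to another folder and run the check again from there.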

To fix the CVEs, we provide a script to patch your running Datafari.

NB : this script is for the CE and EE editions of Datafari, in particular versions 5.x to 6.x. It works for both single-server and multi-server installations : if you have a Datafari cluster, you must repeat the same process on each server.

Enter the following commands on your server as the root user :

  • Download the fix, or copy the file, into the folder of your choice :

    wget https://datafari.com/files/fixes/tika/patch_tika_datafari.zip
  • Uncompress it :

    unzip patch_tika_datafari.zip
  • Go to the patch folder :

    cd patch_tika_datafari
  • Run the patch script :

    bash patch_tika_datafari.sh

That’s it, it is done.

  • Remarks :

After patching, you will still see some Tika jar libs, notably in Solr, in MCF, and in the Datafari folder (for versions 6.2-6.3). These libs do not affect your Datafari, for the reasons explained below :

  • Libs in Solr
    The Tika libs are in the extraction and langid modules of Solr. These modules are packaged directly with Solr.
    Locations of the libs :

    /opt/datafari/solr/modules/extraction/lib/tika-core-1.28.5.jar
    /opt/datafari/solr/modules/langid/lib/tika-core-1.28.5.jar
    • The lib in the langid module is not affected by the CVE because Tika is only used there for language detection.
      Moreover, Datafari uses the Google algorithm (LangDetectLanguageIdentifierUpdateProcessorFactory) and not TikaLanguageIdentifierUpdateProcessorFactory, so this lib does not affect Datafari at all.

    • The lib in the extraction module does not affect Datafari either, because we do not use Solr Cell, i.e. the extract update request handler. Instead, we use the standalone Tika server that is packaged with Datafari.
      See https://solr.apache.org/security.html#cve-2025-66516-apache-solr-extraction-module-vulnerable-to-xxe-attacks-via-xfa-content-in-pdfs for more information from the Solr team.
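If you want to confirm on your own installation that neither Tika-based Solr component is configured, a sketch like the one below may help. It is only an illustration: uses_solr_tika is a hypothetical helper, the path to the solrconfig.xml file is an assumption you must supply yourself (this page does not state where Datafari stores it), and a plain text search will also match references inside XML comments.

```shell
# Hypothetical helper: returns success if the given Solr config file mentions
# the Solr Cell handler class (ExtractingRequestHandler) or the Tika-based
# language identifier named on this page.
# Caveat: this is a text match, so commented-out XML also counts as a hit.
uses_solr_tika() {
  grep -Eq 'ExtractingRequestHandler|TikaLanguageIdentifierUpdateProcessorFactory' "$1"
}

# Prints a short verdict for one config file passed as argument.
report_solr_tika() {
  if uses_solr_tika "$1"; then
    echo "Tika-based component referenced in $1"
  else
    echo "No Tika-based component referenced in $1"
  fi
}
```

Call report_solr_tika with the full path of each solrconfig.xml you want to inspect.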

  • Tika core lib in MCF
    The Tika lib in MCF is located here :

    /opt/datafari/mcf/mcf_home/connector-common-lib/tika-core-1.28.1.jar

This lib is only used when the Tika integrated into MCF extracts text from documents. It does not affect Datafari because our job pipelines use the standalone Tika server to extract the text of documents.

  • Optional : only for Datafari versions between 6.2 and 6.3

There are some Tika libs in the /opt/datafari/tomcat/webapps/Datafari/WEB-INF/lib folder :

tika-core-2.9.1.jar
tika-parser-code-module-2.9.1.jar
tika-parser-html-module-2.9.1.jar
tika-parser-microsoft-module-2.9.1.jar
tika-parser-pdf-module-2.9.1.jar
tika-parser-webarchive-module-2.9.1.jar
tika-parser-apple-module-2.9.1.jar
tika-parser-crypto-module-2.9.1.jar
tika-parser-image-module-2.9.1.jar
tika-parser-miscoffice-module-2.9.1.jar
tika-parser-pkg-module-2.9.1.jar
tika-parser-xml-module-2.9.1.jar
tika-parser-audiovideo-module-2.9.1.jar
tika-parser-digest-commons-2.9.1.jar
tika-parser-mail-commons-2.9.1.jar
tika-parser-news-module-2.9.1.jar
tika-parsers-standard-package-2.9.1.jar
tika-parser-xmp-commons-2.9.1.jar
tika-parser-cad-module-2.9.1.jar
tika-parser-font-module-2.9.1.jar
tika-parser-mail-module-2.9.1.jar
tika-parser-ocr-module-2.9.1.jar
tika-parser-text-module-2.9.1.jar
tika-parser-zip-commons-2.9.1.jar

These libs are dependencies of the langchain4j-easy-rag-0.35.0.jar lib, which is also present in that folder. They do not affect Datafari : langchain4j-easy-rag offers built-in text extraction, but we do not use it. We use our standalone Tika server instead.
langchain4j-easy-rag is no longer present in Datafari from version 7.0 onwards.
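
To review which Tika jars remain after patching and compare them with the locations listed in the remarks above, a minimal sketch using find is shown here. list_tika_jars is a hypothetical helper, and the default root /opt/datafari is taken from the paths on this page; adjust it if your installation lives elsewhere.

```shell
# Hypothetical helper: list every tika-*.jar file under a Datafari install
# directory, one path per line, sorted. Defaults to /opt/datafari.
list_tika_jars() {
  root="${1:-/opt/datafari}"
  find "$root" -type f -name 'tika-*.jar' 2>/dev/null | sort
}
```

Running list_tika_jars with no argument searches /opt/datafari. Any path it prints that is not covered by the remarks above deserves a closer look.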