
Valid from 4.0: the documentation below is valid from Datafari v4.0.0 upwards.

First of all: do not underestimate the importance of SWAP memory! Make sure your SWAP size fits the recommendations for your amount of physical memory. You can find these recommendations in your operating system documentation and/or website.


The stability and performance of Datafari mainly rely on good RAM management and distribution between its components. Adding more RAM to a Datafari server is completely useless if you do not configure Datafari to exploit it!

You can check the default RAM configuration of Datafari in the Software requirements.

Here are the file locations and parameters that allow you to adjust the JVM RAM (excluding SWAP!) consumption per component:

| Component | File location | Parameter | Example for 8GB of RAM (not SWAP) |
| --- | --- | --- | --- |
| Solr | DATAFARI_HOME/solr/bin/solr.in.sh | SOLR_JAVA_MEM (-Xms and -Xmx) | 1GB |
| ManifoldCF | DATAFARI_HOME/mcf/mcf_home/option.env.unix | -Xms and -Xmx | 3.5GB |
| Tomcat (Main) | DATAFARI_HOME/tomcat/bin/setenv.sh | CATALINA_OPTS (-Xms and -Xmx) | 1GB |
| Tomcat (MCF) | DATAFARI_HOME/tomcat-mcf/bin/setenv.sh | CATALINA_OPTS (-Xms and -Xmx) | 1GB |
| Cassandra | DATAFARI_HOME/cassandra/conf/jvm.options | -Xms and -Xmx | 1GB |
| Elasticsearch | DATAFARI_HOME/elk/elasticsearch/config/jvm.options | -Xms and -Xmx | N/A |
| Logstash | DATAFARI_HOME/elk/logstash/config/jvm.options | -Xms and -Xmx | N/A |
| Kibana | DATAFARI_HOME/elk/scripts/set-elk-env.sh | NODE_OPTIONS (--max-old-space-size) | N/A |
| Tika server (Enterprise Edition) | DATAFARI_HOME/tika-server/bin/set-tika-env.sh | TIKA_SPAWN_MEM (-JXms and -JXmx) | N/A |


As a reminder, the "Xms" parameter defines the minimum amount of RAM allocated to the JVM and the "Xmx" parameter the maximum. It is highly recommended to set the same value for those two parameters.
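As an illustration, here is a minimal sketch of what these settings can look like for an 8GB server, following the example column of the table above (the exact surrounding content of each file may differ in your installation):

```shell
# DATAFARI_HOME/solr/bin/solr.in.sh -- Solr heap (1GB example).
# Xms and Xmx are set to the same value, as recommended above.
SOLR_JAVA_MEM="-Xms1g -Xmx1g"

# DATAFARI_HOME/tomcat/bin/setenv.sh -- main Tomcat heap (1GB example).
CATALINA_OPTS="$CATALINA_OPTS -Xms1g -Xmx1g"
```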

The most important thing to know is that there are two main sources of resource consumption: the crawl and the search.

  1. Crawl

    During a crawl phase, the component that will need a lot of RAM is Tika, because it extracts the content of documents and needs, for certain doc types, to fully load them in memory. Depending on your job configuration, Tika may be :

    • Used in MCF if your job is configured to use the "TikaOCR"  transformation connector 
    • Used in Solr if your job is configured to use the "DatafariSolr" output connector instead of the "DatafariSolrNoTika" output connector
    • Used in its own JVM if your job is configured to use the "TikaServer" transformation connector (only available in the Enterprise edition) 

    If you used the Simplified MCF UI of Datafari to create your job, it is automatically configured with the "TikaOCR" connector for the Community Edition, and the "TikaServer" connector for the Enterprise Edition.

    Knowing this, you will need to allocate more RAM to the component that handles Tika to ensure the stability of your crawls. Tika needs at least 5GB to be stable, so if you use Tika in MCF or in Solr, you will need to add 5GB to the default configuration of that component.
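    For example, starting from the 8GB defaults in the table above, adding Tika's 5GB to the component that hosts it could look like this (a sizing sketch, not mandatory values; adjust to your total RAM):

```shell
# If Tika runs inside Solr ("DatafariSolr" output connector):
# default 1GB + 5GB for Tika = 6GB, in DATAFARI_HOME/solr/bin/solr.in.sh
SOLR_JAVA_MEM="-Xms6g -Xmx6g"

# If Tika runs inside MCF ("TikaOCR" transformation connector):
# default 3.5GB + 5GB for Tika = 8.5GB, via the -Xms and -Xmx values in
# DATAFARI_HOME/mcf/mcf_home/option.env.unix (shown here as comments;
# the exact file format may differ in your installation):
# -Xms8704m
# -Xmx8704m
```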

  2. Search

    During the search phase, a lot of RAM may be used by Solr to improve performance. Solr uses its own JVM-allocated RAM but it also relies on the system cache to perform searches, so it is important NOT to allocate all the available physical memory to Solr or any other Datafari component, to ensure the best performance!

    We recommend allocating between 1GB and 12GB of RAM to Solr, depending on the available RAM of your server. For the best search performance, try to leave an amount of unallocated RAM that matches your Solr index size (the size of the DATAFARI_HOME/solr/solr_home directory).
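    To estimate how much RAM to leave unallocated, you can check the index size directly (a sketch; DATAFARI_HOME is assumed to point at your Datafari installation root):

```shell
# Print the size of the Solr index directory; aim to leave roughly
# this amount of physical RAM unallocated for the OS cache.
du -sh "$DATAFARI_HOME/solr/solr_home"
```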



