Info

Valid from 6.0

The documentation below is valid from Datafari v6.0 upwards

Warning

First of all: do not underestimate the importance of SWAP memory! Be sure your SWAP fits the recommendations according to your physical memory!

We recommend a minimum of:

  • 1.5 times the amount of RAM as SWAP memory for servers that have less than 32 GB of RAM

  • the same amount of RAM as SWAP memory for servers that have 32 GB of RAM or more

Note that if your server is properly sized in terms of RAM, you should almost never see any usage of SWAP. This recommended minimum amount is a safety measure for scenarios where you push your Datafari to the limits of your physical hardware.
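To check how your current SWAP compares to your physical RAM, you can use standard Linux tools. A quick sketch (the commands are standard util-linux/procps tools; the output sizes are only illustrative):

Code Block
# Show physical RAM and SWAP in human-readable form
free -h
# List the active swap devices/files and their sizes
swapon --show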

The stability and performance of Datafari mainly rely on good RAM management and its proper distribution among its components. Adding more RAM to a Datafari server is completely useless if you do not configure it to exploit the available RAM!

Info

IF YOU WANT TO SEE THE DEFAULT VALUES: you can check the default JVM RAM configuration of Datafari CE in monoserver_community_memory.properties, and of the Datafari EE version in the equivalent file main_enterprise_memory.properties.

CASE 1: Configuration prior to the first start of your Datafari

...

CASE 2: Configuration in case you had already started your Datafari at least once before

If your Datafari is already initialized, here are the file locations and the parameters to adjust the JVM RAM (excluding SWAP!) consumption per component:
(You need to restart Datafari for the changes to be applied)

Component | File location | Parameter
Solr | DATAFARI_HOME/solr/bin/solr.in.sh | SOLR_JAVA_MEM (-Xms and -Xmx)
ManifoldCF | DATAFARI_HOME/mcf/mcf_home/option.env.unix | -Xms and -Xmx
Tomcat (Main) | DATAFARI_HOME/tomcat/bin/setenv.sh | CATALINA_OPTS (-Xms and -Xmx)
Tomcat (MCF) | DATAFARI_HOME/tomcat-mcf/bin/setenv.sh | CATALINA_OPTS (-Xms and -Xmx)
Cassandra | DATAFARI_HOME/cassandra/conf/jvm-server.options | -Xms and -Xmx
PostgreSQL | DATAFARI_HOME/pgsql/data/postgresql.conf | shared_buffers
Apache Zeppelin | DATAFARI_HOME/analytic-stack/zeppelin/bin/common.sh | ZEPPELIN_MEM (-Xms and -Xmx) and ZEPPELIN_INTP_MEM (-Xms and -Xmx)
Logstash | DATAFARI_HOME/analytic-stack/logstash/config/jvm.options | -Xms and -Xmx
Tika server | DATAFARI_HOME/tika-server/conf/tika-config.xml | In the <forkedJvmArgs> section: <arg>-Xms5g</arg>
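For example, to give Solr a 4GB heap (an illustrative value, adjust it to your server), edit DATAFARI_HOME/solr/bin/solr.in.sh and set SOLR_JAVA_MEM with identical -Xms and -Xmx values, as recommended in the Info below; a minimal sketch, to be followed by a restart of Datafari:

Code Block
# DATAFARI_HOME/solr/bin/solr.in.sh
# 4g is only an illustrative value; keep -Xms and -Xmx identical
SOLR_JAVA_MEM="-Xms4g -Xmx4g"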

Info

As a reminder, the "Xms" parameter defines the minimum amount of RAM consumption and the "Xmx" parameter the maximum! It is highly recommended to set the same value for these two parameters.

The most important thing to know is that there are two main resource consumption sources: the crawl and the search.

  1. Crawl

    During a crawl phase, the component that will need a lot of RAM is Tika, because it extracts the content of documents and, for certain document types, needs to fully load them in memory. Depending on your job configuration, Tika may be:

    • Used in MCF if your job is configured to use the "TikaServerRmetaConnector"  transformation connector 

    • Used in Solr if your job is configured to use the "DatafariSolr" output connector instead of the "DatafariSolrNoTika" output connector

    • Used in its own JVM if your job is configured to use the "TikaServer" transformation connector (only available in the Enterprise edition) 

    If you used the Simplified MCF UI of Datafari to create your job, it is automatically configured with the "TikaOCR" connector for the Community Edition, and the "TikaServer" connector for the Enterprise Edition.

    Knowing this, you will need to allocate more RAM to the component that handles Tika to ensure the stability of your crawls. Tika needs at least 5GB to be stable, so if you use Tika in MCF or in Solr, you will need to add 5GB to the default configuration of those components. A tika-config.xml sketch follows below.

  2. Search

    During the search phase, a lot of RAM may be used by Solr to improve performance. Solr uses its own JVM-allocated RAM but it also relies on the system cache to perform searches, so to ensure the best performance it is important to NOT allocate all the available physical memory to Solr or any other Datafari component!

    We recommend allocating between 1GB and 12GB of RAM to Solr depending on the available RAM of your server. For the best search performance, try to leave an amount of unallocated RAM that matches your Solr index size (size of the DATAFARI_HOME/solr/solr_home directory). A measurement sketch follows below.

Note about Solr memory: by default, its value is set to 1 GB. If you encounter OOMs and you have enough spare RAM, try increasing it (keep in mind that your system needs enough SWAP space as well).
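To illustrate the Tika server case of the table above, here is a minimal sketch of the <forkedJvmArgs> section of DATAFARI_HOME/tika-server/conf/tika-config.xml. The surrounding elements are abbreviated, and the -Xmx5g line is an assumption based on the 5GB stability minimum and the "same Xms/Xmx" recommendation above:

Code Block
<!-- DATAFARI_HOME/tika-server/conf/tika-config.xml (abbreviated excerpt) -->
<properties>
  <server>
    <params>
      <forkedJvmArgs>
        <!-- 5g matches the stability minimum mentioned above; keep min and max equal -->
        <arg>-Xms5g</arg>
        <arg>-Xmx5g</arg>
      </forkedJvmArgs>
    </params>
  </server>
</properties>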
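And to size the unallocated RAM left for the system cache, you can measure your Solr index as suggested above; a quick sketch with standard commands:

Code Block
# Size of the Solr index: try to leave about this much RAM unallocated
du -sh $DATAFARI_HOME/solr/solr_home
# Check how much RAM is currently free/available on the server
free -h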

...

Expand: Valid from 5.1 to 5.2
Info

Valid from 5.1 to 5.2

The documentation below is valid from Datafari v5.1 upwards

Warning

First of all: do not underestimate the importance of SWAP memory! Be sure your SWAP fits the recommendations according to your physical memory!

We recommend a minimum of:
- 1.5 times the amount of RAM as SWAP memory for servers that have less than 32 GB of RAM
- the same amount of RAM as SWAP memory for servers that have 32 GB of RAM or more

Note that if your server is properly sized in terms of RAM, you should almost never see any usage of SWAP. This recommended minimum amount is a safety measure for scenarios where you push your Datafari to the limits of your physical hardware.

The stability and performance of Datafari mainly rely on good RAM management and its proper distribution among its components. Adding more RAM to a Datafari server is completely useless if you do not configure it to exploit the available RAM!

You can check the default JVM RAM configuration of Datafari CE in monoserver_community_memory.properties

CASE 1: Configuration prior to the first start of your Datafari

Before the first start of Datafari, you can modify in one place all the RAM values that you want to apply to the different components.

These files are located in: $DATAFARI_HOME/bin/deployUtils/

You need to edit the file corresponding to your installation case; for example, if you are using the Community Edition of Datafari, edit the file monoserver_community_memory.properties.

There is a line for each component; for example, if you want to change the amount of RAM for Solr, edit the property:

Code Block
SOLRMEMORY=1g
Note

Your modifications will be taken into account only if Datafari has never been initialized before.
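A minimal sketch of the CASE 1 workflow for the Community Edition (the sed one-liner and the 4g value are only illustrative; the other components follow the same NAME=value pattern in that file):

Code Block
# Before the very first start of Datafari (CE)
cd $DATAFARI_HOME/bin/deployUtils/
# Raise the Solr allocation from the default 1g to an illustrative 4g
sed -i 's/^SOLRMEMORY=.*/SOLRMEMORY=4g/' monoserver_community_memory.properties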

CASE 2: Configuration in case you had already started your Datafari at least once before

If your Datafari is already initialized, here are the file locations and the parameters to adjust the JVM RAM (excluding SWAP!) consumption per component:
(You need to restart Datafari for the changes to be applied)

Component | File location | Parameter
Solr | DATAFARI_HOME/solr/bin/solr.in.sh | SOLR_JAVA_MEM (-Xms and -Xmx)
ManifoldCF | DATAFARI_HOME/mcf/mcf_home/option.env.unix | -Xms and -Xmx
Tomcat (Main) | DATAFARI_HOME/tomcat/bin/setenv.sh | CATALINA_OPTS (-Xms and -Xmx)
Tomcat (MCF) | DATAFARI_HOME/tomcat-mcf/bin/setenv.sh | CATALINA_OPTS (-Xms and -Xmx)
Cassandra | DATAFARI_HOME/cassandra/conf/jvm-server.options | -Xms and -Xmx
PostgreSQL | DATAFARI_HOME/pgsql/data/postgresql.conf | shared_buffers
Elasticsearch | DATAFARI_HOME/elk/elasticsearch/config/jvm.options | -Xms and -Xmx
Logstash | DATAFARI_HOME/elk/logstash/config/jvm.options | -Xms and -Xmx
Kibana | DATAFARI_HOME/elk/scripts/set-elk-env.sh | NODE_OPTIONS (--max-old-space-size)
Tika server | DATAFARI_HOME/tika-server/bin/set-tika-env.sh | TIKA_SPAWN_MEM (-JXms and -JXmx)
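As an example for the Tika server row above, a sketch of the relevant line in DATAFARI_HOME/tika-server/bin/set-tika-env.sh (the 5g value follows the 5GB stability minimum discussed below; the -J prefix passes the options to the spawned Tika JVM):

Code Block
# DATAFARI_HOME/tika-server/bin/set-tika-env.sh
# Keep -JXms and -JXmx identical; 5g is an illustrative value
TIKA_SPAWN_MEM="-JXms5g -JXmx5g"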

Info

As a reminder, the "Xms" parameter defines the minimum amount of RAM consumption and the "Xmx" parameter the maximum! It is highly recommended to set the same value for these two parameters.

The most important thing to know is that there are two main resource consumption sources: the crawl and the search.

  1. Crawl

    During a crawl phase, the component that will need a lot of RAM is Tika, because it extracts the content of documents and, for certain document types, needs to fully load them in memory. Depending on your job configuration, Tika may be:

    • Used in MCF if your job is configured to use the "TikaServerRmetaConnector"  transformation connector 

    • Used in Solr if your job is configured to use the "DatafariSolr" output connector instead of the "DatafariSolrNoTika" output connector

    • Used in its own JVM if your job is configured to use the "TikaServer" transformation connector (only available in the Enterprise edition) 

    If you used the Simplified MCF UI of Datafari to create your job, it is automatically configured with the "TikaOCR" connector for the Community Edition, and the "TikaServer" connector for the Enterprise Edition.

    Knowing this, you will need to allocate more RAM to the component that handles Tika to ensure the stability of your crawls. Tika needs at least 5GB to be stable, so if you use Tika in MCF or in Solr, you will need to add 5GB to the default configuration of those components.

  2. Search

    During the search phase, a lot of RAM may be used by Solr to improve performance. Solr uses its own JVM-allocated RAM but it also relies on the system cache to perform searches, so to ensure the best performance it is important to NOT allocate all the available physical memory to Solr or any other Datafari component!

    We recommend allocating between 1GB and 12GB of RAM to Solr depending on the available RAM of your server. For the best search performance, try to leave an amount of unallocated RAM that matches your Solr index size (size of the DATAFARI_HOME/solr/solr_home directory).

Note about Solr memory: by default, its value is set to 1 GB. If you encounter OOMs and you have enough spare RAM, try increasing it (keep in mind that your system needs enough SWAP space as well).

...

Expand: Valid for 5.0
Info

Valid for 5.0

The documentation below is valid for Datafari v5.0

Warning

First of all: do not underestimate the importance of SWAP memory! Be sure your SWAP fits the recommendations according to your physical memory!

You can find recommendations in your operating system's documentation and/or website.

The stability and performance of Datafari mainly rely on good RAM management and distribution among its components. Adding more RAM to a Datafari server is completely useless if you don't configure it to exploit the available RAM!

You can check the default RAM configuration of Datafari in the Software requirements

Here are the file locations and parameters that allow you to adjust the JVM RAM (excluding SWAP!) consumption per component:

Component | File location | Parameter | Default values
Solr | DATAFARI_HOME/solr/bin/solr.in.sh | SOLR_JAVA_MEM (-Xms and -Xmx) | 1GB
ManifoldCF | DATAFARI_HOME/mcf/mcf_home/option.env.unix | -Xms and -Xmx | 3.5GB
Tomcat (Main) | DATAFARI_HOME/tomcat/bin/setenv.sh | CATALINA_OPTS (-Xms and -Xmx) | 1GB
Tomcat (MCF) | DATAFARI_HOME/tomcat-mcf/bin/setenv.sh | CATALINA_OPTS (-Xms and -Xmx) | 1GB
Cassandra | DATAFARI_HOME/cassandra/conf/jvm-server.options | -Xms and -Xmx | 1GB
PostgreSQL | DATAFARI_HOME/pgsql/data/postgresql.conf | shared_buffers | 1GB
Elasticsearch | DATAFARI_HOME/elk/elasticsearch/config/jvm.options | -Xms and -Xmx | 1GB
Logstash | DATAFARI_HOME/elk/logstash/config/jvm.options | -Xms and -Xmx | 1GB
Kibana | DATAFARI_HOME/elk/scripts/set-elk-env.sh | NODE_OPTIONS (--max-old-space-size) | 1.4GB (maximum size)
Tika server (Enterprise Edition) | DATAFARI_HOME/tika-server/bin/set-tika-env.sh | TIKA_SPAWN_MEM (-JXms and -JXmx) | 5.6GB
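For the Kibana row, note that --max-old-space-size is a Node.js option expressed in MB, so the 1.4GB default corresponds to a value of 1400; a sketch of the line in DATAFARI_HOME/elk/scripts/set-elk-env.sh:

Code Block
# DATAFARI_HOME/elk/scripts/set-elk-env.sh
# --max-old-space-size is in MB: 1400 matches the 1.4GB default above
NODE_OPTIONS="--max-old-space-size=1400"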

Info

As a reminder, the "Xms" parameter defines the minimum amount of RAM consumption and the "Xmx" parameter the maximum! It is highly recommended to set the same value for these two parameters.

The most important thing to know is that there are two main resource consumption sources: the crawl and the search.

  1. Crawl

    During a crawl phase, the component that will need a lot of RAM is Tika, because it extracts the content of documents and, for certain document types, needs to fully load them in memory. Depending on your job configuration, Tika may be:

    • Used in MCF if your job is configured to use the "TikaOCR"  transformation connector 

    • Used in Solr if your job is configured to use the "DatafariSolr" output connector instead of the "DatafariSolrNoTika" output connector

    • Used in its own JVM if your job is configured to use the "TikaServer" transformation connector (only available in the Enterprise edition) 

    If you used the Simplified MCF UI of Datafari to create your job, it is automatically configured with the "TikaOCR" connector for the Community Edition, and the "TikaServer" connector for the Enterprise Edition.

    Knowing this, you will need to allocate more RAM to the component that handles Tika to ensure the stability of your crawls. Tika needs at least 5GB to be stable, so if you use Tika in MCF or in Solr, you will need to add 5GB to the default configuration of those components.

  2. Search

    During the search phase, a lot of RAM may be used by Solr to improve performance. Solr uses its own JVM-allocated RAM but it also relies on the system cache to perform searches, so to ensure the best performance it is important to NOT allocate all the available physical memory to Solr or any other Datafari component!

    We recommend allocating between 1GB and 12GB of RAM to Solr depending on the available RAM of your server. For the best search performance, try to leave an amount of unallocated RAM that matches your Solr index size (size of the DATAFARI_HOME/solr/solr_home directory).

Note about Solr memory: by default, its value is set to 1 GB. If you encounter OOMs and you have enough spare RAM, try increasing it (keep in mind that your system needs enough SWAP space as well).

...

Expand: Valid from 4.0 before 5.0
Info

Valid from 4.0 before 5.0

The documentation below is valid from Datafari v4.0.0 upwards

Warning

First of all: do not underestimate the importance of SWAP memory! Be sure your SWAP fits the recommendations according to your physical memory!

You can find recommendations in your operating system's documentation and/or website.

The stability and performance of Datafari mainly rely on good RAM management and distribution among its components. Adding more RAM to a Datafari server is completely useless if you don't configure it to exploit the available RAM!

You can check the default RAM configuration of Datafari in the Software requirements

Here are the file locations and parameters that allow you to adjust the JVM RAM (excluding SWAP!) consumption per component:

Component | File location | Parameter | Example for 8GB of RAM (not SWAP)
Solr | DATAFARI_HOME/solr/bin/solr.in.sh | SOLR_JAVA_MEM (-Xms and -Xmx) | 1GB
ManifoldCF | DATAFARI_HOME/mcf/mcf_home/option.env.unix | -Xms and -Xmx | 3.5GB
Tomcat (Main) | DATAFARI_HOME/tomcat/bin/setenv.sh | CATALINA_OPTS (-Xms and -Xmx) | 1GB
Tomcat (MCF) | DATAFARI_HOME/tomcat-mcf/bin/setenv.sh | CATALINA_OPTS (-Xms and -Xmx) | 1GB
Cassandra | DATAFARI_HOME/cassandra/conf/jvm.options | -Xms and -Xmx | 1GB
Elasticsearch | DATAFARI_HOME/elk/elasticsearch/config/jvm.options | -Xms and -Xmx | N/A
Logstash | DATAFARI_HOME/elk/logstash/config/jvm.options | -Xms and -Xmx | N/A
Kibana | DATAFARI_HOME/elk/scripts/set-elk-env.sh | NODE_OPTIONS (--max-old-space-size) | N/A
Tika server (Enterprise Edition) | DATAFARI_HOME/tika-server/bin/set-tika-env.sh | TIKA_SPAWN_MEM (-JXms and -JXmx) | N/A
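For instance, the Cassandra heap from the table above is set with one JVM flag per line in DATAFARI_HOME/cassandra/conf/jvm.options; a minimal sketch, where 1G mirrors the 8GB example column:

Code Block
# DATAFARI_HOME/cassandra/conf/jvm.options
# One JVM option per line; keep min and max heap identical
-Xms1G
-Xmx1G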

Info

As a reminder, the "Xms" parameter defines the minimum amount of RAM consumption and the "Xmx" parameter the maximum! It is highly recommended to set the same value for these two parameters.

The most important thing to know is that there are two main resource consumption sources: the crawl and the search.

  1. Crawl
    During a crawl phase, the component that will need a lot of RAM is Tika, because it extracts the content of documents and, for certain document types, needs to fully load them in memory. Depending on your job configuration, Tika may be:

    • Used in MCF if your job is configured to use the "TikaOCR"  transformation connector 

    • Used in Solr if your job is configured to use the "DatafariSolr" output connector instead of the "DatafariSolrNoTika" output connector

    • Used in its own JVM if your job is configured to use the "TikaServer" transformation connector (only available in the Enterprise edition) 

    If you used the Simplified MCF UI of Datafari to create your job, it is automatically configured with the "TikaOCR" connector for the Community Edition, and the "TikaServer" connector for the Enterprise Edition.

    Knowing this, you will need to allocate more RAM to the component that handles Tika to ensure the stability of your crawls. Tika needs at least 5GB to be stable, so if you use Tika in MCF or in Solr, you will need to add 5GB to the default configuration of those components.

  2. Search
    During the search phase, a lot of RAM may be used by Solr to improve performance. Solr uses its own JVM-allocated RAM but it also relies on the system cache to perform searches, so to ensure the best performance it is important to NOT allocate all the available physical memory to Solr or any other Datafari component!

    We recommend allocating between 1GB and 12GB of RAM to Solr depending on the available RAM of your server. For the best search performance, try to leave an amount of unallocated RAM that matches your Solr index size (size of the DATAFARI_HOME/solr/solr_home directory).

Default RAM standard requirements (SWAP NOT INCLUDED):

  • Monoserver (Community edition, without OCR) for a machine with 8GB of RAM:

Tomcat: 1 GB
Solr: 1 GB
ManifoldCF: 3.5 GB
Cassandra: 1 GB
PostgreSQL: 1 GB

(These allocations total 7.5 GB, which leaves roughly 0.5 GB of the 8 GB machine for the operating system and the system cache.)

  • Monoserver (Enterprise edition)

Tomcat: 1 GB
Solr: 1 GB
ManifoldCF: 256 MB
Cassandra: 1 GB
PostgreSQL: 1 GB
ELK: 2 GB (Elasticsearch)
Tika-server: 5 GB
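The PostgreSQL allocation listed above is controlled by the shared_buffers setting of DATAFARI_HOME/pgsql/data/postgresql.conf (see the tables of the newer sections); a minimal sketch matching the 1 GB default:

Code Block
# DATAFARI_HOME/pgsql/data/postgresql.conf
# Main PostgreSQL memory buffer; 1GB matches the defaults above
shared_buffers = 1GB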