Backup

Valid from Datafari X.X

The most important thing to back up is the Solr index, since it contains the data crawled from the different source systems.
Losing the index entirely means performing a new initial indexing from scratch, which takes a lot of time and resources and must be avoided as much as possible.
The Solr index must therefore be backed up periodically, so that there are backup milestones to restore from if Datafari has an issue or the server crashes.
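As a minimal sketch, assuming a SolrCloud deployment, a periodic backup milestone can be triggered through the `BACKUP` action of the Solr Collections API. The host, the collection name (`FileShare`) and the backup location below are hypothetical placeholders; the location has to be writable by, and reachable from, every Solr node.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical values: adapt them to your own Datafari deployment.
SOLR_BASE_URL = "http://localhost:8983/solr"
COLLECTION = "FileShare"
BACKUP_LOCATION = "/var/backups/solr"


def milestone_name(collection: str, now: datetime) -> str:
    """Build a timestamped backup name, e.g. FileShare-20240102T030405Z."""
    return f"{collection}-{now:%Y%m%dT%H%M%SZ}"


def build_backup_url(base_url: str, collection: str, location: str, name: str) -> str:
    """Build a Collections API BACKUP request URL."""
    params = {
        "action": "BACKUP",
        "collection": collection,
        "name": name,
        "location": location,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def trigger_backup() -> str:
    """Send the BACKUP request and return Solr's raw response body."""
    name = milestone_name(COLLECTION, datetime.now(timezone.utc))
    url = build_backup_url(SOLR_BASE_URL, COLLECTION, BACKUP_LOCATION, name)
    # Without the "async" parameter, the request blocks until the backup ends,
    # hence the generous timeout.
    with urlopen(url, timeout=3600) as response:
        return response.read().decode("utf-8")
```

Scheduling `trigger_backup()` (for example from cron) gives one dated milestone per run; restoring a milestone goes through the matching `RESTORE` action of the same API.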

Regarding the databases, Cassandra stores all the user preferences handled by the Datafari web app server, such as alerts, saved searches and favorite documents.
This component is less critical than the Solr index.
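A minimal sketch of a Cassandra backup, assuming `nodetool` is on the `PATH` of the Datafari server, is to take a tagged snapshot of the keyspace holding the user preferences. The keyspace name `datafari` is a hypothetical placeholder; check the one used by your installation.

```python
import subprocess
from datetime import datetime, timezone
from typing import Optional

# Hypothetical keyspace name: check the keyspace actually used by your install.
KEYSPACE = "datafari"


def build_snapshot_command(keyspace: str, tag: str) -> list[str]:
    """Build a `nodetool snapshot` command for one keyspace with a named tag."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def take_snapshot(keyspace: str = KEYSPACE, now: Optional[datetime] = None) -> str:
    """Take a tagged snapshot and return its tag.

    The snapshot is written under each table's data directory
    (snapshots/<tag>), on the same disk as Cassandra, so it should then be
    copied to another location to act as a real backup.
    """
    now = now or datetime.now(timezone.utc)
    tag = f"{keyspace}-{now:%Y%m%dT%H%M%SZ}"
    subprocess.run(build_snapshot_command(keyspace, tag), check=True)
    return tag
```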

Finally, PostgreSQL and the MCF ZooKeeper are related to the indexing server, i.e. MCF (Apache ManifoldCF). MCF stores in its PostgreSQL database the status of the crawled documents, the created jobs, etc., and ZooKeeper is used to coordinate the cluster processes and store the global configuration.
We also back up the MCF configuration, which is exported as JSON files (in the Enterprise Edition).
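For the MCF database, a minimal sketch is a `pg_dump` run producing a custom-format archive that `pg_restore` can read back. The database name, host, user and backup directory below are hypothetical placeholders, and this sketch covers neither ZooKeeper nor the JSON export of the MCF configuration.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Hypothetical values: use the database name, host and user of your MCF setup.
MCF_DB_NAME = "dbname"
PG_HOST = "localhost"
PG_USER = "postgres"
BACKUP_DIR = Path("/var/backups/mcf")


def build_pg_dump_command(db_name: str, host: str, user: str, output_file: Path) -> list[str]:
    """Build a pg_dump command producing a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",
        f"--host={host}",
        f"--username={user}",
        f"--file={output_file}",
        db_name,
    ]


def dump_mcf_database(now: Optional[datetime] = None) -> Path:
    """Dump the MCF database to a timestamped file and return its path."""
    now = now or datetime.now(timezone.utc)
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    output_file = BACKUP_DIR / f"{MCF_DB_NAME}-{now:%Y%m%dT%H%M%SZ}.dump"
    # The password is expected to come from ~/.pgpass or PGPASSWORD,
    # never from the command line.
    subprocess.run(build_pg_dump_command(MCF_DB_NAME, PG_HOST, PG_USER, output_file), check=True)
    return output_file
```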
