We released version 5.2 of Datafari in July 2022. It brings loads of changes, and we are happy to share it with you. It is available for Linux, and we released a Docker version as well. Enjoy! If you are a Windows addict, contact us to see what we can do for you!
Welcome to the Datafari documentation!
This is the home page for the Datafari documentation space within Confluence. It serves as a central repository for anything related to Datafari, be it for users or for developers.
For individual users and companies alike, the amount of data is increasing exponentially. On top of that, the move to the cloud multiplies the number and heterogeneity of the systems hosting data. Insight Engines (and before them, Enterprise Search) are here to tackle this challenge. They can connect to many systems and offer a single view of all the available data. Unlike web search engines, enterprise insight engines are 100% controlled by you, and have access to data that belongs to you and is not necessarily public. These search engines guarantee secure access to this data.
Among the many existing solutions, the majority are proprietary: you acquire a licence, you pay for support, and you buy the integration. However, a large share of the basic functionality is now available as open source and doesn't require massive investment. The big players of the web have made this choice: LinkedIn, eBay, Twitter, Salesforce, Bloomberg, Amazon… They all use open source tools.
However, the best-known tools, Apache Lucene and Apache Solr, are only the heart of a search solution. They do not provide any framework to manage access to the data sources, they do not handle security, and they do not manage backup or monitoring activities. Other complementary open source projects are available, but integrating them is not always easy. This is where Datafari comes in: it integrates these technologies, relying as much as possible on projects under an Apache licence (or equivalent), so that the licensing remains non-restrictive for companies.
We wanted to offer the community an easy-to-use tool that is affordable, even free, for many use cases, but that can also scale up to manage hundreds of millions of indexed documents, thanks to SolrCloud. This document gives you an overview of Datafari, to help you better understand it, use it, and even extend it. Naturally, we encourage the user community to help us evolve this open source tool. Datafari comes in two flavors: the Community Edition, which is fully open source under the Apache v2 licence; and the Enterprise Edition, which is proprietary and comes with more functionality and enterprise-grade support.
Datafari is the ideal product for those who want to search through their data while using advanced open source technologies. Datafari combines Apache ManifoldCF, Apache Solr, and Open Distro for Elasticsearch to allow its users to analyse and search through many diverse data sources: their file shares, their cloud shares (Dropbox, Google Drive…), their databases, but also their emails and many more. Available as a Community or an Enterprise edition, Datafari has the following specificities:
Its open source edition is distributed under the Apache v2 licence: you are legally free to do whatever you want with it, you just need to mention that you're using it.
It combines three popular Apache projects: Apache Solr for indexing and searching, Apache ManifoldCF for crawling, and Apache Cassandra for user management. The use of these products guarantees stability over time.
It provides analytics capabilities, through the integration of Open Distro for Elasticsearch.