Introduction

For individual users and companies alike, the amount of data is growing exponentially. On top of that, the move to the cloud multiplies the number and heterogeneity of the systems hosting data. Insight Engines (and before them, Enterprise Search) exist to tackle this challenge. They can connect to many systems and offer a single view of all available data. Unlike web search engines, enterprise insight engines are 100% controlled by you, and have access to data that belongs to you and is not necessarily public. These search engines also enforce secure access to that data.

Among the many existing solutions, most are proprietary: you acquire a licence, you pay for support, and you buy the integration. However, a large share of the basic functionality is now available as open source and does not require massive investment. The big players of the web have made this choice: LinkedIn, eBay, Twitter, Salesforce, Bloomberg, Amazon… They all use open source tools.

However, the best-known tools, Apache Lucene and Apache Solr, are only the heart of a search solution. They do not provide any framework to manage access to the data sources, they do not handle security, and they do not manage backup or monitoring. Complementary open source projects are available, but integrating them is not always easy. This is where Datafari comes in: it integrates these technologies, relying as much as possible on projects under an Apache licence (or equivalent), so that it remains business-friendly for companies.

We wanted to offer the community an easy-to-use tool that is affordable, even free, for many use cases, but that can also scale up to manage hundreds of millions of indexed documents thanks to SolrCloud. This document gives you an overview of Datafari to help you better understand it, use it, and even extend it. Naturally, we encourage the user community to help us evolve this open source tool. Datafari comes in two flavors: the Community Edition, which is fully open source under the Apache v2 licence; and the Enterprise Edition, which is proprietary and comes with more features and enterprise-grade support.