Whether for individual users or for companies, the amount of data is growing exponentially. On top of that, the move to the cloud multiplies the number and heterogeneity of the systems hosting this data. Search engines are here to tackle this challenge: they can connect to many systems and offer a single view over all the available data. Unlike web search engines, enterprise search engines are 100% under your control and have access to data that belongs to you and is not necessarily public. These search engines guarantee secure access to this data.

Among the many existing solutions, the majority are proprietary: you acquire a licence, you pay for support, and you buy the integration. However, a large share of the basic functionality is now available as open source and does not require massive investment. The big players of the web have made this choice: LinkedIn, eBay, Twitter, Salesforce, Kelkoo… They all use open source tools.

However, the most well-known tools, Apache Lucene and Apache Solr, are only the heart of a search solution. They do not provide a framework to manage access to the data sources, they do not handle security, and they do not manage backup or monitoring activities. Other complementary open source projects are available, but integrating them is not always easy. This is where Datafari stands: it integrates these technologies, relying as much as possible on projects under an Apache licence (or equivalent), in order to remain business-friendly for companies.
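To make that concrete, here is a minimal sketch of what talking to Solr on its own looks like, using its Java client (SolrJ). The Solr URL and the collection name ("FileShare") are assumptions to adapt to your own deployment; the point is that a raw query like this returns results with no connectors, no security trimming and no monitoring around it, which is precisely the gap Datafari fills.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class RawSolrQuery {
    public static void main(String[] args) throws Exception {
        // Assumed Solr URL and collection name; adjust to your deployment.
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/FileShare").build()) {
            SolrQuery query = new SolrQuery("annual report");
            query.setRows(10);
            QueryResponse response = solr.query(query);
            // Raw Solr results: no access-rights filtering is applied here.
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc.getFieldValue("id"));
            }
        }
    }
}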

We wanted to offer the community an easy-to-use tool, affordable – even free – for many use cases, but also able to scale up to manage hundreds of millions of indexed documents, thanks to SolrCloud. This document gives you an overview of Datafari so that you can better understand it, use it, and even extend it. Naturally, we encourage the user community to help us evolve this open source tool.

In the architecture section, you will discover the architecture of Datafari and its main components. In the crawling section, we detail the crawling component and its usage. In the indexing section, we present the content indexing part of the search engine. In the search section, we cover the search part of the search engine, that is, how queries and the search algorithm work. In the user interfaces section, we present the default user interface of Datafari. In the security section, we cover the security challenges. In the analytics section, we explain how to monitor Datafari. In the use cases section, we present use cases that you can use to kickstart your own projects.

User documentation

In this user documentation section, you will learn all you need about Datafari. Whether you are the search manager, a standard user, or the Datafari system administrator, we cover the functionalities of Datafari in this section.

Developer documentation

In this developer documentation section, you will learn how to set up a proper development environment, as well as a test environment. For now, only the development environment is documented.