Search basics

Valid from Datafari 6.0

The search phase is the most "intuitive" one, in the sense that it relies on user interface paradigms already made familiar by web search engines such as Google or Bing, and by e-commerce search engines such as eBay or Amazon. This page briefly explains the capabilities that Datafari exposes through the Ajaxfrancelabs graphical framework and, of course, through the Apache Solr engine working behind the scenes.

The first page you see when you use the search functionality is a simple search bar. Start typing in it, and the autocomplete starts proposing terms that are present in the search index, ranked by "relevance" (here, relevance is mainly correlated with term frequency in the corpus). You can keep typing, or click on one of the proposed terms at any time. By default, the autocomplete works term by term: if you select a term, type a whitespace, and start typing a second term, the autocomplete applies to the second term.
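The term-by-term behavior can be pictured with a small sketch. This is not Datafari's or Solr's implementation: the function name `suggest` and the in-memory list of corpus terms are assumptions made for illustration only. It completes only the last term being typed and ranks candidates by their frequency in the corpus.

```python
from collections import Counter


def suggest(query: str, corpus_terms: list[str], limit: int = 5) -> list[str]:
    """Propose completions for the last term of `query` only (toy model)."""
    # Split off the term currently being typed; what precedes it is kept as-is.
    head, _, prefix = query.rpartition(" ")
    if not prefix:
        # Nothing typed yet for the current term: no suggestion.
        return []
    counts = Counter(term.lower() for term in corpus_terms)
    matches = [term for term in counts if term.startswith(prefix.lower())]
    # Most frequent terms first; ties are broken alphabetically.
    matches.sort(key=lambda term: (-counts[term], term))
    return [f"{head} {term}".strip() for term in matches[:limit]]


# Hypothetical corpus terms, for illustration only.
corpus = ["energy", "energy", "engine", "solar", "solar", "solar", "solution"]
print(suggest("en", corpus))
print(suggest("energy so", corpus))
```

Typing "energy so" only completes "so": the already selected term "energy" is left untouched, which mirrors the behavior described above.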

image-20240320-071914.png

Once you have clicked the search icon, you are taken to the search results page. It is laid out as follows: at the top, the standard search bar; on the left, the default facets to filter and navigate through the results; on the right, the list of results for the query terms, ranked by relevance. These panels use default views; refer to https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/1517813761 if you want to modify the display. The deeper the modifications, the more likely it is that you will also need to modify Datafari's Solr configuration, in which case you should refer to the Apache Solr reference documentation (check Release notes - Community Edition for the version of Solr currently used in Datafari).

  • Facets panel: by default, the following facets have been configured:

    • Modification Date: filters on time windows relative to the current date: documents less than 1 month old, less than 1 year old, or less than 5 years old.

    • Source: filters on the source repository configured in Apache ManifoldCF.

    • Extension: filters on document types (pdf, doc, docx, xls, html ...).

    • Language: filters on the language of the documents, relying on Solr's ability to detect languages automatically.

  • Results list: by default, each result in this list is composed as follows:

    • Graphic icon symbolising the document type

    • Document file name or, for a web page, the title from the HTML header

    • Text snippet, 3 lines maximum, surrounding the query terms found in the document. These query terms are highlighted in bold.

    • Document path in the source repository (or URL for web pages)

    • Link to see a preview of the document

    • The Source

image-20240320-072152.png

Facets have a standard way of working: if you click on a facet value, it will filter out all the results that don't satisfy the facet value condition. In the illustration below, selecting the pdf value in the Extension facet only displays pdf documents in the result panel.
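In Solr terms, a facet selection is typically sent as a filter query (the `fq` request parameter) next to the main query `q`. The sketch below only builds such a parameter string. The field name `extension` and the helper `build_search_params` are assumptions for illustration; the actual field names depend on the Solr schema used by your Datafari version.

```python
from urllib.parse import urlencode


def build_search_params(query: str, facet_filters: dict[str, str]) -> str:
    """Build a Solr-style query string with one `fq` per selected facet value."""
    params = [("q", query)]
    for field, value in facet_filters.items():
        # Each filter query narrows the result set without changing `q`.
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


# "extension" is a hypothetical field name standing in for the Extension facet.
print(build_search_params("energy", {"extension": "pdf"}))
```

Removing the facet selection simply drops the corresponding `fq` entry, which brings back the full result list for the same query.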

Datafari also offers the spellchecker functionality of Apache Solr. In the illustration below, we enter the query term "eneryg" instead of "energy". The spellchecker automatically proposes a corrected word. By default, Datafari runs the search on this suggestion, but this behavior can be deactivated with slight modifications.
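The decision logic described here can be sketched as follows. This is a toy model, not Solr's spellchecker: it uses Python's `difflib` as a stand-in to find a close word, and the names `effective_query`, `vocabulary`, and `auto_correct` are hypothetical. The `auto_correct` flag plays the role of the default behavior that can be deactivated.

```python
import difflib


def effective_query(typed: str, vocabulary: list[str], auto_correct: bool = True) -> str:
    """Return the term that would actually be searched (toy model)."""
    if typed in vocabulary:
        # The term is known: nothing to correct.
        return typed
    # Stand-in for the spellchecker: closest known word, if any is close enough.
    candidates = difflib.get_close_matches(typed, vocabulary, n=1)
    if candidates and auto_correct:
        return candidates[0]
    # Either no suggestion was found or automatic correction is switched off.
    return typed


# Hypothetical index vocabulary, for illustration only.
vocabulary = ["energy", "engine", "solar"]
print(effective_query("eneryg", vocabulary))
print(effective_query("eneryg", vocabulary, auto_correct=False))
```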

In the search bar, you can type several query terms and combine them with operators such as AND and OR to fine-tune your search. Check the Apache Solr reference documentation for the full list of operators.
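As a minimal illustration, the helper below assembles the kind of string you would type in the search bar. The function `combine` is hypothetical; it only covers AND and OR, and wraps multi-word terms in double quotes so that they are treated as phrases.

```python
def combine(terms: list[str], operator: str = "AND") -> str:
    """Join query terms with a boolean operator, quoting multi-word phrases."""
    if operator not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {operator}")
    # A term containing a space is wrapped in quotes to keep it as one phrase.
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return f" {operator} ".join(quoted)


print(combine(["solar", "energy"]))
print(combine(["wind power", "solar"], "OR"))
```

The first call produces a query that requires both terms, while the second accepts documents containing either the phrase or the single term.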


Valid from Datafari 1 up to 5.5

The search phase is the most "intuitive" one, in the sense that it relies on user interface paradigms already made familiar by web search engines such as Google or Bing, and by e-commerce search engines such as eBay or Amazon. This page briefly explains the capabilities that Datafari exposes through the Ajaxfrancelabs graphical framework and, of course, through the Apache Solr engine working behind the scenes.

The first page you see when you use the search functionality is a simple search bar. Start typing in it, and the autocomplete starts proposing terms that are present in the search index, ranked by "relevance" (here, relevance is mainly correlated with term frequency in the corpus). You can keep typing, or click on one of the proposed terms at any time. By default, the autocomplete works term by term: if you select a term, type a whitespace, and start typing a second term, the autocomplete applies to the second term.

Once you have clicked the search icon, you are taken to the search results page. It is laid out as follows: at the top, the standard search bar; on the left, the default facets to filter and navigate through the results; on the right, the list of results for the query terms, ranked by relevance. These panels use default views; refer to the Ajaxfrancelabs framework documentation if you want to modify the display. The deeper the modifications, the more likely it is that you will also need to modify Datafari's Solr configuration, in which case you should refer to the Apache Solr reference documentation (check Release notes - Community Edition for the version of Solr currently used in Datafari).

  • Facets panel: by default, the following facets have been configured:

    • Last modifications: filters on time windows relative to the current date: documents less than 1 month old, less than 1 year old, or less than 5 years old.

    • Type: filters on document types (pdf, doc, docx, xls, html ...).

    • Source: filters on the source repository configured in Apache ManifoldCF.

    • Language: filters on the language of the documents, relying on Solr's ability to detect languages automatically.

    • Original file size: filters on the size of the indexed files.

  • Results list: by default, each result in this list is composed as follows:

    • Graphic icon symbolising the document type

    • Document file name in bold or, for a web page, the title from the HTML header

    • Text snippet, 3 lines maximum, surrounding the query terms found in the document. These query terms are highlighted in bold.

    • Document path in the source repository (or URL for web pages)

Facets have a standard way of working: if you click on a facet value, it will filter out all the results that don't satisfy the facet value condition. In the illustration below, selecting the pdf value in the Type facet only displays pdf documents in the result panel.
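The time windows of the Last modifications facet follow the same filtering principle. One way to picture them is as Solr range filters written with Solr date math (`NOW-1MONTH`, `NOW-1YEAR`, `NOW-5YEARS`). The sketch below is an assumption about how such a filter could be expressed, not Datafari's actual configuration: the field name `last_modified` and the helper `date_filter` are hypothetical.

```python
# Facet labels from the list above, mapped to Solr date math lower bounds.
DATE_WINDOWS = {
    "Less than 1 month old documents": "NOW-1MONTH",
    "Less than 1 year old documents": "NOW-1YEAR",
    "Less than 5 years old documents": "NOW-5YEARS",
}


def date_filter(label: str, field: str = "last_modified") -> str:
    """Return a Solr range filter keeping documents modified within the window."""
    lower_bound = DATE_WINDOWS[label]
    # Range syntax: field:[start TO end], both bounds inclusive.
    return f"{field}:[{lower_bound} TO NOW]"


print(date_filter("Less than 1 month old documents"))
```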

Datafari also offers the spellchecker functionality of Apache Solr. In the illustration below, we enter the query term "eneryg" instead of "energy". The spellchecker automatically proposes a corrected word. By default, Datafari runs the search on this suggestion, but this behavior can be deactivated with slight modifications.

In the search bar, you can type several query terms and combine them with operators such as AND and OR to fine-tune your search. Check the Apache Solr reference documentation for the full list of operators.