Protwords Configuration

Valid from Datafari 4.0

Protwords, or protected words, are a list of words that the stemmers must leave unchanged.

As a reminder of how stemming works in Solr, consider the following example, in which you send these documents to the content_en field of your main Solr collection:

  1. The man ran to the beach

  2. The man is running to the beach.

  3. The man will run to the beach.

  4. The man runs to the beach every day.

  5. The man wants to be a runner.

With no stemming on the field, a search for the term 'run' returns only document 3. Conversely, with an aggressive stemmer in place, most of the documents are returned (several stemmers exist, and the choice of stemmer affects which documents are returned).

In document 2, 'running' is reduced to the stem 'run'; in document 4, 'runs' is likewise reduced to 'run'.
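The matching behavior described above can be reproduced with a small Python sketch. The `toy_stem` function below is a deliberately aggressive suffix stripper written only for this illustration; it is not Solr's Porter or Snowball stemmer, and a real stemmer may treat words such as 'runner' differently. Neither variant handles the irregular form 'ran', which is why document 1 is never returned.

```python
import re

# The five example documents sent to the content_en field.
DOCS = {
    1: "The man ran to the beach",
    2: "The man is running to the beach.",
    3: "The man will run to the beach.",
    4: "The man runs to the beach every day.",
    5: "The man wants to be a runner.",
}

VOWELS = set("aeiou")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(word):
    """Hypothetical, aggressive toy stemmer (NOT Solr's Porter/Snowball).

    Strips one English suffix, then undoubles a trailing consonant,
    so running/runner -> runn -> run and runs -> run.
    """
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            stem = word[: -len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in VOWELS:
                stem = stem[:-1]
            return stem
    return word


def search(query, stem=None):
    """Return ids of documents containing the (optionally stemmed) query term."""
    normalize = stem or (lambda w: w)
    target = normalize(query.lower())
    return [
        doc_id
        for doc_id, text in DOCS.items()
        if target in {normalize(tok) for tok in tokenize(text)}
    ]


print(search("run"))                 # no stemming: [3]
print(search("run", stem=toy_stem))  # toy stemming: [2, 3, 4, 5]
```

Replacing `toy_stem` with a gentler function changes the second list, which illustrates how the choice of stemmer shapes the result list.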

If we add the term 'running' to the protected words file, the term is protected from the stemmers in the content_xx fields of Datafari (where xx is the language code). The stemmer will therefore no longer reduce 'running' to 'run'.

Concretely, this means that the user must search for the exact term 'running' to retrieve a document containing it; a search for 'run' no longer matches document 2.
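In Solr, protected words are usually implemented by placing `solr.KeywordMarkerFilterFactory` with `protected="protwords.txt"` before the stemming filter, so that listed tokens are marked as keywords and skipped by the stemmer. We assume, without having checked Datafari's schema here, that its content_xx field types follow this pattern at both index and query time. The sketch below reuses the same hypothetical toy stemmer and simply leaves protected tokens unstemmed.

```python
import re

DOCS = {
    1: "The man ran to the beach",
    2: "The man is running to the beach.",
    3: "The man will run to the beach.",
    4: "The man runs to the beach every day.",
    5: "The man wants to be a runner.",
}


def toy_stem(word):
    """Hypothetical aggressive toy stemmer (NOT Solr's Porter/Snowball)."""
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            stem = word[: -len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def analyze(text, protected):
    """Tokenize, then stem every token except the protected ones.

    Mirrors the idea of a keyword marker: protected tokens bypass the stemmer.
    """
    return [tok if tok in protected else toy_stem(tok)
            for tok in re.findall(r"[a-z]+", text.lower())]


def search(query, protected=frozenset()):
    """Apply the same analysis to the query and the documents (assumption)."""
    target = analyze(query, protected)[0]
    return [doc_id for doc_id, text in DOCS.items()
            if target in analyze(text, protected)]


PROTWORDS = {"running"}
print(search("run"))                 # [2, 3, 4, 5]
print(search("run", PROTWORDS))      # [3, 4, 5]
print(search("running", PROTWORDS))  # [2]
```

With 'running' protected, document 2 is only reachable through the exact term, while 'runs' and 'runner' are still reduced to 'run'.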

Search Expert: managing protwords

To create or edit protwords, you must be logged in with the search expert role.

Once in the administration interface, open the administration menu, click on Search Engine Configuration and select Protwords from the dropdown list.


Protwords are not language specific, so you need to select the language "ALL" in the language selector. You then get an interface for editing the protwords list. Note that only one search expert at a time can edit this file; any other simultaneous attempt ends with an error message on the screen.
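The single-editor rule can be pictured as a lock held by one user at a time. The `EditLock` class below is a hypothetical illustration of that behavior only; it is not how Datafari implements the restriction.

```python
import threading


class EditLock:
    """Hypothetical single-editor lock (illustration only, not Datafari code)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects _holder across threads
        self._holder = None             # user currently editing, or None

    def acquire(self, user):
        """Grant the edit right to `user`, or raise if someone else holds it."""
        with self._guard:
            if self._holder is not None and self._holder != user:
                raise RuntimeError(f"protwords are being edited by {self._holder}")
            self._holder = user

    def release(self, user):
        """Give the edit right back; ignored if `user` does not hold it."""
        with self._guard:
            if self._holder == user:
                self._holder = None


lock = EditLock()
lock.acquire("alice")
try:
    lock.acquire("bob")
except RuntimeError as err:
    print(err)  # protwords are being edited by alice
lock.release("alice")
lock.acquire("bob")  # succeeds once alice has released the lock
```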


Here you can add or delete protwords by editing the text file directly. Enter one protword per line.
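Before pasting a long list into the editor, it can help to tidy it up. The hypothetical `clean_protwords` helper below assumes the file follows Solr's usual word-list conventions (one entry per line, blank lines and lines starting with `#` ignored). It removes duplicates and reports lines containing whitespace, since a protected word is compared against single tokens and a multi-word line would most likely never match.

```python
def clean_protwords(text):
    """Return (words, problems) for a protwords file body.

    Assumption: one entry per line; blank lines and '#' comment lines ignored.
    `problems` lists (line_number, line) pairs that contain whitespace.
    """
    words, seen, problems = [], set(), []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        if any(ch.isspace() for ch in line):
            # Single tokens are compared, so a multi-word entry is suspicious.
            problems.append((lineno, line))
            continue
        if line not in seen:  # keep the first occurrence, drop duplicates
            seen.add(line)
            words.append(line)
    return words, problems


sample = "running\n\n# comment\nrunning\nrunning shoes\n  swimming  \n"
words, problems = clean_protwords(sample)
print(words)     # ['running', 'swimming']
print(problems)  # [(5, 'running shoes')]
```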

Once you are satisfied with your modifications, click the 'Confirm' button. The modifications take effect immediately with no further action: the file is sent to ZooKeeper, then the Solr collection is reloaded.
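Datafari performs the ZooKeeper upload and the reload for you. For troubleshooting, the reload step corresponds to a request to Solr's Collections API with `action=RELOAD`. The sketch below only builds and sends that request; the Solr base URL and the collection name `FileShare` are assumptions to adapt to your installation, and the ZooKeeper upload itself is not shown.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(solr_base, collection):
    """Build a Solr Collections API RELOAD request URL."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{solr_base.rstrip('/')}/admin/collections?{query}"


def reload_collection(solr_base, collection, timeout=60):
    """Send the RELOAD request and return the HTTP status code."""
    with urlopen(reload_url(solr_base, collection), timeout=timeout) as resp:
        return resp.status


# Assumed host, port and collection name; adapt them to your installation.
print(reload_url("http://localhost:8983/solr/", "FileShare"))
# http://localhost:8983/solr/admin/collections?action=RELOAD&name=FileShare
```

Calling `reload_collection` against a live Solr node is only needed if you suspect the automatic reload did not happen.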