
Info

Valid from Datafari 4.0

Protwords (protected words) are a list of words that the stemmers leave unchanged.

As a reminder of how stemming works in Solr, consider the following example, in which you send these documents to Solr in the content_en field of your main Solr collection:

  1. The man ran to the beach.

  2. The man is running to the beach.

  3. The man will run to the beach.

  4. The man runs to the beach every day.

  5. The man wants to be a runner.

With no stemming on the field, if the user searches for the term 'run', only document 3 appears in the results list. On the other hand, if an aggressive stemmer is in place, most of the documents appear in the results list (several stemmers exist, and the choice of stemmer affects which documents are returned).

In document 2, 'running' is reduced to the stem 'run'; in document 4, 'runs' is also reduced to 'run'.
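To make the example concrete, the sketch below searches the five sample sentences with and without a deliberately crude suffix-stripping stemmer. The `toy_stem` function is a hypothetical stand-in, not one of the stemmers Solr or Datafari actually configure for content_en; real stemmers (for instance Porter or Snowball) follow different rules, so the exact set of matches will vary.

```python
import re

# The five sample documents from the example above.
DOCUMENTS = {
    1: "The man ran to the beach.",
    2: "The man is running to the beach.",
    3: "The man will run to the beach.",
    4: "The man runs to the beach every day.",
    5: "The man wants to be a runner.",
}


def toy_stem(word: str) -> str:
    """Hypothetical aggressive stemmer: strips one of a few English suffixes."""
    for suffix in ("ing", "er", "s"):
        # Only strip when at least three characters remain ("is" stays "is").
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undouble a trailing consonant pair: "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


def analyze(text: str, stem: bool) -> set[str]:
    """Lowercase, split on non-letters and optionally stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {toy_stem(t) for t in tokens} if stem else set(tokens)


def search(query: str, stem: bool) -> list[int]:
    """Return the ids of documents containing every (analyzed) query term."""
    terms = analyze(query, stem)
    return sorted(doc_id for doc_id, text in DOCUMENTS.items()
                  if terms <= analyze(text, stem))


print(search("run", stem=False))  # [3]
print(search("run", stem=True))   # [2, 3, 4, 5]
```

Document 1 is still missed with stemming because 'ran' is an irregular form that suffix stripping cannot reach, which is why the text says "most" rather than "all" of the documents.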

If we add the term 'running' to the protected words file, it is protected from the stemmers in the content_xx fields of Datafari (where xx is the language code). The stemmer then no longer reduces 'running' to 'run'.

Concretely, this means that the user has to enter the exact term 'running' to retrieve the Solr document that contains it.
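In Solr, protected words are commonly applied with `solr.KeywordMarkerFilterFactory` (its `protected` attribute typically points to `protwords.txt`) placed before the stemming filter: it flags matching tokens as keywords, and keyword-aware stemmers skip flagged tokens. The sketch below mimics that two-stage chain in plain Python, assuming Datafari's content_xx fields follow this pattern; the `STEMS` lookup table is a hypothetical stand-in for a real stemmer. Because queries go through the same analysis, protecting 'running' means a query for 'run' no longer matches document 2, while a query for 'running' still does.

```python
import re
from dataclasses import dataclass

# Hypothetical lookup table standing in for a real (rule-based) stemmer.
STEMS = {"running": "run", "runs": "run", "runner": "run"}


@dataclass
class Token:
    text: str
    keyword: bool = False  # plays the role of Lucene's KeywordAttribute


def keyword_marker(tokens: list[Token], protwords: set[str]) -> list[Token]:
    """Flag protected words so that the stemming stage leaves them alone."""
    for tok in tokens:
        if tok.text in protwords:
            tok.keyword = True
    return tokens


def stemmer(tokens: list[Token]) -> list[Token]:
    """Stem every token that has not been flagged as a keyword."""
    for tok in tokens:
        if not tok.keyword:
            tok.text = STEMS.get(tok.text, tok.text)
    return tokens


def analyze(text: str, protwords: set[str]) -> set[str]:
    """Tokenize, mark protected words, then stem (same chain for docs and queries)."""
    tokens = [Token(t) for t in re.findall(r"[a-z]+", text.lower())]
    return {tok.text for tok in stemmer(keyword_marker(tokens, protwords))}


doc2 = "The man is running to the beach."
print("run" in analyze(doc2, set()))        # True: 'running' was stemmed to 'run'
print("run" in analyze(doc2, {"running"}))  # False: 'running' is protected
print(analyze("running", {"running"}))      # {'running'}: the query keeps the exact term
```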

Search Expert: managing protwords


Protwords are not language-specific, so you need to select the language "ALL" in the language selector. Once this is done, an interface lets you edit the protwords list. Note that only one search expert at a time can edit this file: any simultaneous attempt to edit it ends with an error message on screen.


Here you can add or delete protwords directly by editing the text file. Enter one protword per line.
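The file is a plain word list. As a rough illustration, the hypothetical `parse_protwords` helper below reads such content the way Solr word-list files are usually read: one entry per line, surrounding whitespace trimmed, and blank lines and lines starting with '#' ignored. These conventions are assumed from general Solr behavior rather than taken from Datafari's own code.

```python
def parse_protwords(content: str) -> list[str]:
    """Return the protected words from a protwords file, one word per line.

    Assumes the usual Solr word-list conventions: surrounding whitespace is
    trimmed, and blank lines and lines starting with '#' are ignored.
    """
    words = []
    for line in content.splitlines():
        word = line.strip()
        if word and not word.startswith("#"):
            words.append(word)
    return words


sample = """# Words the stemmers must leave untouched
running

runner
"""
print(parse_protwords(sample))  # ['running', 'runner']
```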

Once you are satisfied with your modifications, click the 'Confirm' button. The modifications take effect immediately with no further action: the file is sent to ZooKeeper, then the Solr collection is reloaded.
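Datafari performs the upload and reload for you when you click 'Confirm'. For reference only, the reload step corresponds to Solr's Collections API `RELOAD` action; the sketch below builds and sends that request with the Python standard library. The base URL `http://localhost:8983/solr` and the collection name `FileShare` are hypothetical placeholders, and a real deployment may also require authentication, which this sketch does not handle.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(solr_base: str, collection: str) -> str:
    """Build a Solr Collections API RELOAD request URL for one collection."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{solr_base.rstrip('/')}/admin/collections?{query}"


def reload_collection(solr_base: str, collection: str, timeout: float = 30.0) -> int:
    """Send the RELOAD request and return the HTTP status code."""
    with urlopen(reload_url(solr_base, collection), timeout=timeout) as resp:
        return resp.status


# Hypothetical base URL and collection name; adjust to your deployment.
print(reload_url("http://localhost:8983/solr", "FileShare"))
# http://localhost:8983/solr/admin/collections?action=RELOAD&name=FileShare
```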