For FileShare collection

Info

This documentation only applies to update processors for the FileShare collection, which covers almost all needs. If you have a specific need and want to change the configuration of another Datafari collection (Logs, Statistics, Access, etc.), see the section at the end of this page.

One of the most interesting features of Solr is that you can develop custom Update Processors. These components perform data adjustments or modifications on documents just before they are indexed.

...

To make things easy, we have prepared a very simple Update Processor GitLab project that you can use as a template or as inspiration to develop your own.

...

In the example project, the “ReplaceUrlUpdateProcessorFactory” simply retrieves the parameters specified in the configuration (if any) and passes them on to the “ReplaceUrlUpdateProcessor” constructor. The “ReplaceUrlUpdateProcessor” searches for a field whose name is given by the ‘source.field’ parameter (if any). If that field exists, its value is extracted and overrides the value of the field ‘url’. This algorithm is performed for each document about to be indexed.

Now let us see how to declare and use a custom update processor:

Each Solr core manages its own Java libraries. Therefore, in order to use an Update Processor in a specific core, you first need to add the Custom Update Processor’s classes to the classpath of the target core. By compiling the example project, you will obtain a jar file named “CustomUpdateProcessor-0.0.1-SNAPSHOT.jar”. Fortunately, Datafari is designed to facilitate the implementation of custom update processors in its main core ‘FileShare’: you only need to put the jar into DATAFARI_HOME/solr/solrcloud/FileShare/lib/custom and give the ‘datafari’ user read permission on the jar file.
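The copy and permission step can be sketched as a small shell helper. This is only an illustration: the function name install_custom_jar is hypothetical, and granting read access to all users with chmod a+r is one possible way to make the jar readable by the ‘datafari’ user, not necessarily the one your installation policy requires.

```shell
# Hypothetical helper (not part of Datafari): copy a processor jar into a
# core's custom lib folder and make it readable.
# Usage: install_custom_jar <path/to/jar> <target lib folder>
install_custom_jar() {
    jar="$1"
    lib_dir="$2"
    # Create the target folder if it is missing.
    mkdir -p "$lib_dir" || return 1
    cp "$jar" "$lib_dir/" || return 1
    # a+r grants read access to every user, which includes 'datafari'.
    chmod a+r "$lib_dir/$(basename "$jar")"
}
```

Assuming the default installation path /opt/datafari, a call could look like: install_custom_jar CustomUpdateProcessor-0.0.1-SNAPSHOT.jar /opt/datafari/solr/solrcloud/FileShare/lib/custom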

Then you need to tell the core which UpdateProcessorFactory to use, along with the parameter “source.field” that the processor will use, and when to use this update processor. Here again, Datafari simplifies things: you only need to declare the update processor in the DATAFARI_HOME/solr/solrcloud/FileShare/conf/customs_solrconfig/custom_update_processors.incl file as follows:

Code Block
<processor class="com.francelabs.datafari.updateprocessor.ReplaceUrlUpdateProcessorFactory">
    <str name="source.field">testurl</str>
</processor>

Datafari is configured to call each custom update processor factory specified in this file (in the order they are declared) at the very end of the update processors chain. This guarantees that your custom update processors are executed only once all of the actions from the Datafari core code have already been executed, so their changes will not be overridden by that core code.

Once this is done, you will need to restart Datafari (or simply the /wiki/spaces/DATAFARI/pages/2852716547), then push and apply the new configuration using the System Configuration Manager (Zookeeper).

Then voilà, on the next crawl, every indexed document will have its ‘url’ replaced by the value of the field specified by the source.field parameter (if that field exists in the document).

Through this example, you should have understood the basics: how to use parameters for an update processor, how to use the processor itself, and how a custom update processor works. You can now use the example update processor to develop your own.

Note

If for any reason you have to keep a previous version of your Update Processor, do not leave the jar in the same folder with just a changed extension, for example "my_update_proc.jar.old". The Solr server will class load it like any other jar, no matter the extension. At run time the result will be unpredictable, as the previous version of your classes may be used instead of the new one.
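One way to follow this advice is to move the previous jar to a folder that Solr does not load from. The sketch below assumes such an archive folder exists outside every Solr lib directory; the function names retire_old_jar and list_lib_files are hypothetical.

```shell
# Hypothetical helper: move a superseded jar out of the Solr lib folder.
# Usage: retire_old_jar <path/to/old jar> <archive folder outside Solr libs>
retire_old_jar() {
    old_jar="$1"
    archive_dir="$2"
    mkdir -p "$archive_dir" || return 1
    mv "$old_jar" "$archive_dir/"
}

# List every file still present in a lib folder, whatever its extension,
# so that stray copies such as "my_update_proc.jar.old" are easy to spot.
list_lib_files() {
    find "$1" -type f
}
```

Running list_lib_files on the lib folder after an upgrade should show only the jar versions you intend Solr to load.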

For collections other than FileShare

Info

It is very rare to need to modify the update processors of collections other than FileShare.

If you use Enterprise Edition, please contact the Datafari support team to explain your needs.

The main steps are:

  • Compile your custom processor and export the jar (see above)

  • Put the jar into the folder $DATAFARI_HOME/solr/solrcloud/$COLLECTION/lib (replace $COLLECTION with the collection in which you want to put your update processor; create the lib folder if it does not exist)

  • Edit the solrconfig.xml file of the collection:

--Add the lib tag to load the jar:

Code Block
<lib dir="$DATAFARI_HOME/solr/lib/custom"/>

(replace $DATAFARI_HOME with the real path of your Datafari installation, by default /opt/datafari)

--Modify the /update handler to add the update processor, for example with noip-processor-chain:

Code Block
<requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
        <str name="update.chain">noip-processor-chain</str>
    </lst>
</requestHandler>

--Also add the update request processor chain:

Code Block
<updateRequestProcessorChain name="noip-processor-chain">
    <processor class="com.francelabs.datafari.updateprocessor.AnonymiseIpUpdateProcessorFactory" />
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

  • Push the configuration to ZooKeeper (ZK):

Code Block
bin/solr zk upconfig -n $COLLECTION -d $DATAFARI_HOME/solr/solrcloud/$COLLECTION/conf/

(replace $DATAFARI_HOME with the real path of your Datafari installation, by default /opt/datafari, and replace $COLLECTION with the Solr collection)

  • Reload the collection:

Code Block
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=$COLLECTION"
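
The push and reload steps can be chained so that the reload only runs when the push succeeded. The following is a minimal sketch, not a Datafari tool: the function name deploy_collection_config, the SOLR_BIN and DRY_RUN variables, and the default Solr URL are assumptions made for illustration. It also assumes that bin/solr can find ZooKeeper through your Solr environment settings, as in the command above.

```shell
# Hypothetical helper: push a collection's configuration to ZooKeeper,
# then reload the collection.
# Usage: deploy_collection_config <collection> <conf folder> [solr base url]
# Set DRY_RUN=1 to print the commands instead of running them.
deploy_collection_config() {
    collection="$1"
    conf_dir="$2"
    solr_url="${3:-http://localhost:8983/solr}"
    solr_bin="${SOLR_BIN:-bin/solr}"   # assumed to be run from the Solr folder
    reload_url="$solr_url/admin/collections?action=RELOAD&name=$collection"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$solr_bin zk upconfig -n $collection -d $conf_dir"
        echo "curl $reload_url"
        return 0
    fi

    # Stop here if the push fails, so a stale configuration is never reloaded.
    "$solr_bin" zk upconfig -n "$collection" -d "$conf_dir" || return 1
    # --fail makes curl return a non-zero status on HTTP errors.
    curl --fail "$reload_url"
}
```

Running it first with DRY_RUN=1 lets you check the collection name and configuration folder before anything is sent to ZooKeeper.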