One of the most interesting features of Solr is that you can develop custom Update Processors. These components adjust or modify documents just before they are indexed.

To develop your own update processor, you will need to create a simple Java project with at least two dependencies: solrj and solr-core. Check which Solr version your Datafari uses so that you pick matching dependency versions.

To make this easier, we have prepared a very simple Update Processor Gitlab project that you can use as a template or as inspiration to develop your own.

For now, let us use this project to explain the basics:

An Update Processor is composed of two elements:

  • An “UpdateProcessorFactory” that tells Solr how to instantiate the update processor. This is the component that knows about the parameters specified in the configuration (covered later on this page)

  • The Update Processor itself

In the example project, the “ReplaceUrlUpdateProcessorFactory” simply retrieves the parameters specified in the configuration (if any) and passes them on to the “ReplaceUrlUpdateProcessor” constructor. The “ReplaceUrlUpdateProcessor” looks for a field whose name is given by the ‘source.field’ parameter (if any). If that field exists in the document, its value is extracted and overrides the value of the ‘url’ field. This algorithm runs for each document about to be indexed.
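The factory/processor split described above can be sketched in plain Java. This is only an illustration and not the code of the example project: it uses a `Map` in place of Solr's document class so that it runs without any Solr dependency, and the names `ReplaceUrlSketch`, `Factory` and `Processor` are hypothetical. In a real project the factory would extend Solr's `UpdateRequestProcessorFactory` and the processor would extend `UpdateRequestProcessor`.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch (hypothetical names); a Map stands in for a Solr document.
class ReplaceUrlSketch {

    // Plays the role of the factory: it knows the configured parameters
    // and hands them to each processor it creates.
    static class Factory {
        private final String sourceField; // value of 'source.field', null if not configured

        Factory(Map<String, String> params) {
            this.sourceField = params.get("source.field");
        }

        Processor getInstance() {
            return new Processor(sourceField);
        }
    }

    // Plays the role of the update processor: it is applied to each document.
    static class Processor {
        private final String sourceField;

        Processor(String sourceField) {
            this.sourceField = sourceField;
        }

        // Overrides 'url' with the value of the source field, if that field exists.
        void processAdd(Map<String, Object> doc) {
            if (sourceField == null) {
                return; // no parameter configured: leave the document untouched
            }
            Object value = doc.get(sourceField);
            if (value != null) {
                doc.put("url", value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("source.field", "testurl");
        Processor processor = new Factory(params).getInstance();

        Map<String, Object> doc = new HashMap<>();
        doc.put("url", "file:/share/doc.pdf");            // made-up sample values
        doc.put("testurl", "https://example.org/doc.pdf");

        processor.processAdd(doc);
        System.out.println(doc.get("url")); // prints https://example.org/doc.pdf
    }
}
```

A document that has no ‘testurl’ field passes through unchanged, which matches the “if it exists” condition described above.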

Now let us see how to declare and use a custom update processor:

Each Solr core manages its own Java libraries. Therefore, to use an Update Processor in a specific core, you first need to add the custom Update Processor classes to the classpath of the target core. Compiling the example project produces a jar file named “CustomUpdateProcessor-0.0.1-SNAPSHOT.jar”. Fortunately, Datafari is designed to make it easy to add custom update processors to its main core ‘FileShare’: you only need to put the jar into DATAFARI_HOME/solr/solrcloud/FileShare/lib/custom and give the ‘datafari’ user read permission on the jar file.

Then you need to tell the core which UpdateProcessorFactory to use, which value to give the “source.field” parameter, and when to run this update processor. Here again, Datafari simplifies things: you only need to declare the update processor in the DATAFARI_HOME/solr/solrcloud/FileShare/conf/customs_solrconfig/custom_update_processors.incl file as follows:

<processor class="com.francelabs.datafari.updateprocessor.ReplaceUrlUpdateProcessorFactory">
    <str name="source.field">testurl</str>
</processor>

Datafari is configured to call every custom update processor factory specified in this file (in the order they are declared) at the very end of the update processor chain. This guarantees that your custom update processors run only after all of the processing done by the Datafari core code has completed.
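The effect of this ordering can be illustrated with a small stand-alone sketch. It is not how Solr or Datafari implement the chain (Solr links each processor to the next one rather than looping over a list), and the class name `ChainOrderSketch`, the steps and the sample values are all hypothetical. It only shows why a step that runs last keeps the final say on a field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Stand-alone sketch (hypothetical names); a Map stands in for a Solr document.
class ChainOrderSketch {

    // Core steps come first, custom steps are appended in declaration order.
    static List<Consumer<Map<String, Object>>> buildChain(
            List<Consumer<Map<String, Object>>> coreSteps,
            List<Consumer<Map<String, Object>>> customSteps) {
        List<Consumer<Map<String, Object>>> chain = new ArrayList<>(coreSteps);
        chain.addAll(customSteps);
        return chain;
    }

    // Applies every step of the chain to the document, in order.
    static void run(List<Consumer<Map<String, Object>>> chain, Map<String, Object> doc) {
        for (Consumer<Map<String, Object>> step : chain) {
            step.accept(doc);
        }
    }

    public static void main(String[] args) {
        // Hypothetical core step that sets 'url'.
        Consumer<Map<String, Object>> coreStep = d -> d.put("url", "file:/share/doc.pdf");
        // Hypothetical custom step that overrides 'url'.
        Consumer<Map<String, Object>> customStep = d -> d.put("url", "https://example.org/doc.pdf");

        Map<String, Object> doc = new HashMap<>();
        run(buildChain(List.of(coreStep), List.of(customStep)), doc);

        // The custom step ran last, so its value is the one that remains.
        System.out.println(doc.get("url")); // prints https://example.org/doc.pdf
    }
}
```

If the custom step ran before the core step instead, the core value would overwrite it, which is the situation the end-of-chain placement avoids.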

Once this is done, you need to restart Datafari, then push and apply the new configuration using the System Configuration Manager (Zookeeper).

Then voilà: on the next crawl, every indexed document will have its ‘url’ replaced by the value of the field specified by the source.field parameter (if that field exists in the document).

Through this example, you should now understand the basics: how a custom update processor works, how to pass parameters to it, and how to declare and use it. You can now use the example update processor as a starting point to develop your own.
