Manually adding facets to Ajaxfrancelabs

In this tutorial, we will see how to add a facet in Ajaxfrancelabs. The facet here is a field facet, which means that its entries are the distinct values of a Solr field.

Indexing

We distinguish two cases here :

  • you want to add some metadata from your ManifoldCF job

  • or you already modified your schema and you indexed a custom Solr field

Remark: in this tutorial we write absolute paths for the Debian version, which start with /opt/datafari. The same applies to Windows paths: just replace the beginning of the path with the location of your local Datafari installation.

Add a metadata value in a MCF job

In our example, let's say that we want a facet based on the ManifoldCF job that produced each document. Here I want to index 2 websites with ManifoldCF: francelabs.com and datafari.com. I want to offer users a facet on the origin of the document: francelabs.com or datafari.com.

Let's do this !

  • Add the custom field to the Solr schema 

Edit custom_fields.incl located in /opt/datafari/solr/solr_home/FileShare/conf/customs_schema and add your new field in JSON format. The field must be indexed (it does not have to be stored) and not tokenized. For a field named job, the configuration is:

{ "name":"job", "type":"string", "stored":true, "multiValued":false }

Don't forget to save your changes into the file.

  • Launch the script addCustomSchemaInfo.sh

The script is located in /opt/datafari/solr/solr_home/FileShare/conf/customs_schema:

cd /opt/datafari/solr/solr_home/FileShare/conf/customs_schema
bash addCustomSchemaInfo.sh

The modifications are applied immediately through the Solr Schema API.
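For reference, the kind of Schema API request involved can be sketched as follows. This is an illustration, not the content of addCustomSchemaInfo.sh: the Solr base URL (http://localhost:8983/solr) and the helper names are assumptions, while the add-field command and the /schema endpoint are standard Solr.

```javascript
// Assumed Solr base URL; adjust host and port to your Datafari installation.
const SOLR_BASE_URL = 'http://localhost:8983/solr';

// Build the Schema API request that adds one field to a collection.
// The helper name and the returned shape are illustrative only.
function buildAddFieldRequest(collection, field) {
  return {
    url: `${SOLR_BASE_URL}/${encodeURIComponent(collection)}/schema`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ 'add-field': field }),
    },
  };
}

// Same definition as in custom_fields.incl, with "indexed" made explicit.
const jobField = {
  name: 'job',
  type: 'string',
  indexed: true,
  stored: true,
  multiValued: false,
};
const addJobField = buildAddFieldRequest('FileShare', jobField);

// Send the request (needs a runtime that provides fetch, such as Node 18+).
async function sendAddField(request) {
  const response = await fetch(request.url, request.options);
  if (!response.ok) {
    throw new Error(`Schema API returned HTTP ${response.status}`);
  }
  return response.json();
}
```

Calling sendAddField(addJobField) would submit the change to a running Solr. Nothing is sent until that function is called.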

Note that you can also do this directly in the Solr Admin UI, but keep in mind that modifications made this way are written to the managed-schema file. If you upgrade Datafari in the future, they will be lost.

To do so :

Access the Datafari Admin UI > Search Engine Administration > Solr Administration > collection = FileShare > schema > add field > name = "job" (for this example) > field type = "string" > stored = TRUE > indexed = TRUE

After that, verify that the new field is present by selecting "index fields" in the admin UI > Search Engine Administration > Index Fields > Scroll down list of fields to confirm that "job" is present.

  • So now we can configure ManifoldCF. First thing is obviously to configure the repository connection and the job configuration.

I invite you to go to this section of the wiki if you are not familiar with this step : Crawling

Here we added a Web repository connection, then a FranceLabsWebsite job whose Seeds tab contains the URL of the France Labs website.

We did the same for the Datafari website (seed: datafari.com).

  • We now have our 2 ManifoldCF (MCF) jobs configured. We need to tell ManifoldCF to add a specific metadata value for each job so that the documents can be told apart. To do that, we need to configure a Transformations connector.
    In the right menu, click on List transformations connections in the Output section.

Then add a name and click on the Type tab. Select Metadata adjuster in the dropdown list, then click Save.

  • Now go back to the job configuration. Click on List all jobs on the Jobs section. Then click on the edit link for the "FranceLabsWebsite" job.

Go to the Connections tab, open the dropdown list next to the Transformation line and select the transformation connection you just created. Then click on the button named "Insert transformation before".

So you should have the following configuration :

  • New tabs have appeared, called Metadata, Move metadata and Add metadata. Click on Add metadata.
    In the parameter name, enter the same name as the field you configured above, so in our case job. For the parameter value, I wrote FranceLabs.
    Then click on Save.

The global configuration of the FranceLabs job is this one now :

We do the same thing for the Datafari job: the parameter name is also job, and the value is Datafari.

  • We can now launch the crawl for the two jobs. Go to the Jobs section and click on "Status and Jobs management". Then click on Start for each of the two jobs.

We now have to wait a little for some documents to be indexed.

  • We are going to check that everything is OK in the Solr Administration interface. Click on Search engine administration, then on Solr administration in the right menu.

In the main window, select FileShare in the dropdown list, then click on Query in the list of tabs below.

Then click on the blue Execute Query button: it launches the default Solr query, which matches every document in the index. We should obtain:

In the list of fields, we can see the field job with the value FranceLabs. This is exactly what we expected: we have a new field with a different value for each job configured in MCF.
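The same check can be made outside the Admin UI with a plain select request that also asks Solr for the facet counts of job. A minimal sketch, assuming Solr answers on http://localhost:8983/solr (adapt this to your setup) and using only standard Solr parameters (q, rows, fl, facet, facet.field); the helper name is illustrative.

```javascript
// Assumed Solr base URL; adjust to your installation.
const SOLR_URL = 'http://localhost:8983/solr';

// Build a select URL that matches every document, returns only the given
// field, and requests a facet on that same field.
function buildFacetCheckUrl(collection, field) {
  const params = new URLSearchParams({
    q: '*:*',
    rows: '5',
    fl: field,
    facet: 'true',
    'facet.field': field,
    wt: 'json',
  });
  return `${SOLR_URL}/${encodeURIComponent(collection)}/select?${params.toString()}`;
}

const checkUrl = buildFacetCheckUrl('FileShare', 'job');
```

Opening checkUrl in a browser or with an HTTP client returns a JSON response in which facet_counts.facet_fields.job lists each value followed by its document count. Once both jobs have indexed some documents, FranceLabs and Datafari should both appear there.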

We can now configure the Datafari User Interface !

Field already added to the Datafari schema

If you have already modified your indexing code and added your metadata, the Solr field must meet these requirements:

 <field name="job" type="string" indexed="true" stored="true" />

 The field needs to be indexed and not tokenized. You can now configure the Datafari User Interface.

Configure Ajaxfrancelabs

Our field is correctly present in our Solr schema and the values are present for each document. We just now have to configure the faceting in Ajaxfrancelabs.

We need to modify the search.js file to add the TableFacet widget. We also have to add the facet display in searchView.jsp, and finally add the label of the facet in the i18n files: en.json and fr.json.

  • First, edit the file search.js in /opt/datafari/tomcat/webapps/Datafari/js and add the widget to the code:

The elm and id parameters need a unique identifier, which we choose to call facet_job. The field parameter is set to job, and so is the name.
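A minimal sketch of such a declaration, to be adapted rather than copied. The id, field and name values come from this tutorial. The class name AjaxFranceLabs.TableWidget (for what this tutorial calls the TableFacet widget), the Manager.addWidget call and the window.i18n.msgStore lookup are assumptions about how facet widgets are declared in search.js, so check them against the existing facets in your own file.

```javascript
// Resolve the facet label from the loaded locale, falling back to the raw key.
// window.i18n.msgStore is an assumption about how en.json / fr.json are exposed.
function facetLabel(key) {
  const store = typeof window !== 'undefined' && window.i18n && window.i18n.msgStore;
  return (store && store[key]) || key;
}

// Options for the new facet: one identifier shared by elm and id,
// plus the Solr field to facet on.
const jobFacetOptions = {
  id: 'facet_job',
  field: 'job',
  name: facetLabel('job'),
};

// Register the widget only where the Datafari page has loaded its libraries.
if (typeof Manager !== 'undefined' &&
    typeof AjaxFranceLabs !== 'undefined' &&
    typeof $ !== 'undefined') {
  Manager.addWidget(new AjaxFranceLabs.TableWidget(
    Object.assign({ elm: $('#' + jobFacetOptions.id) }, jobFacetOptions)
  ));
}
```

The guard keeps the snippet harmless outside the search page. Inside search.js the libraries are already loaded, so the registration call can be written directly.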

The file looks like this :

  • Edit now the file searchView.jsp located in /opt/datafari/tomcat/webapps/Datafari

We add the facet element to display on the page: an element whose id matches the one used in search.js (facet_job). We add it in the div section called col left:

The order is important. By default the facets appear in this order: by date, type and source. If I want the job facet just after the source facet, the file looks like this:

  • Ok now the final step is to localize the label name of our new facet. Let's edit the two files en.json and fr.json in /opt/datafari/tomcat/webapps/Datafari/js/AjaxFranceLabs/locale :

We add a new line for the parameter 'job'. Here we add the same value for both the languages : Job. So the line to add is :
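Assuming the locale files are flat JSON objects that map a key to its label, the entry would be "job": "Job". The sketch below shows the effect on a made-up excerpt (the facetsource entry and the helper name are hypothetical); editing the two files by hand gives the same result.

```javascript
// Add (or overwrite) one label in the text of a locale file
// and return the updated JSON text.
function addLocaleLabel(localeJsonText, key, label) {
  const messages = JSON.parse(localeJsonText);
  messages[key] = label;
  return JSON.stringify(messages, null, 2);
}

// Hypothetical excerpt standing in for en.json or fr.json.
const localeBefore = '{ "facetsource": "Source" }';
const localeAfter = addLocaleLabel(localeBefore, 'job', 'Job');
```

Since JSON.parse rejects malformed input, running the real file content through it is also a quick way to confirm that a manual edit did not leave a missing or trailing comma.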


  • The configuration is now complete. We can check that our new facet is correctly displayed in Datafari:

It seems to be the case. We can now filter the results by the ManifoldCF job : documents from Datafari website or France Labs website !