Force an MCF job to reindex all documents without deleting them from the current index

Valid as of Datafari 4.6

Some scenarios require forcing an MCF job to reindex all of its documents. Note that doing this does not delete the documents from the index, so users can still search them.

Take a filer job as an example: normally, before fetching the content of a file, MCF compares the document with the version stored in its internal database. If they are the same, the document is not fetched because it is identical. But if we modify the job itself, for instance by adding a metadata field, this has no impact on the documents themselves, and therefore MCF will not update the index.
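The behavior described above can be illustrated with a toy model. This is only a sketch: the function `crawl`, the version strings, and the dictionaries standing in for MCF's internal database and for the index are all hypothetical and are not MCF's actual code.

```python
# Toy model of incremental crawling (hypothetical names, not MCF's real code).

def crawl(files, recorded_versions, index, job_metadata):
    """Index only the files whose version differs from the recorded one.

    files:             path -> (version string, content)
    recorded_versions: path -> version string seen at the last crawl
                       (stands in for MCF's internal database)
    index:             path -> indexed document (stands in for the search index)
    job_metadata:      fields the job adds to every document it indexes
    """
    fetched = []
    for path, (version, content) in files.items():
        if recorded_versions.get(path) == version:
            continue  # unchanged document: not fetched, index entry left as-is
        index[path] = {"content": content, **job_metadata}
        recorded_versions[path] = version
        fetched.append(path)
    return fetched


files = {"/share/a.txt": ("v1", "hello")}
recorded, index = {}, {}

first = crawl(files, recorded, index, {})

# The job is modified (a metadata field is added), but the file is unchanged:
second = crawl(files, recorded, index, {"source": "filer"})
stale = "source" not in index["/share/a.txt"]

# Forgetting the recorded versions forces a full reindex.
# The index itself is never emptied, so documents stay searchable.
recorded.clear()
third = crawl(files, recorded, index, {"source": "filer"})
```

In this model, the second crawl fetches nothing and the indexed document keeps its old fields, while the third crawl refetches the file and indexes it with the new metadata. Clearing `recorded` plays roughly the role of the reset procedure described on this page.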

Some jobs do not perform such document comparisons (for instance the web job, which does a full indexation every time because it cannot compare a webpage to crawl with a webpage already indexed), and for such jobs this MCF reindexing process is not useful.

For other types of jobs, if you are unsure about the behavior of the crawler, you can apply this process.

The process has 2 steps:

  • Go to the MCF admin UI, then into Jobs → List all jobs, and click on ‘View’ next to the job whose documents will be reindexed.

    At the bottom of the job page, click on the yellow button: “Reset seeding”

  • Go to Outputs → List Output Connections, then click on the View button next to the output: DatafariSolrNoTika

    Then click on the yellow button named: “Remove all associated records”

You can now start your job, and all the documents will be indexed as if you were launching the job for the first time.
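If you prefer to start the job from a script rather than from the admin UI, the following is a minimal sketch that uses the ManifoldCF REST API (`PUT start/<job_id>`). The base URL below is an assumption about a default Datafari installation and must be adapted to your setup; the job ID is a placeholder, and authentication is not handled here.

```python
import urllib.request

# Assumption: location of the MCF API service; adjust host, port and path.
MCF_API_BASE = "http://localhost:9080/datafari-mcf-api-service/json"


def build_start_job_request(job_id, base_url=MCF_API_BASE):
    """Build (without sending) the PUT request that starts an MCF job."""
    url = f"{base_url}/start/{job_id}"
    return urllib.request.Request(url, data=b"", method="PUT")


def start_job(job_id, base_url=MCF_API_BASE):
    """Send the start request and return the HTTP status code."""
    request = build_start_job_request(job_id, base_url)
    with urllib.request.urlopen(request) as response:
        return response.status
```

The job ID is the numeric identifier of the job in MCF. Calling `start_job("<your job id>")` only makes sense once both reset steps above have been done in the admin UI.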