Restart MCF agent if a job is stuck

Valid from Datafari X.X

Note: Datafari EE users should use this automatic functionality instead: https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/2693136385

MCF can sometimes have one or more stuck jobs. We consider a job stuck in either of two cases: its status is running but no new documents have been indexed for a long time (relative to the expected speed of that job), or its status is aborting or stopping and stays that way for hours or days (again, relative to the expected shutdown time of that job).

We can force MCF to unfreeze the job and give us back control (to restart it, for example).

The process is:

  • Open an SSH connection to the Datafari server

  • Enter the commands:

    cd /opt/datafari/bin
    su datafari -c "bash datafari-manager.sh stop_mcf_crawler_agent"

When the command has finished, check that the MCF agent is properly stopped with the command:

ps aux | grep manifoldcf.processid

If the result still lists an MCF agent process owned by datafari, as in this screenshot:

It means that the process must be killed with the following command. Replace processus_id with the process ID shown in the red rectangle on the screenshot above: it is the first number to the right of ‘datafari’.

kill -9 [processus_id]
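
The check and the kill above can also be scripted. The following is only a minimal sketch, not part of the Datafari tooling: the helper names mcf_agent_pids and force_stop are hypothetical, the process ID is assumed to be the second column of the `ps aux` output (as on a typical Linux server), and sending a normal termination signal before `kill -9` is a gentler variant that this procedure does not describe.

```shell
#!/bin/sh
# Hypothetical helpers; the names below do not come from Datafari.

# Print the PID (second column of `ps aux`) of every process whose command
# line mentions manifoldcf.processid, ignoring the grep process itself.
# Reads `ps aux` style output from standard input.
mcf_agent_pids() {
    awk '/manifoldcf\.processid/ && !/grep/ { print $2 }'
}

# Send SIGTERM first, wait up to $2 seconds (default 10), then fall back to
# the `kill -9` used in the procedure if the process is still alive.
force_stop() {
    pid="$1"
    grace="${2:-10}"
    # Nothing to signal: the process is already gone or we lack permission.
    kill "$pid" 2>/dev/null || return 0
    waited=0
    while [ "$waited" -lt "$grace" ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid" 2>/dev/null
    fi
    return 0
}
```

On the server, `ps aux | mcf_agent_pids` would print the candidate process IDs, and `force_stop [processus_id]` would take the place of the bare `kill -9`. Check the printed IDs by eye before killing anything.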

Then restart the MCF agent with this command:

Finally, wait about an hour or more, then check in the MCF admin UI whether the job status is still stuck or has changed. If it is still stuck and you have Datafari EE, contact the Datafari support team. Otherwise, you need to reinstall Datafari.