...

If you want to crawl only webpages (without files), you can customize the part of the page to be indexed. This configuration is done for you.

Starting from Datafari 3.2, it is possible to easily crawl a website into Datafari.

...

3) Go to MCF and launch the job
Finally, go back to the MCF admin page, click on Status in Job management, and then click the Start button next to your new job.
After that, you can go to the search page and see your new Solr documents.

Be aware that this handler only works for webpages. If you want to index documents such as PDF or Microsoft Office files, you have to add an additional job that uses another Web repository connector and the standard Output connector. With this configuration, Tika is used to extract the content of the documents and add the associated metadata. In the configuration of that additional job, you have to exclude the webpages that are already indexed by your "JSOUP job".
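
To illustrate the split between the two jobs, here is a minimal Python sketch that separates document URLs from webpage URLs using a regular expression. The pattern, the list of extensions, and the function names are assumptions made for this example only; they are not taken from Datafari or MCF, and the expressions you actually enter in the job configuration should be adapted to your site and checked in MCF itself.

```python
import re

# Hypothetical pattern: matches URLs whose path ends with a common
# document extension, optionally followed by a query string.
DOCUMENT_URL = re.compile(r"\.(pdf|docx?|xlsx?|pptx?|odt)(\?.*)?$", re.IGNORECASE)


def is_document(url: str) -> bool:
    """Return True if the URL looks like a file that Tika should handle."""
    return DOCUMENT_URL.search(url) is not None


def split_urls(urls):
    """Split URLs into (webpages, documents) according to the pattern above."""
    webpages = [u for u in urls if not is_document(u)]
    documents = [u for u in urls if is_document(u)]
    return webpages, documents


if __name__ == "__main__":
    sample = [
        "https://www.example.com/index.html",
        "https://www.example.com/files/report.PDF",
        "https://www.example.com/files/slides.pptx?download=1",
    ]
    webpages, documents = split_urls(sample)
    print("Webpages (JSOUP job):", webpages)
    print("Documents (standard job):", documents)
```

Testing a candidate pattern against a handful of real URLs from your site in this way can help confirm that no page ends up indexed by both jobs.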
