Info: The documentation below is valid from Datafari v4.0.0 upwards.
...
Here are some further examples of regular expressions to include or exclude pages, in case you need inspiration.
Info: The following regex, to be set in the exclusions tab of your job, excludes at fetch time all URLs beyond a certain depth: (/[^\/]+){6}/?.* (replace 6 with the integer of your choice; the higher the number, the deeper the crawl will go in terms of slashes in the URL). This regex has to be interpreted with a shift of 2 from the number X declared in (/[^\/]+){X}/?.*
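To get a feel for which URLs a given X catches before putting the regex in a job, it can be tried locally. The following is a minimal Python sketch, not Datafari or ManifoldCF code: the helper names and the example.com URLs are hypothetical, and it assumes the pattern is searched for anywhere in the URL (unanchored), which you should verify against the behaviour of your own job.

```python
import re

def depth_exclusion_pattern(x):
    # Build the pattern quoted above, with x in place of the literal count.
    return re.compile(r"(/[^\/]+){%d}/?.*" % x)

def is_excluded(url, x=6):
    # Assumption: the pattern is searched for anywhere in the URL (unanchored).
    return depth_exclusion_pattern(x).search(url) is not None

if __name__ == "__main__":
    # Hypothetical URLs, used only to illustrate the counting.
    for url in (
        "https://example.com/a/b/c/d/e",
        "https://example.com/a/b/c/d",
    ):
        print(url, is_excluded(url))
```

Under this assumption the first "/" of "//" cannot start a group (it is followed by another slash), so the host name is consumed as the first repetition and the remaining repetitions are path segments. Trying a few URLs from your own site with different values of X is a quick way to confirm where the cut-off falls.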
In the screenshot below, we specify that we do not want URLs with the CSS extension, and we exclude from the index all URLs whose path contains the layouts, vti bin or SiteAssets directory. The difference between excluding from the crawl and excluding from the index is the following: if ManifoldCF finds a URL containing the layouts folder, that URL is still crawled and MCF parses it for other URLs to fetch, but the URL itself is not included in the Solr index.
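The distinction between the two exclusion lists can be sketched as follows. This is an illustrative Python model only, not how ManifoldCF is implemented: the pattern lists, the `decide` function and the example.com URLs are hypothetical, and the patterns are simplified stand-ins for what you would enter in the job.

```python
import re

# Hypothetical patterns, chosen only to mirror the description above.
EXCLUDE_FROM_CRAWL = [re.compile(r"\.css$")]
EXCLUDE_FROM_INDEX = [re.compile(r"layouts"), re.compile(r"SiteAssets")]

def decide(url):
    """Return (fetched, indexed) for a URL under the two exclusion lists."""
    # A URL excluded from the crawl is never fetched, so it cannot be indexed.
    fetched = not any(p.search(url) for p in EXCLUDE_FROM_CRAWL)
    # A URL excluded from the index is still fetched and parsed for links,
    # but it is not sent to the search index.
    indexed = fetched and not any(p.search(url) for p in EXCLUDE_FROM_INDEX)
    return fetched, indexed
```

In this model a stylesheet URL yields (False, False), a page under a layouts folder yields (True, False), and an ordinary page yields (True, True).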
...