Info

Valid up to 2.x

Valid up to Datafari 2.x. It may also work with later versions.

(warning) The local file system connector does not crawl the date of LibreOffice documents correctly.
This page explains how to get started quickly in Datafari by crawling a local file folder. For security reasons, however, using this connector in a production environment is NOT recommended.

Info

Local and shared files crawling

(warning) The local file crawler is a DEMO connector that must not be used in production environments because of security concerns.

With the local file crawler, Date metadata is not extracted from plain text documents.

A recommended alternative to the local file crawler is the "Windows share" connector (a.k.a. the JCIFS connector), which can crawl files in a shared directory and also works under Linux. Its official documentation is available on the Apache ManifoldCF version 2.5 website.
Documentation on how to set up a Samba share on Linux is available here.

See "Add the JCIFS Connector to Datafari - Community Edition [DEPRECATED]" to learn how to use the recommended JCIFS connector.

...

A list of potential regexes appears on the right. Here is some information about them, taken from the ManifoldCF 2.5 documentation:

Tip

Using the regex option

For each included path, a list of rules is displayed which determines what folders and documents get included with the job. These rules will be evaluated from top to bottom, in order. Whichever rule first matches a given path is the one that will be used for that path.

Each rule describes the path matching criteria. This consists of the file specification (e.g. "*.txt"), whether the path is a file or folder name, and whether a file is considered indexable or not by the output connection. The rule also describes the action to take should the rule be matched: include or exclude. The file specification character "*" is a wildcard which matches zero or more characters, while the character "?" matches exactly one character. All other characters must match exactly.

Remember that your specification must match all characters included in the file's path. That includes all path separator characters ("/"). The path you must match always begins with an initial path separator. Thus, if you want to exclude the file "foo.txt" at the root level, your exclude rule must match "/foo.txt".
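The matching behaviour described above can be illustrated with a small Python sketch. This is not ManifoldCF code: the function names `spec_to_regex` and `first_matching_action` are hypothetical. The sketch ignores the file/folder and indexable criteria, and it assumes that "*" may also match path separators and that a path matching no rule simply yields no action.

```python
import re


def spec_to_regex(spec: str) -> re.Pattern:
    """Translate a file specification into a regular expression.

    "*" matches zero or more characters, "?" matches exactly one,
    and every other character must match literally.
    """
    parts = []
    for ch in spec:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def first_matching_action(rules, path):
    """Return the action of the first rule whose specification matches
    the whole path, evaluating rules from top to bottom.

    Returns None when no rule matches (an assumption of this sketch).
    """
    for spec, action in rules:
        if spec_to_regex(spec).fullmatch(path):
            return action
    return None


# Illustrative rule list: order matters, the first match wins.
rules = [
    ("/foo.txt", "exclude"),       # root-level foo.txt only
    ("*.txt", "include"),          # any other path ending in .txt
    ("/report-?.pdf", "include"),  # exactly one character after "report-"
]

for path in ["/foo.txt", "/docs/foo.txt", "/report-1.pdf", "/report-12.pdf"]:
    print(path, "->", first_matching_action(rules, path))
```

Because the specification must match the entire path, including the initial separator, a rule written as "foo.txt" would not match the root-level file "/foo.txt" in this sketch.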

To add a rule for a starting path, select the desired values of all the pulldowns, type in the desired file criteria, and click the "Add" button. You may also insert a new rule above any existing rule, by using one of the "Insert" buttons.

...