...
Database type: Type of Database to connect to. If your database is not listed here then you will not be able to crawl it. In that case you can check if there is a specific connector for your database, if not, you will have to open an issue on our gitlab to ask for an update to handle it !
Database host: Database host, you must specify the port if it is not the default one, when the port is specified, it must be separated from the host by a ‘:' char. For example, if you want to crawl a MySQL database that is running locally on its default 3306 port, then you only need to specify ‘localhost’ here. If you have changed the default port of the MySQL database to 0690 (as an example), then you will need to set this parameter to 'localhost:0690’
Database name: Name of the Database to connect to by default
Database raw connection string: This parameter is optional and corresponds to a raw connection string that you may want to use to connect to your database. For example: "Server=127.0.0.1;Database=test;User Id=foo; Password=bar"
User: Username that can connect to the Database, can perform all the queries without restrictions and having full read access to the data you want to crawl
Password: Password of the provided user
Seeding query: Query that must return the IDs of all documents to crawl. The query must necessarily return and map the id field to the $(IDCOLUMN) parameter. The $(IDCOLUMN) is used by MCF to identify the id field and gather all the IDs of documents. As an helper you can take example of the demo query provided: “SELECT idfield AS $(IDCOLUMN) FROM documenttable WHERE modifydatefield > $(STARTTIME) AND modifydatefield <= $(ENDTIME)”. The $(STARTTIME) and $(ENDTIME) mapping parameters are optional and are used for delta crawl. You can retrieve more details about the seeding query and the delta crawl in this documentation: DB Connector Best Practices. The IDs gathered in this query will then be used in the other queries through the $(IDLIST) parameter to limit the results.
Version query: This query must be built so it returns a hash/value field representing the version of a document for each document IDs coming from the seeding query. The version filed must be mapped to the $(VERSIONCOLUMN) parameter. The $(VERSIONCOLUMN) is used by MCF to identify and gather the version of all document IDs that come from the ‘Seeding query’. The id field must also be returned and mapped to the $(IDCOLUMN) parameter so that MCF can identify the IDs. As an helper you can take example of the demo query provided: “SELECT idfield AS $(IDCOLUMN), versionfield AS $(VERSIONCOLUMN) FROM documenttable WHERE idfield IN $(IDLIST)”. The $(IDLIST) parameter is mandatory, it is replaced by MCF by the list of all IDs coming from the seeding query (not all IDs at once, the query is executed several times with different sets of IDs so that the results are somehow ‘paginated’). To have more explanations on the Version query role during delta crawls please consult this documentation: DB Connector Best Practices
Access Token query: Query that must return a field representing an access token of a document. An access token is a string value that is representing a right/group allowed to access to the document. The access token field must be mapped to the $(TOKENCOLUMN) parameter so that MCF can identify it. The id field must also be returned and mapped to the $(IDCOLUMN) parameter so that MCF can identify the IDs. As an helper you can take example of the demo query provided: “SELECT docidfield AS $(IDCOLUMN), aclfield AS $(TOKENCOLUMN) FROM acltable WHERE docidfield IN $(IDLIST)”. As for the Version query, the $(IDLIST) parameter is used to limit and ‘paginate’ results. This query is optional because it will not prevent the crawl to work without it but is MANDATORY when you want to implement the security for the crawled documents. When the security is activated, during a search, the access tokens of documents will be compared with the access tokens of the user performing the search, and only documents having at least one access token matching with the access tokens list of the user will be displayed in the search results. To have more explanations about this query and how works the security with database documents, please refer to this documentation: Valid only for Datafari EE - see /wiki/spaces/DATAFARI/pages/27135508633063480322
Data query: Query that must return the main content field of documents. The content type must be either binary or text and the content field must be mapped to the $(DATACOLUMN) parameter so that MCF will be able to identify it and retrieve the content for each document. The query must also necessarily return the id field mapped to the $(IDCOLUMN) parameter, and a field representing the unique URL of a document and map this field to the $(URLCOLUMN) parameter. As an helper you can take example of the demo query provided: “SELECT idfield AS $(IDCOLUMN), urlfield AS $(URLCOLUMN), datafield AS $(DATACOLUMN) FROM documenttable WHERE idfield IN $(IDLIST)”. As for the previous queries, the $(IDLIST) parameter is used to limit and ‘paginate’ results
Source name: The provided source name will be used as a facet to enable result filtering by source/job
Repository name: The provided repository name will be used to name the repository connector and job in MCF. This name MUST be unique, if another repository has the same name, the job creation will fail !
Security: Enable ACL respect for database documents. Requires that the Token query is set AND that a corresponding authority connector is configured (refer to this documentation: /wiki/spaces/DATAFARI/pages/2713550863 ). If this box is not checked, all the crawled documents will be publicly accessible, so be careful !
Start the job once created: By enabling this option, once created, the job will be automatically started and the crawl will begin.
...