When you create a database job in the ManifoldCF (MCF) interface, the job expects four SQL queries among its parameters. We detail two of them below because they can be challenging to configure:
Seeding query
For the "Seeding query" of a job that uses a JDBC connector, MCF provides two variables that are useful for delta crawls:
- $(STARTTIME) : the timestamp, in MILLISECONDS, of the start time of the last FINISHED run of the job.
- $(ENDTIME) : the timestamp, in MILLISECONDS, of the end time of the last FINISHED run of the job.
The first time you run the job, both variables are equal to 0. They only take a real value once the job has run to completion (finished with a "done" status) at least once.
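To illustrate how such a variable drives a delta crawl, here is a minimal Python sketch using an in-memory SQLite database. The `documents` table, its `modified_ms` column and the `seed` helper are hypothetical; the helper only imitates the placeholder substitution that MCF performs itself. The `$(IDCOLUMN)` alias follows the JDBC connector's usual convention, and `doc_id` is an illustrative stand-in for the name MCF substitutes, so check both against your MCF version.

```python
import sqlite3

# Hypothetical table: one row per document, last-modified time in milliseconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, modified_ms INTEGER)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("doc-1", 1_000), ("doc-2", 5_000), ("doc-3", 9_000)],
)

# Seeding query as it could be typed in the job (placeholders left as-is).
SEEDING_QUERY = (
    "SELECT id AS $(IDCOLUMN) FROM documents WHERE modified_ms > $(STARTTIME)"
)

def seed(start_time_ms):
    """Stand-in for MCF: fill in the placeholders, run the query, return the IDs."""
    sql = (
        SEEDING_QUERY
        .replace("$(IDCOLUMN)", "doc_id")  # illustrative column alias
        .replace("$(STARTTIME)", str(int(start_time_ms)))
    )
    return sorted(row[0] for row in conn.execute(sql))

first_run = seed(0)      # first run: the variable is 0, so every document is seeded
delta_run = seed(5_000)  # later run: only documents modified after that time
print(first_run, delta_run)
```

With a start time of 0 the query returns every document, which is the expected behaviour of a first crawl; on later runs only the rows modified after the given time are seeded.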
Timestamp formats
These variables are typically compared with the last-modified timestamp of a document. Be careful: the timestamp provided by MCF is expressed in milliseconds, not in seconds like a standard Unix timestamp. In most cases you will therefore need either to adjust the timestamp on the database side or to divide the value provided by MCF by 1000 to convert it to Unix format.
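The conversion can be sketched as follows, again in Python with an in-memory SQLite database. The `documents` table, its `modified_s` column (in seconds) and the sample value are assumptions made for the example; the string replacement only imitates what MCF does with the placeholder.

```python
import sqlite3
from datetime import datetime, timezone

start_time_ms = 1_700_000_000_123  # example value, in milliseconds

# Dividing by 1000 (integer division) gives a standard Unix timestamp in seconds.
unix_seconds = start_time_ms // 1000
as_utc = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

# Hypothetical table whose last-modified column is stored in SECONDS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, modified_s INTEGER)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("old", 1_699_999_999), ("new", 1_700_000_050)],
)

# The division can also be done inside the query, on the value MCF substitutes.
QUERY = "SELECT id FROM documents WHERE modified_s > $(STARTTIME) / 1000"
sql = QUERY.replace("$(STARTTIME)", str(start_time_ms))  # stand-in for MCF
changed = [row[0] for row in conn.execute(sql)]
print(unix_seconds, as_utc.isoformat(), changed)
```

Comparing the raw millisecond value with a column stored in seconds would make almost every document look older than the last run, so nothing would be picked up by the delta crawl.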
Version check query
This query is very important because it produces a VERSIONCOLUMN value for each crawled document. MCF associates this value with the document ID and uses it during a delta crawl to determine whether the document needs to be re-indexed. The best practice is to bind the VERSIONCOLUMN to a table column that is updated each time the document is modified. If your DB schema has no such column, we advise creating a function on the database side that generates a hash for a document based on all of its column values. With such a function, if any value of a DB document changes, the hash changes as well. If you then use this DB function as the VERSIONCOLUMN (via a standard SQL function call), MCF will know that the document has to be re-indexed.
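The hash-based approach can be sketched in Python with SQLite, where a user-defined SQL function plays the role of the database-side function. The `row_hash` function, the `documents` table and the query are hypothetical. On a real database you would write the function with the engine's own hashing facilities, and the version check query would use the aliases and ID filter that MCF expects.

```python
import hashlib
import sqlite3

def row_hash(*values):
    """Hash all column values; the separator keeps ('ab', 'c') distinct from ('a', 'bc')."""
    joined = "\x1f".join("" if v is None else str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL; -1 means "any number of arguments".
conn.create_function("row_hash", -1, row_hash)
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO documents VALUES ('doc-1', 'Title', 'First draft')")

# The hash is obtained through a standard SQL function call and used as the version.
VERSION_QUERY = (
    "SELECT id, row_hash(title, body) AS version FROM documents WHERE id = ?"
)

def version_of(doc_id):
    return conn.execute(VERSION_QUERY, (doc_id,)).fetchone()[1]

before = version_of("doc-1")
unchanged = version_of("doc-1")  # no modification: same version string
conn.execute("UPDATE documents SET body = 'Second draft' WHERE id = 'doc-1'")
after = version_of("doc-1")      # one column changed: different version string
print(before == unchanged, before != after)
```

As long as no column changes, the version string stays identical and the document can be skipped; any modification produces a new string, which signals that the document must be re-indexed.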