...

The only parameter that really matters is the “Database raw connection string”. The CSV JDBC driver must connect to a local folder containing the CSV files you want to crawl, where "local" means on the machine where the job using the CSV driver will run. The “Database raw connection string” must therefore be set to the absolute path of that folder, for example ‘/home/francelabs/csv’. The specified folder and all the files it contains must be readable by the user running the MCF instance, which is ‘datafari’ by default on a standard Datafari installation. We recommend placing the files at the root of the specified folder, as we have not tested the behavior with subfolders.
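
The folder requirements above can be checked with a short script before configuring the job. The following Python sketch is only an illustration and is not part of Datafari: the function name check_csv_folder is hypothetical, and it tests permissions for the user who runs the script, so it is only meaningful when run as the MCF user (‘datafari’ by default).

```python
import os


def check_csv_folder(folder):
    """Return a list of human-readable problems found in the CSV folder.

    Hypothetical helper: permissions are evaluated for the current user,
    so run it as the user that runs the MCF instance.
    """
    problems = []
    if not os.path.isabs(folder):
        problems.append(f"not an absolute path: {folder}")
    if not os.path.isdir(folder):
        problems.append(f"not a directory: {folder}")
        return problems
    # Listing a directory requires read and execute permissions on it.
    if not os.access(folder, os.R_OK | os.X_OK):
        problems.append(f"folder not readable: {folder}")
        return problems
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isdir(path):
            # Subfolders are reported because their behavior is untested.
            problems.append(f"subfolder (untested with the driver): {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"file not readable: {path}")
    return problems
```

Calling check_csv_folder('/home/francelabs/csv') returns an empty list when the folder satisfies all of the requirements listed above.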

2. CSV column separator

Currently, the column separator used in the CSV files is not configurable: it MUST be a comma (,).
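
If your files use another separator, they need to be converted before crawling. A minimal Python sketch of such a conversion follows, using the standard csv module. The function name convert_to_comma and the semicolon default are assumptions made for illustration. Fields that contain a comma are written between double quotes, and we have not verified how the driver handles quoted fields.

```python
import csv


def convert_to_comma(src_path, dst_path, src_delimiter=";", encoding="utf-8"):
    """Rewrite a delimited file as comma-separated CSV and return the row count.

    Hypothetical helper: src_delimiter is the separator used by the input file.
    """
    rows = 0
    with open(src_path, newline="", encoding=encoding) as src, \
            open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src, delimiter=src_delimiter)
        # The writer quotes any field that itself contains a comma.
        writer = csv.writer(dst, delimiter=",")
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows
```

The converted file should be written into the folder set in the “Database raw connection string”.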

3. Queries

There are a few things to understand in order to build working queries.

...

Assuming the persons.csv file contains a column named ‘id’ and you want to build a query that selects this column, you would write the following query:

```sql
SELECT id FROM persons;
```

As long as the persons.csv file is present in the folder configured in the database connection and the id column exists in that file, the query will work.
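
Both conditions can be verified before running a job. The Python sketch below is an illustration under stated assumptions: the function name missing_columns is hypothetical, the first line of the file is taken to be the header row, and column names are compared exactly, case included.

```python
import csv
import os


def missing_columns(folder, table, columns, encoding="utf-8"):
    """Return the requested columns that are absent from <folder>/<table>.csv.

    Hypothetical helper: raises FileNotFoundError if the CSV file for the
    table does not exist in the folder.
    """
    path = os.path.join(folder, table + ".csv")
    with open(path, newline="", encoding=encoding) as handle:
        # The first row is assumed to hold the column names.
        header = next(csv.reader(handle), [])
    return [column for column in columns if column not in header]
```

For the query above, missing_columns('/home/francelabs/csv', 'persons', ['id']) returns an empty list when the file and the column are both present.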

...

Last, if you set labels for columns in your query, the driver uses those labels in the WHERE clause. So if you label a column with another name and want to filter on that same column, you must use its label in the WHERE clause. For example, taking the previous CSV file content, a query that uses the label ‘doc_id’ for the ‘id’ column and only returns the row whose id equals '1' is built like this:

```sql
SELECT id AS doc_id FROM persons WHERE doc_id=1;
```

...

Assuming we have a table named “documenttable” with “idfield”, “urlfield” and “datafield” as columns, this data query will not work as is, because “idfield” is labeled $(IDCOLUMN) while the WHERE clause refers to “idfield” instead of $(IDCOLUMN). The query causes the following type of error during execution:

```java
Caused by: java.sql.SQLException: Invalid column name: idfield
```

...