Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Info

Valid from Datafari v6 onwards

This documentation is valid from Datafari v6 onwardsThis was a test connector, replaced by a native csv connector without the need for jdbc

The CSV JDBC driver for the JDBC connector differs a little bit from the other drivers in the way it should be configured to properly work. The native CSV connector is documented: CSV Connector

As a reminder, the proper JDBC CSV driver to use is the following: https://github.com/jprante/jdbc-driver-csv

1. Database connection

Unlike the other drivers, it does not require a database host and/or port, a database name, a user and a password, even if, unfortunately, those parameters are mandatory for the JDBC connector. So you must set something for those parameters anyway, whatever you want BUT an empty string !

The only parameter that is really important and that matters, is the “Database raw connection string”. The CSV JDBC driver must connect to a local folder (local on the machine where the job using the CSV driver will run), containing CSV files you want to crawl. So the “Database raw connection string” must be set with the absolute path of the local folder containing the CSV files to crawl, for example ‘/home/francelabs/csv’. The specified folder and all the files it contains must also have read permissions for the user running the MCF instance, which is ‘datafari’ by default on a standard Datafari installation. We recommend the files to be at the root of the specified folder, we have not tested the behavior with subfolders.

2. CSV column separator

Currently, the separator used for columns in the CSV files is not configurable, IT MUST BE a comma char: ,

3. Queries

Concerning the queries, there are some things to understand in order to build working ones.

...

And assuming the persons.csv file contains a column named ‘id’ and you want to build a query that selects this column. Then you will build the following query:

Code Block
languagesql
SELECT id FROM persons;

As long as the persons.csv file is present in the folder configured as database connection and that the id column is referenced within the file, it will work.

...

Last, if you set labels for columns in your query, the driver will use the labels for the WHERE clause. So if you label a column with another name, and want to set a WHERE clause on the same column, you will need to use its label in the WHERE clause. For example, if we take the previous CSV file content and want to build a query that use the label ‘doc_id’ for the ‘id’ column and define a WHERE clause to only get the id which is equal to '1', then we will build it like this:

Code Block
languagesql
SELECT id as doc_id WHERE doc_id=1;

Notice that we directly used the ‘doc_id’ label in the WHERE clause.

Labels are necessary for the 4 types of queries needed when configuring your connector, please check the section about these queries in Database crawling simplified configuration documentation.This is really important, specifically when you need to use MCF variables as labels in the different types of queries specified in the Database crawling simplified configuration documentation ! The Version query, Access Token query and Data query are affected by this behavior as they use the “idfield” in the where clause !

For example, here is the default data query:

Code Block
languagesql
SELECT idfield AS $(IDCOLUMN), urlfield AS $(URLCOLUMN), datafield AS $(DATACOLUMN) FROM documenttable WHERE idfield IN $(IDLIST)

Assuming we have a table named “documenttable” with “idfield”, “urlfield” and “datafield” as columns, this data query will not work as it is ! because the “idfield” is labeled $(IDCOLUMN) and the WHERE clause refers to “idfield”, not $(IDCOLUMN). This query will cause the following type of error during execution:

Code Block
languagejava
Caused by: java.sql.SQLException: Invalid column name: idfield

Instead, the correct data query would be:

Code Block
languagesql
SELECT idfield AS $(IDCOLUMN), urlfield AS $(URLCOLUMN), datafield AS $(DATACOLUMN) FROM documenttable WHERE $(IDCOLUMN) IN $(IDLIST)