This documentation is valid from Datafari v6 onwards

The CSV JDBC driver differs slightly from the other drivers supported by the JDBC connector in how it must be configured to work properly.

1. Database connection

Unlike the other drivers, it does not need a database host, a port, a database name, a user or a password. Unfortunately, the JDBC connector treats those parameters as mandatory, so you must still set a value for each of them. Any value works, EXCEPT an empty string!

The only parameter that really matters is the “Database raw connection string”. The CSV JDBC driver connects to a local folder (local to the machine on which the job using the CSV driver runs) that contains the CSV files you want to crawl. The “Database raw connection string” must therefore be set to the absolute path of that folder.
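
The two constraints above (non-empty placeholder values and an absolute local folder path) can be checked with a small script before creating the job. This is a minimal Python sketch and not part of Datafari or of the driver: the function name `check_connection_settings` and the keys of the `settings` dictionary are hypothetical and only mirror the fields described above.

```python
import os


def check_connection_settings(settings, raw_connection_string):
    """Return a list of problems found in the (hypothetical) job settings."""
    problems = []
    # The JDBC connector requires a value for each of these fields,
    # even though the CSV driver itself does not use them.
    for name, value in settings.items():
        if value == "":
            problems.append(f"'{name}' must not be empty (any placeholder works)")
    # The raw connection string must be the absolute path of the CSV folder.
    if not os.path.isabs(raw_connection_string):
        problems.append("raw connection string must be an absolute path")
    elif not os.path.isdir(raw_connection_string):
        problems.append("raw connection string must be an existing local folder")
    return problems
```

An empty list means both constraints are satisfied. The check must run on the machine that runs the job, since the folder is local to that machine.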

2. Queries

A few points need to be understood in order to build working queries.

First, table names correspond to the names of the CSV files in the folder you configured in the database connection. For example, to select the ‘id’ column of the CSV file named ‘persons.csv’, use the following query:

SELECT id FROM persons;

This works as long as the persons.csv file is present in the folder configured in the database connection and the file declares an id column.
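
To see which table names a given folder exposes, the file-name-to-table-name mapping described above can be reproduced outside the driver. The sketch below is only an illustration under two assumptions: the helper name `list_table_names` is hypothetical, and only files ending in lowercase `.csv` are considered.

```python
from pathlib import Path


def list_table_names(csv_folder):
    """Return the CSV file names of a folder without their extension."""
    # Assumption: only files with a lowercase ".csv" extension count as tables.
    return sorted(p.stem for p in Path(csv_folder).glob("*.csv") if p.is_file())
```

For a folder that contains only persons.csv, this returns a single name, persons, which is the name to use in the FROM clause.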

Second, the files you want to query must be readable by the user running the MCF instance that executes the job. By default in Datafari, this is the ‘datafari’ user.
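
One way to verify read access is to run a short check as the user that runs MCF. The following Python sketch assumes the script is started under that account (for example the ‘datafari’ user), because `os.access` only tests the permissions of the user executing it. The helper name `unreadable_csv_files` is hypothetical.

```python
import os
from pathlib import Path


def unreadable_csv_files(csv_folder):
    """Return the CSV files of the folder that the current user cannot read."""
    # os.access checks permissions for the user running this script,
    # so run it as the same user that runs the MCF instance.
    return sorted(
        str(p) for p in Path(csv_folder).glob("*.csv") if not os.access(p, os.R_OK)
    )
```

An empty result means every CSV file in the folder is readable by the current user.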

Third, each CSV file that you want to query MUST contain the column names as its first line, like this:

id,title,content,url
1,title 1,This is the content of document 1,/home/francelabs/test_csv/test.csv
2,title 2,This is the content of document 2,/home/francelabs/test_csv/test.csv

The driver uses this header line to determine the column names, which you can then use in your queries.
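
Before writing a query, it can help to list the column names that a file declares on its first line. The sketch below is an illustration only: the helper name `read_column_names` is hypothetical, and it assumes a comma-separated, UTF-8 encoded file like the example above.

```python
import csv


def read_column_names(csv_path):
    """Return the column names declared on the first line of a CSV file."""
    # Assumptions: comma separator and UTF-8 encoding.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])  # empty list for an empty file
    return [name.strip() for name in header]
```

Applied to the example file above, this returns the four names id, title, content and url, which are the names available to SELECT.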

Last, if you set labels for columns, the driver uses those labels in the WHERE clause. If you label a column with another name and want to filter on that same column, you must use its label in the WHERE clause. For example, taking the previous CSV file content (assumed here to be stored in a file named ‘persons.csv’), a query that labels the ‘id’ column as ‘doc_id’ and only returns the row whose id is equal to 1 looks like this:

SELECT id AS doc_id FROM persons WHERE doc_id=1;

Notice that we directly used the ‘doc_id’ label in the WHERE clause.
