...
The only parameter that really matters is the “Database raw connection string”. The CSV JDBC driver must connect to a local folder (local to the machine where the job using the CSV driver will run) containing the CSV files you want to crawl. So the “Database raw connection string” must be set to the absolute path of that local folder.
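Before saving the connection, it can help to check the folder on the machine that will run the job. The following is a minimal sketch, not part of Datafari or MCF: the function name `check_csv_folder` is hypothetical, and it only assumes that the CSV files use the `.csv` extension.

```python
from pathlib import Path


def check_csv_folder(raw_path: str) -> Path:
    """Return the folder as a Path if it looks usable as a CSV source.

    Hypothetical helper: raises ValueError when the path is not absolute,
    is not an existing folder, or contains no .csv file.
    """
    folder = Path(raw_path)
    if not folder.is_absolute():
        raise ValueError(f"not an absolute path: {raw_path}")
    if not folder.is_dir():
        raise ValueError(f"not an existing folder: {raw_path}")
    # Assumption: the files to crawl carry the .csv extension.
    has_csv = any(
        p.is_file() and p.suffix.lower() == ".csv" for p in folder.iterdir()
    )
    if not has_csv:
        raise ValueError(f"no CSV files found in: {raw_path}")
    return folder
```

Run it on the machine where the job will execute, since the folder must be local to that machine.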
2. Queries
To build working queries, there are a few things to understand.
First, the table names correspond to the names of the CSV files in the folder you configured in the database connection. For example, if you want to build a query that selects the ‘id’ column of the CSV file named ‘persons.csv’, you will build the following query:
...
As long as the persons.csv file is present in the folder configured in the database connection and contains an id column, the query will work.
Next, the files you want to query must be readable by the user running the MCF instance that runs the job. By default in Datafari, this is the ‘datafari’ user.
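The two points above can be checked together with a short script. This is a sketch under stated assumptions: the function name `queryable_tables` is hypothetical, the table name is assumed to be the file name without its `.csv` extension, and the readability result only reflects the user who runs the script, so it should be run as the user that runs MCF.

```python
import os
from pathlib import Path


def queryable_tables(folder: str) -> dict:
    """Map each candidate table name to whether the current user can read its file.

    Hypothetical helper. Assumption: a table is named after its CSV file
    without the .csv extension (persons.csv -> persons).
    """
    result = {}
    for path in sorted(Path(folder).glob("*.csv")):
        # os.access checks read permission for the user running this script.
        result[path.stem] = os.access(path, os.R_OK)
    return result
```

Any table mapped to `False` points at a file whose permissions need fixing before the job can query it.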
Third, each CSV file that you want to query MUST contain the column names as its first line, like this:
...
The driver uses this header line to determine the column names, which you can then use in your queries.
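To see which column names a given file exposes, you can read its first line yourself. This is a minimal sketch, not the driver's own parsing: the function name `column_names` is hypothetical, and it assumes a comma-separated, UTF-8 encoded file.

```python
import csv


def column_names(csv_path: str) -> list:
    """Return the column names found on the first line of a CSV file.

    Hypothetical helper. Assumptions: comma separator and UTF-8 encoding.
    Raises ValueError when the file has no header line.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), None)
    if not header:
        raise ValueError(f"no header line found in: {csv_path}")
    return [name.strip() for name in header]
```

If the returned list holds data values rather than names, the file is missing its header line and needs one added before it can be queried.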
Last but not least, if you set labels for columns, the driver uses the labels in the WHERE clause. So if you give a column a label and want to filter on that same column, you need to use the label, not the original column name, in the WHERE clause. For example, taking the previous CSV file content, to build a query that uses the label ‘doc_id’ for the ‘id’ column and a WHERE clause that only keeps the rows whose id equals '1', we build it like this:
...
Notice that we used the ‘doc_id’ label directly in the WHERE clause.