CSV Connector

Valid from Datafari v6.0

The CSV connector lets you index each line of a CSV file as a separate Solr document.

  1. Create the repository connector

Go to the MCF Admin Page and, in the “Main Navigation”, choose “List Repository Connections” and add a new connection:

Choose a name, then select CSV in the “Type” tab:

That's it: your repository connector is created with default values, as shown in the following screenshot:

 

Note: the Authority Group is currently not used (as of October 2023).

  2. Create the corresponding CSV job with at least the following settings (as seen in the screenshot below):

In the connection pipeline (as seen in the screenshot above), you do not need to add a Tika connector, since a CSV file is a plain-text format. You do need at least the Repository in stage 1 and the Output in stage 2.

For the CSV file paths parameter (as seen in the screenshot above), you must specify the names of the CSV files to index. Paths must use local file syntax, so if you need to access remote files, mount the folder containing your CSV files locally. Here is an example for a remote file exposed via SMB and mounted locally: /mounted_remote_smb_share/folder1/…/foldern/filename.csv
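Because the connector reads local paths, a missing mount or a permission problem on the MCF host is a likely source of empty crawls. The sketch below is a hypothetical pre-flight check (the function name `check_csv_paths` is ours, not part of Datafari or MCF): run on the machine where MCF runs, it only verifies that each configured path is an existing regular file readable by the current user.

```python
import os
from pathlib import Path


def check_csv_paths(paths):
    """Return a list of human-readable problems for the given CSV file paths.

    Hypothetical helper, not part of Datafari/MCF. It only checks, on the
    machine where it runs, that each path exists, is a regular file, and is
    readable by the current user.
    """
    problems = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            # Typical cause for remote files: the share is not (or no longer) mounted.
            problems.append(f"{raw}: not found or not a regular file (is the share mounted?)")
        elif not os.access(path, os.R_OK):
            problems.append(f"{raw}: not readable by the current user")
    return problems
```

For example, `check_csv_paths(["/mounted_remote_smb_share/data.csv"])` returns an empty list when the file is reachable, and one message per problematic path otherwise (the path here is illustrative). Note that the check reflects the permissions of whoever runs it, so run it as the same user as MCF.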

For the Separator character parameter, enter the separator you want to use. The default is “,”. Multi-character separators also work, but we have not tested them with “typical” escape characters such as “\”.

Note: this connector does not handle CSV content that contains the separator character itself, so you need to clean up such files beforehand.
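One way to do that cleanup is to re-export the file with a separator that never appears in any cell. The sketch below is an assumption-laden example, not a Datafari tool: it assumes the source is a standard CSV where cells containing the delimiter are wrapped in double quotes (which Python's `csv` module parses), it replaces line breaks inside cells with spaces so that each record stays on a single line, and it refuses to write the output if the new separator (`;` by default, a hypothetical choice) still occurs in a cell.

```python
import csv


def find_conflicts(rows, separator):
    """Yield (row_number, column_index) for every cell containing the separator."""
    for row_number, row in enumerate(rows, start=1):
        for column_index, cell in enumerate(row):
            if separator in cell:
                yield row_number, column_index


def rewrite_csv(src_path, dst_path, new_separator=";", src_delimiter=","):
    """Re-export a quoted CSV so that no cell contains new_separator.

    Assumptions: the source uses standard CSV quoting; embedded line breaks
    are replaced by spaces so one record = one line. Returns the number of
    lines written (header included). Raises ValueError on conflicts.
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        rows = [
            [cell.replace("\r", " ").replace("\n", " ") for cell in row]
            for row in csv.reader(src, delimiter=src_delimiter)
        ]
    conflicts = list(find_conflicts(rows, new_separator))
    if conflicts:
        raise ValueError(f"separator {new_separator!r} found in cells {conflicts[:5]}")
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for row in rows:
            # Plain join, no quoting: the output is meant for a connector
            # that splits each line on the separator.
            dst.write(new_separator.join(row) + "\n")
    return len(rows)
```

After running it, set the Separator character parameter of the job to the same value as `new_separator`.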

The Content Column Label parameter maps the Solr field Content to a column of your CSV file. For example, if your CSV file contains a column named “mon_contenu” and you want to map it to the Content Solr field, enter “mon_contenu” in this parameter.

The Id Column Label parameter maps the Solr field id to a column of your CSV file. For example, if your CSV file contains a column named “mon_id” and you want to map it to the id Solr field, enter “mon_id” in this parameter.

Apart from these two parameters, no other mapping between a CSV column name and a Solr field can be configured in the UI. Any other CSV column is used only if its name exactly matches an existing Datafari Solr field name; CSV columns that do not correspond to any Solr field are ignored. Note that the matching is case-sensitive.
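The mapping rules above can be modelled as follows. This is an illustrative sketch of the documented behaviour, not the connector's actual code: `map_row` is a hypothetical name, the `"content"` and `"id"` keys stand in for the Content and id fields, and `solr_fields` is a hypothetical set of field names assumed to exist in your Datafari schema. Whether the Content and Id columns are also indexed under their own names is not covered by this page, so the sketch does not do it.

```python
def map_row(header, row, content_label, id_label, solr_fields):
    """Turn one CSV line into a dict of Solr fields (illustrative model only).

    header        -- column names from the first line of the CSV file
    row           -- values of one data line, in the same order
    content_label -- value of the Content Column Label parameter
    id_label      -- value of the Id Column Label parameter
    solr_fields   -- hypothetical set of field names existing in the schema
    """
    doc = {}
    for name, value in zip(header, row):
        if name == content_label:
            doc["content"] = value
        elif name == id_label:
            doc["id"] = value
        elif name in solr_fields:  # exact, case-sensitive name match
            doc[name] = value
        # any other column is silently ignored
    return doc
```

For example, with columns `mon_id`, `mon_contenu`, `title`, `Author` and `notes`, and a schema assumed to contain `title` and `author`, only `title` is kept besides the id and content: `Author` is dropped because the match is case-sensitive, and `notes` because no such field exists.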