Info |
---|
This tutorial is based on Datafari 5.2, but the same procedure can be applied to more recent versions. |
Starting with Solr 9, Solr no longer ships the Data Import Handler (DIH) package. Furthermore, as of July 2022, no one has really committed to maintaining and updating it regularly. Yet it had a large user base, which is now looking for alternatives.
...
There are three main steps :
Download and install Datafari
(optional - not needed for PostgreSQL) Add the JDBC driver that corresponds to your database. We do not have the right to include it in Datafari due to licence issues (for MariaDB or MySQL, for example). If you are crawling a PostgreSQL database, the JDBC driver is already included.
NB : The next version of Datafari (5.3) will include by default the JDBC drivers for Microsoft SQL Server and Oracle.
Create your crawl job
Right after these steps, you can begin to search your data !
...
Connect to your server via SSH
Download the latest stable version of Datafari :
Code Block wget https://www.datafari.com/files/debian/datafari.deb
To install the dependencies of Datafari, download our convenient script to install them automatically :
Code Block wget https://www.datafari.com/files/scripts_init_datafari/init_server_datafari_5_debian_10_plus.sh
Now execute the init script :
Code Block source init_server_datafari_5_debian_10_plus.sh
Install Datafari :
Code Block dpkg -i datafari.deb
Initialize Datafari :
Code Block |
---|
cd /opt/datafari/bin |
bash init-datafari.sh |
And voilà ! Datafari is installed and functional. You can connect to https://$IP_OF_YOUR_DATAFARI_SERVER/datafariui
In our example : https://51.158.69.126/datafariui
For more information see this page : Install Datafari - Community Edition
...
Code Block |
---|
cd /opt/datafari/mcf/mcf_home/connector-lib-proprietary |
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.29/mysql-connector-java-8.0.29.jar |
chmod 775 /opt/datafari/mcf/mcf_home/connector-lib-proprietary/mysql* |
chown datafari /opt/datafari/mcf/mcf_home/connector-lib-proprietary/mysql* |
cp /opt/datafari/mcf/mcf_home/connector-lib-proprietary/mysql* /opt/datafari/tomcat-mcf/lib/ |
chmod 775 /opt/datafari/tomcat-mcf/lib/mysql* |
chown datafari /opt/datafari/tomcat-mcf/lib/mysql* |
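The commands above place the driver in two directories, and a failed download or a missed copy is easy to overlook. The following is a minimal sketch of a check, not part of Datafari itself: `check_driver_copies` is a hypothetical helper name, and it only tests that a file with the given name exists in each directory.

```shell
# Hypothetical helper (not shipped with Datafari): report whether a jar
# is present in every directory passed after the jar name.
check_driver_copies() {
    jar_name="$1"
    shift
    missing=0
    for dir in "$@"; do
        if [ -f "$dir/$jar_name" ]; then
            echo "OK      $dir/$jar_name"
        else
            echo "MISSING $dir/$jar_name"
            missing=1
        fi
    done
    # Non-zero exit status if at least one copy is missing
    return "$missing"
}
```

With the paths used in this tutorial, it could be called as `check_driver_copies mysql-connector-java-8.0.29.jar /opt/datafari/mcf/mcf_home/connector-lib-proprietary /opt/datafari/tomcat-mcf/lib`.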
Edit the file /opt/datafari/mcf/mcf_home/options.env.unix :
Code Block |
---|
nano /opt/datafari/mcf/mcf_home/options.env.unix |
Add the path to the new lib in the -cp parameter line :
Code Block |
---|
connector-lib-proprietary/mysql-connector-java-8.0.29.jar |
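Since this edit is done by hand, it may be worth confirming that the file really references the jar before restarting. The sketch below makes no assumption about the exact layout of options.env.unix: it only searches the file for the literal path. `classpath_has_jar` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the options file mentions the jar path.
# The search is a fixed-string match anywhere in the file; it does not
# verify that the path sits on the -cp line specifically.
classpath_has_jar() {
    options_file="$1"
    jar_path="$2"
    if grep -F -q -- "$jar_path" "$options_file"; then
        echo "found: $jar_path"
    else
        echo "not found: $jar_path" >&2
        return 1
    fi
}
```

For this tutorial the call would be `classpath_has_jar /opt/datafari/mcf/mcf_home/options.env.unix connector-lib-proprietary/mysql-connector-java-8.0.29.jar`.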
...
For more information, see this page : Connector - Add a JDBC connector (MySQL, Oracle, etc)
Restart Datafari - You have 2 options to do it :
Option 1 - Via the Datafari admin UI : Go to the main server of Datafari, then click on Services Administration and Restart.
Option 2 - By restarting Datafari via SSH :
...
Database type : MySQL
Database host : 163.172.184.196
Database name : wiki
user : root
Password : admin
Seeding query :
Code Block SELECT page_id AS $(IDCOLUMN) FROM page
Version query
Code Block SELECT page_id AS $(IDCOLUMN), page_id AS $(VERSIONCOLUMN) FROM page WHERE page_id IN $(IDLIST)
Data query
Code Block SELECT page_id AS $(IDCOLUMN), page_id AS $(URLCOLUMN), page_title AS $(DATACOLUMN) FROM page WHERE page_id IN $(IDLIST)
Source name : db
Repository name : msqylrepo
Start the job once created : check the box
Finally, click on the Save button
Then you can check if all is ok in your MCF : https://$IP_OF_YOUR_DATAFARI_SERVER/datafari-mcf-crawler-ui/
In our example : https://51.158.69.126/datafari-mcf-crawler-ui/
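If the job reports SQL errors, it can help to run the seeding, version and data queries by hand in a MySQL client. They cannot be pasted as-is because of the `$(...)` placeholders, so the sketch below substitutes stand-in values. The aliases (`id`, `version`, `url`, `data`) and the sample ID list are arbitrary choices for manual testing, not the names the connector uses internally, and `render_query` is a hypothetical helper.

```shell
# Hypothetical helper: replace the connector placeholders in a query
# template with stand-in values so it can be run in a SQL client.
# $1: query template, $2: sample ID list such as "(1,2,3)"
render_query() {
    printf '%s\n' "$1" | sed \
        -e 's/\$(IDCOLUMN)/id/g' \
        -e 's/\$(VERSIONCOLUMN)/version/g' \
        -e 's/\$(URLCOLUMN)/url/g' \
        -e 's/\$(DATACOLUMN)/data/g' \
        -e 's/\$(IDLIST)/'"$2"'/g'
}
```

For example, `render_query 'SELECT page_id AS $(IDCOLUMN), page_id AS $(VERSIONCOLUMN) FROM page WHERE page_id IN $(IDLIST)' '(1,2,3)'` prints a query that can be piped to the `mysql` client. The sample ID list must not contain `/` or `&`, which sed would interpret.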
...
Finally, go to Datafari and search your data (this step is optional : you do not need DatafariUI and can keep working as you did when you were combining DIH and Solr) :
Go to https://$IP_OF_DATAFARI_SERVER/datafariui
In our example : https://51.158.69.126/datafariui
...
Connect to the instance via SSH
Get MySQL Server 8 :
Code Block |
---|
apt update |
wget https://dev.mysql.com/get/mysql-apt-config_0.8.22-1_all.deb |
apt install ./mysql-apt-config_0.8.22-1_all.deb |
apt update |
apt install mysql-server |
Check that MySQL has started correctly :
Code Block service mysql status
Get the SQL dump of English Wikipedia pages :
Code Block wget http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
Uncompress it :
Code Block gzip -d enwiki-latest-page.sql.gz
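The dump is a sizeable download, and a truncated archive may only show up later as a failed import. As a minimal sketch, the archive can be tested before it is decompressed. `safe_gunzip` is a hypothetical wrapper; `gzip -t` and `gzip -d` are the standard integrity-test and decompress options.

```shell
# Hypothetical wrapper: decompress a .gz file only if it passes
# gzip's integrity test; otherwise leave it untouched and fail.
safe_gunzip() {
    archive="$1"
    if gzip -t "$archive" 2>/dev/null; then
        gzip -d "$archive"
    else
        echo "Archive $archive is corrupt or incomplete; download it again." >&2
        return 1
    fi
}
```

Here it would be called as `safe_gunzip enwiki-latest-page.sql.gz`.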
Create the database and change encoding :
Code Block |
---|
mysql -uroot -p |
CREATE DATABASE wiki; |
USE wiki; |
ALTER DATABASE wiki CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci; |
...
Import the data into the database :
Code Block mysql -u root -p wiki < enwiki-latest-page.sql
Change the MySQL configuration to allow remote connections. To do so, edit the file /etc/mysql/mysql.conf.d/mysqld.cnf :
Code Block nano /etc/mysql/mysql.conf.d/mysqld.cnf
...
Create a new user in the database :
Code Block |
---|
mysql -u root -p |
CREATE USER 'datafari'@'51.158.69.126' IDENTIFIED BY 'admin'; |
GRANT ALL PRIVILEGES ON *.* TO 'datafari'@'51.158.69.126'; |
In this example, the name of the user is datafari and the password is admin. We allow the datafari user to connect to the MySQL database from the location of our Datafari server : 51.158.69.126. We granted all privileges to the datafari user; once again, this is for demo purposes only.
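Outside of a demo, a narrower grant is usually preferable. The sketch below prints the SQL for a read-only setup, under the assumption that the crawl only needs to read from the database; this assumption is not verified here. `readonly_grant_sql` is a hypothetical helper; `REVOKE ALL PRIVILEGES, GRANT OPTION` and `GRANT SELECT` are standard MySQL statements.

```shell
# Hypothetical helper: print SQL that replaces a broad grant with
# SELECT-only access to a single database.
# $1: user, $2: client host, $3: database
readonly_grant_sql() {
    user="$1"
    host="$2"
    database="$3"
    printf "REVOKE ALL PRIVILEGES, GRANT OPTION FROM '%s'@'%s';\n" "$user" "$host"
    printf "GRANT SELECT ON %s.* TO '%s'@'%s';\n" "$database" "$user" "$host"
}
```

With the values from this tutorial, `readonly_grant_sql datafari 51.158.69.126 wiki | mysql -u root -p` would apply the statements. Review the printed SQL first, since the arguments are inserted without any escaping.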