



Info

Starting from Datafari 5.1

...

  • Add the JAR datafari-html-extractor-connector-4.0.2-SNAPSHOT to the connector-lib folder of MCF, i.e. $MCF/connector-lib
  • Edit the file $MCF/mcf_home/connectors.xml and add the line:
<transformationconnector name="Datafari HTML Extractor Connector" class="com.francelabs.datafari.htmlextractor.HtmlExtractor"/>
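
The registration step can also be scripted. The following is a minimal sketch, not part of Datafari or MCF: the function name `register_connector` is hypothetical, and it assumes that connectors.xml ends with a closing `</connectors>` root tag. It inserts the line only if it is not already present, and works on the raw text so that existing comments in the file are preserved.

```python
from pathlib import Path

# The line to register, copied from the instructions above.
CONNECTOR_LINE = (
    '<transformationconnector name="Datafari HTML Extractor Connector" '
    'class="com.francelabs.datafari.htmlextractor.HtmlExtractor"/>'
)


def register_connector(connectors_xml: Path, line: str = CONNECTOR_LINE) -> bool:
    """Insert `line` before the closing root tag; return True if the file changed."""
    text = connectors_xml.read_text(encoding="utf-8")
    if line in text:
        return False  # already registered, nothing to do

    closing = "</connectors>"  # assumption: this is the root closing tag
    idx = text.rfind(closing)
    if idx == -1:
        raise ValueError(f"no {closing} tag found in {connectors_xml}")

    # Insert the new line just before the closing tag.
    new_text = text[:idx] + "  " + line + "\n" + text[idx:]
    connectors_xml.write_text(new_text, encoding="utf-8")
    return True
```

Calling `register_connector(Path("connectors.xml"))` a second time leaves the file untouched, so the script is safe to re-run. Back up the file before modifying it.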

...

  • name
  • title
  • keywords
  • description
  • author
  • dc_terms_subject
  • dc_terms_title
  • dc_terms_creator
  • dc_terms_description
  • dc_terms_publisher
  • dc_terms_contributor
  • dc_terms_date
  • dc_terms_type
  • dc_terms_format
  • dc_terms_languague
  • dc_terms_identifier

This applies to the metadata found in <meta name="xx"> tags. For example, if your document contains <meta name="keywords" content="keyword1,keyword2">, the extracted metadata is named jsoup_keywords. Likewise, the tag <meta name="dcterms.creator" content="TEST" /> produces the metadata jsoup_dcterms_creator.
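
The naming convention can be illustrated with a small sketch. This is not the connector's code (the connector is a Java class); it uses Python's standard `html.parser` only to show the mapping. The `MetaCollector` class and `extract_meta` function are hypothetical names, and the rule "prefix with jsoup_ and replace dots with underscores" is an assumption inferred from the two examples above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> tags under jsoup_-prefixed keys."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content is not None:
            # Assumption: dots become underscores, as in dcterms.creator.
            key = "jsoup_" + name.replace(".", "_")
            self.metadata.setdefault(key, []).append(content)


def extract_meta(html: str) -> dict:
    """Return a mapping of jsoup_-prefixed metadata names to lists of values."""
    parser = MetaCollector()
    parser.feed(html)
    return parser.metadata
```

For the two tags quoted above, `extract_meta` returns the keys `jsoup_keywords` and `jsoup_dcterms_creator`.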

...

Info

If you use Datafari, you can add a MetadataAdjuster transformation connector directly to the MCF job, at the end of your pipeline, and enter the name of the Datafari field in which you want to store the metadata.

For example, if your web page contains <meta name="keywords" content="keyword1,keyword2">, add the following in the Metadata Adjuster tab:

parameter name : keywords

expression : ${jsoup_keywords}
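
To make the role of the expression concrete, here is a simplified sketch of how a `${...}` reference resolves against the extracted metadata. It is an illustration only, not the MCF MetadataAdjuster implementation: the `expand` function is hypothetical, and it assumes single-valued metadata and an empty string for unknown names.

```python
import re

# Matches ${name} references inside an expression.
_REF = re.compile(r"\$\{([^}]+)\}")


def expand(expression: str, metadata: dict) -> str:
    """Replace each ${name} with the matching metadata value (empty if missing)."""
    return _REF.sub(lambda m: metadata.get(m.group(1), ""), expression)


# Metadata as extracted from <meta name="keywords" content="keyword1,keyword2">.
extracted = {"jsoup_keywords": "keyword1,keyword2"}

# The "keywords" parameter receives the value of the jsoup_keywords metadata.
keywords_value = expand("${jsoup_keywords}", extracted)
```

Here `keywords_value` holds the content of the original meta tag, which is what ends up in the `keywords` field.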




Info

Only for Datafari 5.0

...

  • Add the JAR datafari-html-extractor-connector-4.0.2-SNAPSHOT to the connector-lib folder of MCF, i.e. $MCF/connector-lib
  • Edit the file $MCF/mcf_home/connectors.xml and add the line:
<transformationconnector name="Datafari HTML Extractor Connector" class="com.francelabs.datafari.htmlextractor.HtmlExtractor"/>

...

Info

If you use Datafari, you can add a MetadataAdjuster transformation connector directly to the MCF job, at the end of your pipeline, and enter the name of the Datafari field in which you want to store the metadata (the field must first be declared in Solr).

For example, if your web page contains <meta name="keywords" content="keyword1,keyword2">, add the following in the Metadata Adjuster tab:

parameter name : keywords

expression : ${jsoup_keywords}



...


Info

Starting from Datafari 4

...

  • Add the JAR datafari-html-extractor-connector-4.0.2-SNAPSHOT to the connector-lib folder of MCF, i.e. $MCF/connector-lib
  • Edit the file $MCF/mcf_home/connectors.xml and add the line:
<transformationconnector name="Datafari HTML Extractor Connector" class="com.francelabs.datafari.htmlextractor.HtmlExtractor"/>

...