File Type Detection

Valid from Datafari X.X

File type detection is performed in the DatafariUpdateProcessor class, near the end of the processAdd method.

The file extension is identified using two different methods:

  1. Getting it from the file path using the Apache Commons IO FilenameUtils class

  2. Getting it from the MIME types extracted by Tika: the last MIME type that is not of type binary is retained as the final extension for the file
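The two methods above can be sketched as follows. This is only an illustrative sketch, not the actual Datafari code: the class and method names are hypothetical, extensionFromPath is a plain-Java stand-in for what a call such as FilenameUtils.getExtension would provide, and "binary" is assumed to mean the generic application/octet-stream type. How the retained MIME type is then turned into an extension string is not described here, so the sketch stops at selecting the MIME type.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, for illustration only (not part of Datafari).
final class ExtensionCandidates {

    // Assumption: "binary" refers to the generic application/octet-stream type.
    static final String BINARY_MIME_TYPE = "application/octet-stream";

    private ExtensionCandidates() {}

    // Method 1: text after the last dot of the file name, or "" if there is none.
    // Plain-Java stand-in for a Commons IO FilenameUtils lookup.
    static String extensionFromPath(String path) {
        int lastSeparator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int lastDot = path.lastIndexOf('.');
        // A dot located before the last separator belongs to a directory name.
        return lastDot > lastSeparator ? path.substring(lastDot + 1) : "";
    }

    // Method 2: walk the MIME types backwards and keep the last non-binary one.
    static Optional<String> lastNonBinaryMimeType(List<String> mimeTypes) {
        for (int i = mimeTypes.size() - 1; i >= 0; i--) {
            String mimeType = mimeTypes.get(i);
            if (mimeType != null && !BINARY_MIME_TYPE.equalsIgnoreCase(mimeType)) {
                return Optional.of(mimeType);
            }
        }
        return Optional.empty();
    }
}
```

With this sketch, a list of MIME types that ends with application/octet-stream still yields the preceding, more specific type, and a list containing only binary entries yields an empty result.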

Which of the two values prevails depends on the configuration of the update processor (described here: Detecting file type via its file extension). In any case, if the preferred method does not provide a relevant value for the extension (the extension text must be 2 to 5 characters long), the other one is used.
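The preference and fallback rule can be sketched as follows. This is a minimal sketch under stated assumptions rather than the actual update processor code: the ExtensionSelector class, its method names and the preferPath flag are hypothetical, and only the 2 to 5 character rule comes from the description above. The behavior when neither value is relevant is not specified, so the sketch simply returns the other value unchanged.

```java
// Hypothetical helper, for illustration only (not part of Datafari).
final class ExtensionSelector {

    private ExtensionSelector() {}

    // An extension is considered relevant when it is 2 to 5 characters long.
    static boolean isRelevant(String extension) {
        return extension != null && extension.length() >= 2 && extension.length() <= 5;
    }

    // preferPath is a hypothetical flag standing in for the update processor configuration.
    static String choose(String pathExtension, String mimeExtension, boolean preferPath) {
        String preferred = preferPath ? pathExtension : mimeExtension;
        String other = preferPath ? mimeExtension : pathExtension;
        // Use the preferred value when relevant; otherwise the other one is used as is.
        return isRelevant(preferred) ? preferred : other;
    }

    public static void main(String[] args) {
        // The path extension "backup" is 6 characters long, so the MIME-based value is used.
        System.out.println(choose("backup", "pdf", true));
    }
}
```

Keeping the relevance check separate from the preference flag means the same fallback applies whichever method is configured as preferred.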