manifoldcf-dev mailing list archives

From "Julien Massiera (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CONNECTORS-1459) Tika service wrong Content-Type
Date Tue, 26 Sep 2017 06:19:00 GMT
Julien Massiera created CONNECTORS-1459:
-------------------------------------------

             Summary: Tika service wrong Content-Type
                 Key: CONNECTORS-1459
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1459
             Project: ManifoldCF
          Issue Type: Bug
          Components: Tika service connector
    Affects Versions: ManifoldCF 2.8.1
            Reporter: Julien Massiera
            Priority: Minor


I noticed that the standard behaviour of the Tika extractor connector is to replace the existing
"Content-Type" metadata with the one it finds. The Tika service connector does not implement
this behaviour: it adds a new metadata entry instead of replacing the existing one.
As a result, two values are available for the "Content-Type" metadata, but only
the first one is kept by the connector (which could also be considered a bug; this is the
case for both the Tika extractor connector and the Tika service connector).
So, depending on the source connector, the resulting "Content-Type" may be wrong, for example
when the originally provided value is "application/octet-stream".
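
To make the difference concrete, here is a minimal sketch of "add" versus "replace" on a
multi-valued metadata field. It is an illustration only, not the connector code: a plain
Java map stands in for the document metadata, every class and method name is hypothetical,
and "application/pdf" is just an assumed detected type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a plain multi-valued map stands in for the
// document metadata. None of these names are ManifoldCF or Tika APIs.
final class ContentTypeSketch {

    static final String CONTENT_TYPE = "Content-Type";

    // "Add" behaviour: the detected value is appended after the existing one.
    static void addValue(Map<String, List<String>> metadata, String name, String value) {
        metadata.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // "Replace" behaviour: any existing values are discarded first.
    static void replaceValue(Map<String, List<String>> metadata, String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        metadata.put(name, values);
    }

    // Mimics a consumer that keeps only the first value of a field.
    static String firstValue(Map<String, List<String>> metadata, String name) {
        List<String> values = metadata.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    // Builds metadata as a source connector might provide it (assumed shape).
    static Map<String, List<String>> fromSource(String contentType) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        addValue(metadata, CONTENT_TYPE, contentType);
        return metadata;
    }

    public static void main(String[] args) {
        // Adding keeps the source value in first position.
        Map<String, List<String>> added = fromSource("application/octet-stream");
        addValue(added, CONTENT_TYPE, "application/pdf");
        System.out.println(firstValue(added, CONTENT_TYPE));    // application/octet-stream

        // Replacing leaves only the detected value.
        Map<String, List<String>> replaced = fromSource("application/octet-stream");
        replaceValue(replaced, CONTENT_TYPE, "application/pdf");
        System.out.println(firstValue(replaced, CONTENT_TYPE)); // application/pdf
    }
}
```

With "add", a consumer that keeps only the first value still sees the generic type from the
source connector; with "replace", it sees the detected one.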

I will provide a patch for this bug.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
