tika-dev mailing list archives

From "Alex Ott (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-697) Tika reports the content type of AR archives as "text/plain"
Date Mon, 07 Nov 2011 09:53:52 GMT

    [ https://issues.apache.org/jira/browse/TIKA-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145308#comment-13145308 ]

Alex Ott commented on TIKA-697:
-------------------------------

I think the following magic in tika-mimetypes.xml will be enough (instead of modifying
Tika's code):

  <mime-type type="application/x-unix-archive">
    <magic priority="50">
      <match value="0x213C617263683E0A" type="string" offset="0" />
    </magic>
    <glob pattern="*.a"/>
  </mime-type>
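
As a quick sanity check on the hex value above, it should decode to the eight-byte AR
signature "!<arch>\n". The sketch below uses only the JDK and does not call Tika. The
class and method names (ArMagicSketch, decodeHex, startsWithArMagic) are hypothetical and
exist only for this illustration; it is not how Tika itself evaluates the <match> element.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika.
final class ArMagicSketch {
    // Hex value copied from the <match> element, without the "0x" prefix.
    static final String MAGIC_HEX = "213C617263683E0A";

    /** Decodes a hex string (two digits per byte) into raw bytes. */
    static byte[] decodeHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Returns true if the given leading bytes start with the magic at offset 0. */
    static boolean startsWithArMagic(byte[] head) {
        byte[] magic = decodeHex(MAGIC_HEX);
        return head.length >= magic.length
                && Arrays.equals(magic, Arrays.copyOf(head, magic.length));
    }

    public static void main(String[] args) {
        // Compare the decoded magic against the textual AR signature.
        String decoded = new String(decodeHex(MAGIC_HEX), StandardCharsets.US_ASCII);
        System.out.println(decoded.equals("!<arch>\n"));

        // Synthetic header bytes, not a real archive.
        byte[] sample = "!<arch>\nfoo.txt/".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithArMagic(sample));
    }
}
```

The check is anchored at offset 0, the same as offset="0" in the match element, so a file
that merely contains the signature somewhere later would not match.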

                
> Tika reports the content type of AR archives as "text/plain"
> ------------------------------------------------------------
>
>                 Key: TIKA-697
>                 URL: https://issues.apache.org/jira/browse/TIKA-697
>             Project: Tika
>          Issue Type: Bug
>         Environment: Linux (CentOS 5.6)
>            Reporter: PNS
>            Priority: Trivial
>
> The Tika.detect(InputStream) method returns "text/plain" for AR archives created with
> the Linux "Create Archive" option of Nautilus (available via right-clicking on a file).
> The Apache Commons Compress "autodetection" code of the ArchiveStreamFactory looks at
> the first 12 bytes of the stream and correctly identifies the type as AR.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
