jackrabbit-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-728) Automatic MIME type detection
Date Thu, 01 Feb 2007 18:51:05 GMT

    [ https://issues.apache.org/jira/browse/JCR-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469546 ]

Jukka Zitting commented on JCR-728:

I've looked at jmimemagic too, but as you mentioned, it's a bit limited. It's also licensed
under the LGPL, which makes it a bit troublesome for us.

There's a recent codebase at http://hedges.net/archives/2006/11/08/java-shared-mime-info/
that seems pretty good, but the code is under the GPL.

I recently talked with some people from Apache Nutch about a project to implement the shared
mime info standard from freedesktop.org (http://www.freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec),
and apparently someone already had some Apache-licensed code for that, but I haven't yet seen it.

I've been planning to propose an implementation project for the mime info standard in Apache
Labs (http://labs.apache.org/), but if there's more interest within the Jackrabbit community
we could also start working on it within the jackrabbit-text-extractors component.

> Automatic MIME type detection
> -----------------------------
>                 Key: JCR-728
>                 URL: https://issues.apache.org/jira/browse/JCR-728
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: indexing
>            Reporter: Jukka Zitting
>            Priority: Minor
> Currently only the jcr:mimeType property is used to determine the MIME type and thus
> the applicable text extractor to use for indexing a document. If the jcr:mimeType property
> is not available or is set to a generic value like "application/octet-stream", then the indexer
> could also use some heuristics based on the node name or magic numbers within the binary stream
> to determine the type of the document.
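As a rough illustration of the fallback the issue describes, here is a minimal Java sketch. The class `MimeTypeGuesser`, its `detect` method, and the tiny extension and magic-number tables are hypothetical, chosen only for this example; they are not part of Jackrabbit or of any existing detection library. The sketch assumes the caller can supply the `jcr:mimeType` value, the node name, and the first few bytes of the binary stream, and it needs Java 16 or later for the `record`.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the JCR-728 fallback; not part of the Jackrabbit API. */
public class MimeTypeGuesser {

    static final String GENERIC = "application/octet-stream";

    /** A leading byte signature ("magic number") and the type it indicates. */
    private record Magic(byte[] prefix, String type) {}

    // Tiny illustrative tables; a real detector would load a full database
    // such as the freedesktop.org shared-mime-info data.
    private static final List<Magic> MAGIC = List.of(
            new Magic("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf"),
            new Magic(new byte[] {(byte) 0x89, 'P', 'N', 'G'}, "image/png"),
            new Magic(new byte[] {'P', 'K', 3, 4}, "application/zip"));

    private static final Map<String, String> EXTENSIONS = Map.of(
            "pdf", "application/pdf",
            "txt", "text/plain",
            "html", "text/html",
            "xml", "application/xml");

    /**
     * @param mimeProperty value of jcr:mimeType, or null if absent
     * @param nodeName     name of the node holding the binary, e.g. "report.pdf"
     * @param head         first bytes of the binary stream (may be empty)
     */
    public static String detect(String mimeProperty, String nodeName, byte[] head) {
        // 1. Use jcr:mimeType as-is when it is present and not the generic default.
        if (mimeProperty != null && !mimeProperty.isEmpty()
                && !GENERIC.equalsIgnoreCase(mimeProperty)) {
            return mimeProperty;
        }
        // 2. Look for a known magic number at the start of the stream.
        if (head != null) {
            for (Magic m : MAGIC) {
                if (startsWith(head, m.prefix())) {
                    return m.type();
                }
            }
        }
        // 3. Fall back to the file-name extension of the node name.
        if (nodeName != null) {
            int dot = nodeName.lastIndexOf('.');
            if (dot >= 0 && dot < nodeName.length() - 1) {
                String ext = nodeName.substring(dot + 1).toLowerCase(Locale.ROOT);
                String type = EXTENSIONS.get(ext);
                if (type != null) {
                    return type;
                }
            }
        }
        // 4. Nothing matched: keep the generic type.
        return GENERIC;
    }

    /** True if data begins with every byte of prefix. */
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect("text/xml", "a.pdf", pdf));           // text/xml
        System.out.println(detect(GENERIC, "report.bin", pdf));         // application/pdf
        System.out.println(detect(null, "notes.TXT", new byte[0]));     // text/plain
        System.out.println(detect(null, "blob", new byte[] {1, 2, 3})); // application/octet-stream
    }
}
```

Checking magic numbers before the name extension is just one possible ordering, picked here because a misnamed file still carries its real signature; the issue itself does not prescribe an order between the two heuristics.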

This message is automatically generated by JIRA.
