jackrabbit-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-728) Automatic MIME type detection
Date Thu, 01 Feb 2007 18:51:05 GMT

    [ https://issues.apache.org/jira/browse/JCR-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469546 ]

Jukka Zitting commented on JCR-728:
-----------------------------------

I've looked at jmimemagic too, but as you mentioned, it's a bit limited. It's also licensed
under the LGPL, which makes it a bit troublesome for us.

There's a recent codebase at http://hedges.net/archives/2006/11/08/java-shared-mime-info/
that seems pretty good, but the code is under the GPL.

I recently talked with some people from Apache Nutch about a project to implement the shared
MIME info standard from freedesktop.org (http://www.freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec).
Apparently someone already has some Apache-licensed code for that, but I haven't seen it
yet.

I've been planning to propose an implementation project for the mime info standard in Apache
Labs (http://labs.apache.org/), but if there's more interest within the Jackrabbit community
we could also start working on it within the jackrabbit-text-extractors component.

> Automatic MIME type detection
> -----------------------------
>
>                 Key: JCR-728
>                 URL: https://issues.apache.org/jira/browse/JCR-728
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: indexing
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> Currently only the jcr:mimeType property is used to determine the MIME type and thus
> the applicable text extractor to use for indexing a document. If the jcr:mimeType property
> is not available or is set to a generic value like "application/octet-stream", then the indexer
> could also use some heuristics based on the node name or magic numbers within the binary stream
> to determine the type of the document.
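
The fallback described in the issue could be sketched roughly as follows. This is
only an illustration, not Jackrabbit code: the class name MimeTypeGuesser, the
guess() method, the small extension table, and the choice to check magic numbers
before the node name are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Jackrabbit. */
final class MimeTypeGuesser {

    private static final String GENERIC = "application/octet-stream";

    // Tiny illustrative table; a real mapping would be much larger.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("html", "text/html");
        BY_EXTENSION.put("xml", "application/xml");
        BY_EXTENSION.put("pdf", "application/pdf");
    }

    /**
     * @param declaredType value of jcr:mimeType, or null if the property is missing
     * @param nodeName     name of the node, e.g. "report.pdf" (may be null)
     * @param header       first few bytes of the binary stream (may be null)
     */
    static String guess(String declaredType, String nodeName, byte[] header) {
        // 1. Trust a specific declared type when one is available.
        if (declaredType != null && !declaredType.isEmpty()
                && !GENERIC.equalsIgnoreCase(declaredType)) {
            return declaredType;
        }
        // 2. Magic numbers at the start of the stream (assumed to take priority).
        if (startsWith(header, '%', 'P', 'D', 'F', '-')) return "application/pdf";
        if (startsWith(header, 0x89, 'P', 'N', 'G')) return "image/png";
        if (startsWith(header, 'G', 'I', 'F', '8')) return "image/gif";
        if (startsWith(header, 'P', 'K', 0x03, 0x04)) return "application/zip";
        // 3. Extension of the node name.
        if (nodeName != null) {
            int dot = nodeName.lastIndexOf('.');
            if (dot >= 0 && dot < nodeName.length() - 1) {
                String ext = nodeName.substring(dot + 1).toLowerCase(Locale.ROOT);
                String type = BY_EXTENSION.get(ext);
                if (type != null) return type;
            }
        }
        // 4. Nothing matched: stay with the generic type.
        return GENERIC;
    }

    // Compares the leading bytes of data (as unsigned values) with the magic sequence.
    private static boolean startsWith(byte[] data, int... magic) {
        if (data == null || data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }
}
```

With this sketch an indexer would call guess() with the jcr:mimeType value (or
null), the node name, and the first bytes of the binary, and then pick the text
extractor for the returned type. A shared MIME info implementation like the one
discussed in the comment would replace the hard-coded tables.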

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

