jackrabbit-users mailing list archives

From "Alexander Klimetschek" <aklim...@day.com>
Subject Re: Text Extractor issue?
Date Thu, 17 Jul 2008 10:57:58 GMT

On Wed, Jul 16, 2008 at 3:14 PM, hsp_ <piccinatto@ibest.com.br> wrote:
> I have added a large number of files to the repository, some of them with the
> ".sxw" extension. They were detected (by sun.net.www.MimeTable and, if that
> did not succeed, by a table of MIME types in my application) as
> "application/vnd.sun.xml.writer", and jcr:mimeType was set to this value.
> Recently I tried to search for some .sxw documents by content and they were
> not returned. I then saw in the class OpenOfficeTextExtractor that only the
> MIME types "application/vnd.oasis.opendocument.database",
> "application/vnd.oasis.opendocument.formula",
> "application/vnd.oasis.opendocument.graphics",
> "application/vnd.oasis.opendocument.presentation",
> "application/vnd.oasis.opendocument.spreadsheet",
> "application/vnd.oasis.opendocument.text"
> would be recognized and indexed by the extractor. Is that true?
> Does this mean that my application must force the MIME type to one in this
> list for extensions that have a different MIME type? Is the class
> able to index this kind of OpenOffice format at all?
> What is the solution for my case?

If documents with the "application/vnd.sun.xml.writer" MIME type can be
properly read by the OpenOfficeTextExtractor, we could add that type to
the list of supported MIME types for that extractor. Could you test
that by patching the OpenOfficeTextExtractor and replacing the old class
on your classpath? If that works out, you can submit the patch to JIRA.
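
Before patching, a quick sanity check on a sample file may help. The sketch
below assumes (I have not verified this against the extractor source) that
the extractor reads its text from a "content.xml" entry inside the ZIP
container, and only checks that such an entry exists in a given stream. The
class and method names are hypothetical and not part of Jackrabbit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Jackrabbit.
final class SxwContentCheck {

    private SxwContentCheck() {
    }

    /**
     * Returns true if the stream is a ZIP archive that contains an entry
     * named "content.xml". Assumption: this is the entry an OpenOffice
     * text extractor would need in order to find the document text.
     */
    static boolean hasContentXml(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

If this returns true for your .sxw files, the container layout at least
matches; whether the XML inside is parsed correctly still needs the real
test with the patched extractor.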

> (I am thinking about updating jcr:mimeType to the
> "application/vnd.oasis.opendocument.text" value and rebuilding the index.
> Would this resolve the case for now?)

Yes, this should work, too.
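
For the application-side workaround, a minimal sketch of the mapping step
could look like the following. Only the writer mapping comes from this
thread; the other entries are my assumption about the closest OpenDocument
equivalents and are untested. The class name is hypothetical, and the code
that actually writes the result to jcr:mimeType is left out.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Jackrabbit.
final class LegacyMimeTypes {

    private static final Map<String, String> ODF_EQUIVALENTS = new HashMap<>();

    static {
        // Discussed in this thread.
        ODF_EQUIVALENTS.put("application/vnd.sun.xml.writer",
                "application/vnd.oasis.opendocument.text");
        // Assumed equivalents, listed for illustration only.
        ODF_EQUIVALENTS.put("application/vnd.sun.xml.calc",
                "application/vnd.oasis.opendocument.spreadsheet");
        ODF_EQUIVALENTS.put("application/vnd.sun.xml.impress",
                "application/vnd.oasis.opendocument.presentation");
        ODF_EQUIVALENTS.put("application/vnd.sun.xml.draw",
                "application/vnd.oasis.opendocument.graphics");
    }

    private LegacyMimeTypes() {
    }

    /**
     * Returns the OpenDocument MIME type to store for a legacy type,
     * or the input unchanged if no mapping is known.
     */
    static String normalize(String mimeType) {
        if (mimeType == null) {
            return null;
        }
        String key = mimeType.trim().toLowerCase(Locale.ROOT);
        return ODF_EQUIVALENTS.getOrDefault(key, mimeType);
    }
}
```

The application would call normalize() on the detected type before setting
the property. Existing nodes still need to be updated and re-indexed.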


Alexander Klimetschek
