jackrabbit-users mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Text extractor for MS Office 2007 documents
Date Fri, 13 Mar 2009 09:24:41 GMT

On Fri, Mar 13, 2009 at 9:55 AM, steven <steven.yong@e-chambers.net> wrote:
> Wonder if the text extracting support of MS Office 2007 documents is
> available already?

Not currently, but there's a patch for that in
https://issues.apache.org/jira/browse/JCR-1887. In the sandbox we also
have an experimental generic text extractor component based on Apache
Tika that can extract text from a wide range of document formats,
including Office 2007. Both depend on the 3.5 beta releases from
Apache POI.

We will most likely include the JCR-1887 patch in Jackrabbit 1.6 and
aim to replace our custom text extractors with Apache Tika in
Jackrabbit 2.0.


Jukka Zitting
