jackrabbit-users mailing list archives

From steven <steven.y...@e-chambers.net>
Subject Re: Text extractor for MS Office 2007 documents
Date Fri, 13 Mar 2009 09:33:00 GMT
Jukka Zitting wrote:
> Hi,
> 
> On Fri, Mar 13, 2009 at 9:55 AM, steven <steven.yong@e-chambers.net> wrote:
>> I wonder whether support for extracting text from MS Office 2007
>> documents is available yet?
> 
> Not currently, but there's a patch for that in
> https://issues.apache.org/jira/browse/JCR-1887. In the sandbox we also
> have an experimental generic text extractor component based on Apache
> Tika that can extract text from a wide range of document formats,
> including Office 2007. Both depend on the 3.5 beta releases from
> Apache POI.
> 
> We will most likely include the JCR-1887 patch in Jackrabbit 1.6 and
> aim to replace our custom text extractors with Apache Tika in
> Jackrabbit 2.0.
> 
I see, thanks for the clarification.

	steven

