jackrabbit-users mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Is jackrabbit textFilterClasses able to handle office 2007 documents.
Date Sat, 14 Feb 2009 12:22:12 GMT

On Sat, Feb 14, 2009 at 12:59 PM, Akil Ali <Akil.Ajani@cognizant.com> wrote:
> I can see that there are a number of filters available in the latest version.
> But will they be able to extract the contents of Office 2007 documents? Has
> anyone tested indexing the contents of Office 2007 documents?

See JCR-1887 [1] for a patch that adds support for indexing Office
2007 documents.

Alternatively, the latest trunk of Apache Tika [2] also supports
Office 2007, and the jackrabbit-tika sandbox component [3] allows
you to set up Tika as a text extractor in Jackrabbit.
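
As a rough sketch of where such an extractor gets plugged in: in
Jackrabbit 1.x the text extractors are listed in the textFilterClasses
parameter of the SearchIndex element in the workspace configuration.
The class name com.example.TikaTextExtractor below is a hypothetical
placeholder, not the actual class from JCR-1887 or jackrabbit-tika;
check the patch or the sandbox sources for the real name. The path
value is likewise only an illustrative setting.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Comma-separated list of text extractor classes.
       The org.apache.jackrabbit.extractor.* entries are stock extractors;
       com.example.TikaTextExtractor is a placeholder (assumption) for
       whichever Office 2007 capable extractor you install. -->
  <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.PlainTextExtractor,org.apache.jackrabbit.extractor.MsWordTextExtractor,com.example.TikaTextExtractor"/>
</SearchIndex>
```

The jar containing the extractor class needs to be on the repository's
classpath, and documents that were indexed before the change will
probably only pick up the new extractor once the index is rebuilt.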

We will most likely have Office 2007 support built in when Jackrabbit
1.6 is released.

[1] https://issues.apache.org/jira/browse/JCR-1887
[2] http://lucene.apache.org/tika/
[3] http://svn.apache.org/repos/asf/jackrabbit/sandbox/jackrabbit-tika/


Jukka Zitting
