jackrabbit-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Tika text extractor (Was: [jira] Commented: (JCR-1530) MsPowerPointTextExtractor does not extract from PPTs with € sign)
Date Sun, 13 Apr 2008 16:20:18 GMT

On Fri, Apr 11, 2008 at 12:20 PM, Marcel Reutegger (JIRA)
<jira@apache.org> wrote:
>  We might want to provide an adapter, which implements the Jackrabbit
> TextExtractor interface and uses Tika to extract the text. Users then can
> decide if they want to use it and therefore need to use Java 1.5.

I created a sandbox component called jackrabbit-tika that uses the
latest Tika 0.2 snapshot to implement the TextExtractor interface.


Jukka Zitting
