jackrabbit-users mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: How can I access to the TextExtractor result?
Date Tue, 24 Nov 2009 16:50:02 GMT

On Tue, Nov 24, 2009 at 5:37 PM, Paco Avila <monkiki@gmail.com> wrote:
> I wonder if I can access the text produced by the TextExtractor from a
> document file (like a PDF, for example)

Jackrabbit doesn't store the extracted text anywhere. The text is only
used to add the document to the inverted Lucene index.

You can always use the text extractor directly to get the text
content. Check out http://lucene.apache.org/tika/ for more details
about the Tika toolkit that we nowadays use for text extraction.
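
For illustration, here is a minimal sketch of calling Tika directly,
outside of Jackrabbit. It assumes the tika-core and tika-parsers jars are
on the classpath and uses the org.apache.tika.Tika facade class; the
ExtractText class name and the file-path argument are only placeholders
for this example, not anything Jackrabbit provides.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical helper class for this example; not part of Jackrabbit or Tika.
public class ExtractText {

    // Detects the document type from the stream and returns its plain text.
    // Tika.parseToString() closes the stream once parsing is finished.
    static String extract(InputStream in) throws IOException, TikaException {
        Tika tika = new Tika();
        return tika.parseToString(in);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: args[0] is the path to a document such as a PDF file.
        Path file = Paths.get(args[0]);
        try (InputStream in = Files.newInputStream(file)) {
            System.out.println(extract(in));
        }
    }
}
```

The same extract() method should work with any InputStream, for example
the binary stream read from a jcr:data property. Note that the facade
truncates very long output by default; setMaxStringLength() on the Tika
instance adjusts that limit.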


Jukka Zitting
