jackrabbit-users mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: TermVectors from Jackrabbit Queries
Date Wed, 16 Dec 2009 14:21:19 GMT

On 16 Dec 2009, at 10:25, Jukka Zitting wrote:

> Hi,
> 
> On Tue, Dec 15, 2009 at 6:11 PM, Ian Boston <ieb@tfd.co.uk> wrote:
>> Is there any other way of getting to the SearchIndex, so that I can get
>> to the Lucene Document and the TermVector (other than AspectJ or cglib)?
> 
> Instead of reaching down to the underlying Lucene index, I would
> recommend reading the original document data stored in the JCR node
> and passing it through the Jackrabbit text extractors and the
> configured Lucene Analyzer to get the terms stored in the index.


That can be quite expensive, especially for poor-quality PDFs and some docx Word documents.
I expect to need to do this for between 25 and 100 nodes at a time, aggregating the results.

Ian

> 
> BR,
> 
> Jukka Zitting

