lucene-java-user mailing list archives

From Bruce Ritchie <br...@jivesoftware.com>
Subject Re: Performance of hit highlighting and finding term positions for a specific document
Date Wed, 31 Mar 2004 02:36:54 GMT
Kevin A. Burton wrote:

> I'm playing with this package:
> 
> http://home.clara.net/markharwood/lucene/highlight.htm
> 
> Trying to do hit highlighting.  This implementation uses another 
> Analyzer to find the positions for the result terms.
> This seems very inefficient, since Lucene already knows the 
> frequency and position of given terms in the index.
> 
> My question is whether it's hard to find a TermPosition for a given term 
> in a given document rather than the whole index.
> 
> IndexReader.termPositions( Term term ) is term specific not term and 
> document specific.

As far as I know, it's not currently possible to get this information from a standard Lucene
index.

> Also, it seems that after all this time Lucene should have efficient 
> hit highlighting as a standard package.  Is there any interest in seeing 
> a contribution in the sandbox for this if it uses the index positions?

I've been meaning to look into good ways to store token offset information to allow for very
efficient highlighting, and I believe Mark may also be looking into improving the highlighter
via other means, such as temporary RAM indexes. Search the archives for background on some of
the ideas we've tossed around ('Dmitry's Term Vector stuff, plus some' and 'Demoting results'
come to mind as threads that touch on this topic).
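
To make the idea of storing token offsets concrete, here is a minimal sketch in plain Java.
It does not use the Lucene API; the class OffsetHighlightSketch and its methods addDocument,
offsetsFor and highlight are hypothetical names invented for illustration, and the tokenizer
(runs of letters or digits, lowercased) is an assumption rather than any real Analyzer. The
point is only that if character offsets are recorded per term and per document at index
time, highlighting becomes a lookup and needs no second analysis pass over the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy in-memory index (hypothetical, not Lucene): keeps character offsets
// for every (term, document) pair so highlighting needs no re-analysis.
public class OffsetHighlightSketch {
    // term -> (docId -> list of {start, end} offsets, in document order)
    private final Map<String, Map<Integer, List<int[]>>> offsets = new HashMap<>();
    // docId -> original text, kept so the highlighter can rebuild the output
    private final Map<Integer, String> storedText = new HashMap<>();

    // Tokenize on runs of letters/digits and record each token's offsets.
    public void addDocument(int docId, String text) {
        storedText.put(docId, text);
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                offsets.computeIfAbsent(term, t -> new HashMap<>())
                       .computeIfAbsent(docId, d -> new ArrayList<>())
                       .add(new int[] {start, i});
            }
        }
    }

    // Offsets of one term in one document: the term-and-document specific
    // lookup asked about in the quoted question.
    public List<int[]> offsetsFor(String term, int docId) {
        Map<Integer, List<int[]>> perDoc = offsets.get(term.toLowerCase(Locale.ROOT));
        if (perDoc == null) {
            return new ArrayList<>();
        }
        List<int[]> hits = perDoc.get(docId);
        return hits == null ? new ArrayList<>() : hits;
    }

    // Wrap every occurrence of the term in <b>...</b> using stored offsets only.
    public String highlight(String term, int docId) {
        String text = storedText.get(docId);
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] span : offsetsFor(term, docId)) {
            out.append(text, last, span[0]);
            out.append("<b>").append(text, span[0], span[1]).append("</b>");
            last = span[1];
        }
        return out.append(text.substring(last)).toString();
    }

    public static void main(String[] args) {
        OffsetHighlightSketch index = new OffsetHighlightSketch();
        index.addDocument(1, "Lucene stores term positions; Lucene could store offsets too.");
        System.out.println(index.highlight("lucene", 1));
    }
}
```

The trade-off in this sketch is the usual one: the index grows by one offset pair per token,
in exchange for highlighting that costs a map lookup instead of a re-tokenization.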


Regards,

Bruce Ritchie
http://www.jivesoftware.com/
