lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bruce Ritchie" <br...@jivesoftware.com>
Subject RE: Faster highlighting with TermPositionVectors
Date Fri, 29 Oct 2004 05:14:33 GMT
Mark,

> Thanks to the recent changes (see CVS) in TermFreqVector 
> support we can now make use of term offset information held 
> in the Lucene index rather than incurring the cost of 
> re-analyzing text to highlight it.
> 
> I have created a  class ( see 
> http://www.inperspective.com/lucene/TokenSources.java ) which 
> handles creating a TokenStream from the TermPositionVector 
> stored in the database which can then be passed to the highlighter.
> This approach is significantly faster than re-parsing the 
> original text.
> If people are happy with this class I'll add it to the 
> Highlighter sandbox but it may sit better elsewhere in the 
> Lucene code base as a more general purpose utility.
> 
> BTW as part of putting this together I found that the 
> TermFreq code throws a null pointer when indexing fields that 
> produce no tokens (ie empty or all stopwords). Otherwise 
> things work very well.

This is great news! While I won't have the time to test this until probably mid November I
do look forward to the speed improvements as the current highlighting mechanisms (reparsing
the text) was just not performant enough under heavy loads.


Regards,

Bruce Ritchie
http://www.jivesoftware.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message