lucene-java-user mailing list archives

From markharw...@yahoo.co.uk
Subject Faster highlighting with TermPositionVectors
Date Thu, 28 Oct 2004 23:16:11 GMT
Thanks to the recent changes (see CVS) in TermFreqVector support, we can now make use of the
term offset information held in the Lucene index rather than incurring the cost of
re-analyzing text to highlight it.

I have created a class (see http://www.inperspective.com/lucene/TokenSources.java) which
builds a TokenStream from the TermPositionVector stored in the index. That TokenStream can
then be passed to the highlighter.
This approach is significantly faster than re-parsing the original text.
If people are happy with this class I'll add it to the Highlighter sandbox, but it may sit
better elsewhere in the Lucene code base as a more general-purpose utility.
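
To illustrate the idea only (this is not the TokenSources code linked above), here is a
rough sketch in plain Java with no Lucene dependency. It assumes a term vector can be
modelled as parallel arrays: each term, the token positions where it occurs, and the
start/end character offsets of each occurrence. The names TokenRebuildSketch, Token and
rebuild are made up for this example. Term vectors are grouped by term, so the tokens have
to be put back into position order before a highlighter can consume them as a stream.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: rebuild an ordered token list from per-term
// position and offset data, without re-analyzing the original text.
class TokenRebuildSketch {

    /** One token occurrence: term text, token position, and character offsets. */
    static final class Token {
        final String text;
        final int position;
        final int startOffset;
        final int endOffset;

        Token(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * terms[i] occurs at token positions positions[i][j], covering characters
     * starts[i][j] (inclusive) to ends[i][j] (exclusive) of the field text.
     * This array layout is an assumption made for the sketch.
     */
    static List<Token> rebuild(String[] terms, int[][] positions,
                               int[][] starts, int[][] ends) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int j = 0; j < positions[i].length; j++) {
                tokens.add(new Token(terms[i], positions[i][j],
                                     starts[i][j], ends[i][j]));
            }
        }
        // The vector is grouped by term; a token stream must be in position order.
        tokens.sort(Comparator.comparingInt((Token t) -> t.position));
        return tokens;
    }

    public static void main(String[] args) {
        String fieldText = "quick fox quick dog";
        // Per-term data as it might be stored for fieldText, terms in sorted order.
        String[] terms = {"dog", "fox", "quick"};
        int[][] positions = {{3}, {1}, {0, 2}};
        int[][] starts = {{16}, {6}, {0, 10}};
        int[][] ends = {{19}, {9}, {5, 15}};

        for (Token t : rebuild(terms, positions, starts, ends)) {
            // The offsets locate each token in the stored text for highlighting.
            System.out.println(t.position + ": "
                    + fieldText.substring(t.startOffset, t.endOffset));
        }
    }
}
```

The saving comes from skipping the analyzer: the only work left is collecting the stored
occurrences and sorting them, and the offsets already point at the text to be marked up.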

BTW, as part of putting this together I found that the TermFreq code throws a
NullPointerException when indexing fields that produce no tokens (i.e. empty or all
stopwords). Otherwise things work very well.


Cheers
Mark


