lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fred Toth <ft...@synernet.com>
Subject Re: Faster highlighting with TermPositionVectors
Date Fri, 29 Oct 2004 04:02:01 GMT
Hi,

We are very interested in highlighting, but haven't gotten around
to reviewing the state of the highlighting mechanisms.

Could someone possibly give me the "big picture" on highlighting?
What code is available?
How does it work?
What are the current issues?

Many thanks,

Fred

At 07:16 PM 10/28/2004, you wrote:
>Thanks to the recent changes (see CVS) in TermFreqVector support we can 
>now make use of term offset information held
>in the Lucene index rather than incurring the cost of re-analyzing text to 
>highlight it.
>
>I have created a  class ( see 
>http://www.inperspective.com/lucene/TokenSources.java ) which handles creating
>a TokenStream from the TermPositionVector stored in the database which can 
>then be passed to the highlighter.
>This approach is significantly faster than re-parsing the original text.
>If people are happy with this class I'll add it to the Highlighter sandbox 
>but it may sit better elsewhere in the Lucene code base
>as a more general purpose utility.
>
>BTW as part of putting this together I found that the TermFreq code throws 
>a null pointer when indexing fields
>that produce no tokens (ie empty or all stopwords). Otherwise things work 
>very well.
>
>
>Cheers
>Mark
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message