lucene-java-user mailing list archives

From "Aviran" <amo...@infosciences.com>
Subject RE: Faster highlighting with TermPositionVectors
Date Wed, 03 Nov 2004 21:10:47 GMT
Has anyone tried this class?

I tried this class but I can't make it work. I indexed a field as new
Field("description", description,true,true,true,true); but when I call
TokenSources.getTokenStream(_indexReader,i,"description"); I get a
ClassCastException.

In this class the line TermPositionVector tpv=(TermPositionVector)
reader.getTermFreqVector(docId,field); is trying to cast a SegmentTermVector
to TermPositionVector.
Is there anything I'm doing wrong? Should I have indexed the field some
other way to store a TermPositionVector?
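
The cast fails when the stored vector holds term frequencies only, with no
positions or offsets. Below is a minimal sketch of guarding such a cast. It
uses hypothetical stand-in interfaces (FreqVector, PositionVector) rather
than the real Lucene types, so it runs without Lucene on the classpath. In
later Lucene releases positions and offsets are requested at index time via
Field.TermVector.WITH_POSITIONS_OFFSETS; whether the CVS snapshot discussed
here already exposes that option is an assumption.

```java
// Stand-in types only: these mirror the relationship between a
// frequency-only term vector and one that also stores positions.
// They are hypothetical and are NOT the Lucene classes themselves.
class VectorCastGuard {
    /** Hypothetical stand-in for a frequency-only term vector. */
    interface FreqVector {
        String[] getTerms();
    }

    /** Hypothetical stand-in for a vector that also carries positions. */
    interface PositionVector extends FreqVector {
        int[] getTermPositions(int termIndex);
    }

    /** What a reader would return when only frequencies were stored. */
    static final class FreqOnly implements FreqVector {
        public String[] getTerms() { return new String[] {"lucene"}; }
    }

    /** What a reader would return when positions were stored as well. */
    static final class WithPositions implements PositionVector {
        public String[] getTerms() { return new String[] {"lucene"}; }
        public int[] getTermPositions(int termIndex) { return new int[] {0, 5}; }
    }

    /** Returns the vector as a PositionVector, or null if positions were not stored. */
    static PositionVector asPositionVector(FreqVector v) {
        return (v instanceof PositionVector) ? (PositionVector) v : null;
    }

    public static void main(String[] args) {
        // A blind cast of FreqOnly would throw ClassCastException; the guard returns null.
        System.out.println(asPositionVector(new FreqOnly()) == null);
        System.out.println(asPositionVector(new WithPositions()) != null);
    }
}
```

When the guard returns null, a caller can fall back to re-analyzing the
stored text instead of failing.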

BTW: I'm using the latest Lucene source from CVS.

Thanks,
Aviran

-----Original Message-----
From: Bruce Ritchie [mailto:bruce@jivesoftware.com] 
Sent: Friday, October 29, 2004 1:15 AM
To: Lucene Users List
Subject: RE: Faster highlighting with TermPositionVectors


Mark,

> Thanks to the recent changes (see CVS) in TermFreqVector
> support we can now make use of term offset information held 
> in the Lucene index rather than incurring the cost of 
> re-analyzing text to highlight it.
> 
> I have created a class (see
> http://www.inperspective.com/lucene/TokenSources.java) which 
> handles creating a TokenStream from the TermPositionVector 
> stored in the database which can then be passed to the highlighter.
> This approach is significantly faster than re-parsing the 
> original text.
> If people are happy with this class I'll add it to the 
> Highlighter sandbox but it may sit better elsewhere in the 
> Lucene code base as a more general purpose utility.
> 
> BTW as part of putting this together I found that the
> TermFreq code throws a null pointer when indexing fields that 
> produce no tokens (ie empty or all stopwords). Otherwise 
> things work very well.
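
The idea quoted above, rebuilding a token sequence from stored offsets
instead of re-analyzing the text, can be sketched roughly as follows. This
is not the TokenSources implementation: the Token class, the rebuild method,
and the map-of-offsets input are hypothetical names chosen for illustration.
The sketch assumes each term maps to a list of [start, end) character
offsets, as a positional term vector would provide.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a term vector groups offsets by term, while a
// highlighter needs tokens in document order. Sorting by start offset
// recovers that order without running the analyzer again.
class OffsetTokenRebuilder {
    /** Hypothetical minimal token: term text plus character offsets. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Flattens term -> offsets into one list ordered by start offset. */
    static List<Token> rebuild(Map<String, int[][]> termOffsets) {
        List<Token> tokens = new ArrayList<>();
        for (Map.Entry<String, int[][]> entry : termOffsets.entrySet()) {
            for (int[] span : entry.getValue()) {
                tokens.add(new Token(entry.getKey(), span[0], span[1]));
            }
        }
        tokens.sort(Comparator.comparingInt((Token t) -> t.start));
        return tokens;
    }

    public static void main(String[] args) {
        // Offsets for the text "red fox, quick fox", grouped by term.
        Map<String, int[][]> stored = new LinkedHashMap<>();
        stored.put("fox", new int[][] {{4, 7}, {15, 18}});
        stored.put("quick", new int[][] {{9, 14}});
        stored.put("red", new int[][] {{0, 3}});

        for (Token t : rebuild(stored)) {
            System.out.println(t.text + " [" + t.start + "," + t.end + ")");
        }
    }
}
```

The cost is one sort over the document's tokens, which is typically far
cheaper than tokenizing and filtering the original text a second time.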

This is great news! While I won't have the time to test this until probably
mid-November, I do look forward to the speed improvements, as the current
highlighting mechanism (reparsing the text) was just not performant enough
under heavy loads.


Regards,

Bruce Ritchie
http://www.jivesoftware.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


