lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Faster highlighting with TermPositionVectors
Date Thu, 04 Nov 2004 10:13:48 GMT
Mark,

This is great stuff!

One quick comment just at my look at the code (I haven't tried it yet). 
  Shouldn't the tpv variable be used in this method?

     public static TokenStream getAnyTokenStream(IndexReader reader,int 
docId, String field,Analyzer analyzer) throws IOException
     {
		TokenStream ts=null;

		TermFreqVector tfv=(TermFreqVector) 
reader.getTermFreqVector(docId,field);
		if(tfv!=null)
		{
		    if(tfv instanceof TermPositionVector)
		    {
		        //the most efficient choice..
 >>>		        TermPositionVector tpv=(TermPositionVector) 
reader.getTermFreqVector(docId,field);
		        ts=getTokenStream(reader,docId,field);
		    }
		}
		//No token info stored so fall back to analyzing raw content
		if(ts==null)
		{
		    ts=getTokenStream(reader,docId,field,analyzer);
		}
		return ts;
     }

Erik


On Oct 28, 2004, at 7:16 PM, markharw00d@yahoo.co.uk wrote:

> Thanks to the recent changes (see CVS) in TermFreqVector support we 
> can now make use of term offset information held
> in the Lucene index rather than incurring the cost of re-analyzing 
> text to highlight it.
>
> I have created a  class ( see 
> http://www.inperspective.com/lucene/TokenSources.java ) which handles 
> creating
> a TokenStream from the TermPositionVector stored in the database which 
> can then be passed to the highlighter.
> This approach is significantly faster than re-parsing the original 
> text.
> If people are happy with this class I'll add it to the Highlighter 
> sandbox but it may sit better elsewhere in the Lucene code base
> as a more general purpose utility.
>
> BTW as part of putting this together I found that the TermFreq code 
> throws a null pointer when indexing fields
> that produce no tokens (ie empty or all stopwords). Otherwise things 
> work very well.
>
>
> Cheers
> Mark
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message