lucene-java-user mailing list archives

From Maxim Patramanskij <>
Subject Re[2]: Faster highlighting with TermPositionVectors (update)
Date Thu, 11 Nov 2004 11:55:00 GMT
Hello Mark.

I was just wondering about the following piece of code from your latest
TokenSources class:

 public static TokenStream getAnyTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer) throws IOException
 {
     TokenStream ts=null;

     TermFreqVector tfv=(TermFreqVector) reader.getTermFreqVector(docId,field);
     if(tfv instanceof TermPositionVector)
     {
         //read pre-parsed token position info stored on disk
         TermPositionVector tpv=(TermPositionVector) reader.getTermFreqVector(docId,field);
     }
     //No token info stored so fall back to analyzing raw content
     return ts;
 }

Aren't you calling getTermFreqVector(docId,field) twice?

Why not just call:

                    if(tfv instanceof TermPositionVector)
                       ts=getTokenStream((TermPositionVector) tfv);
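
To make the point concrete, here is a minimal sketch of the single-lookup version. It does not use the real Lucene classes: the two interfaces, CountingReader and positionsOrNull are hypothetical stand-ins so that the snippet runs on its own, and the counter is only there to show that the vector is fetched once.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SingleLookupSketch {
    // Hypothetical stand-ins for Lucene's TermFreqVector / TermPositionVector.
    interface TermFreqVector {
        String[] getTerms();
    }

    interface TermPositionVector extends TermFreqVector {
        int[] getTermPositions(int index);
    }

    // Hypothetical stand-in for IndexReader that counts vector lookups.
    static class CountingReader {
        final AtomicInteger lookups = new AtomicInteger();
        private final TermFreqVector stored;

        CountingReader(TermFreqVector stored) {
            this.stored = stored;
        }

        TermFreqVector getTermFreqVector(int docId, String field) {
            lookups.incrementAndGet();
            return stored;
        }
    }

    // Fetch the vector once and reuse the reference for the cast.
    // instanceof is false for null, so no separate null check is needed.
    static TermPositionVector positionsOrNull(CountingReader reader, int docId, String field) {
        TermFreqVector tfv = reader.getTermFreqVector(docId, field);
        if (tfv instanceof TermPositionVector) {
            return (TermPositionVector) tfv;
        }
        return null; // caller falls back to analyzing the raw content
    }

    public static void main(String[] args) {
        TermPositionVector tpv = new TermPositionVector() {
            public String[] getTerms() { return new String[] {"lucene"}; }
            public int[] getTermPositions(int index) { return new int[] {0}; }
        };
        CountingReader reader = new CountingReader(tpv);
        TermPositionVector result = positionsOrNull(reader, 0, "contents");
        System.out.println("found=" + (result != null) + " lookups=" + reader.lookups.get());
    }
}
```

With the stand-in reader above, main reports a single lookup; the version that calls getTermFreqVector again inside the if would report two.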


Friday, November 5, 2004, 12:25:13 AM, you wrote:

m> Having revisited the original TokenSources code it looks like one of the 
m> optimisations I put in will fail if fields are stored with 
m> non-contiguous position info (i.e. the analyzer has messed with token 
m> position numbers so they overlap or have gaps, like ..3,3,7,8,9,..).
m> I've now made the TokenSources code safe by default by assuming token 
m> position values are not contiguous and should not be used for sorting.
m> For those who know what they are doing I have added a parameter to one 
m> of the methods to turn the optimisation back on if they can guarantee 
m> positions are contiguous.
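
A small sketch of why overlapping positions are a problem, assuming the optimisation drops each token into an array slot chosen by its position number (that reading is an assumption, as are the Tok class and both method names; none of this is the TokenSources code itself). The safe path ignores positions and sorts by start offset instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PositionOrderSketch {
    // Hypothetical token: term text, position number, start offset in the text.
    static final class Tok {
        final String term;
        final int position;
        final int startOffset;

        Tok(String term, int position, int startOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
        }
    }

    // Assumed fast path: place each token in the slot named by its position.
    // Only correct when every position is used exactly once.
    static List<String> orderByPositionSlot(List<Tok> toks) {
        int max = -1;
        for (Tok t : toks) {
            max = Math.max(max, t.position);
        }
        Tok[] slots = new Tok[max + 1];
        for (Tok t : toks) {
            slots[t.position] = t; // a repeated position overwrites the earlier token
        }
        List<String> out = new ArrayList<>();
        for (Tok t : slots) {
            if (t != null) { // gaps in the numbering leave empty slots
                out.add(t.term);
            }
        }
        return out;
    }

    // Safe path: ignore positions and sort by start offset.
    static List<String> orderByOffset(List<Tok> toks) {
        List<Tok> sorted = new ArrayList<>(toks);
        sorted.sort(Comparator.comparingInt(t -> t.startOffset));
        List<String> out = new ArrayList<>();
        for (Tok t : sorted) {
            out.add(t.term);
        }
        return out;
    }

    public static void main(String[] args) {
        // Positions 3,3,7,8,9 as in the example above, listed out of text order.
        List<Tok> toks = Arrays.asList(
            new Tok("e", 9, 8), new Tok("c", 7, 4), new Tok("a", 3, 0),
            new Tok("d", 8, 6), new Tok("b", 3, 2));
        System.out.println(orderByPositionSlot(toks)); // prints [b, c, d, e]
        System.out.println(orderByOffset(toks));       // prints [a, b, c, d, e]
    }
}
```

The slot-based version silently drops "a" because "b" shares position 3, which is why treating positions as non-contiguous is the safer default.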

m> New code is at the same place:

m> Cheers
m> Mark

m> ---------------------------------------------------------------------
m> To unsubscribe, e-mail:
m> For additional commands, e-mail:
