lucene-dev mailing list archives

From mark harwood <>
Subject Re: Performance issues with ConjunctionScorer
Date Tue, 22 Nov 2005 16:26:17 GMT
The Highlighter in the Lucene "contrib" section has a
class called TokenSources which tries to find the best
way of getting a TokenStream.
It can build a TokenStream from either:
a) an Analyzer
b) a TermPositionVector (if the field was indexed with
one)
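
A rough sketch of that choice, in plain Java. This is not the
TokenSources code: the Token class and the analyze and getAnyTokens
methods are hypothetical stand-ins, and it assumes stored positions
are preferred whenever the field has them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Lucene classes.
class TokenSourceSketch {

    /** A term plus its character offsets into the original text. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy "analyzer": splits on non-alphanumerics and lowercases (costs CPU). */
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            tokens.add(new Token(text.substring(start, i).toLowerCase(), start, i));
        }
        return tokens;
    }

    /**
     * Use the stored token list if the field has one (null means it was
     * indexed without positions), otherwise fall back to re-analyzing the text.
     */
    static List<Token> getAnyTokens(List<Token> storedVector, String text) {
        return storedVector != null ? storedVector : analyze(text);
    }

    public static void main(String[] args) {
        // No stored vector here, so the text is re-analyzed.
        for (Token t : getAnyTokens(null, "The quick brown fox")) {
            System.out.println(t.term + " [" + t.start + "," + t.end + ")");
        }
    }
}
```

The caller does not need to care which source was used; only the
cost of getting the tokens differs.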

You may find that storing TermPositionVectors in your
index gives you a speed-up, but it all depends on the
cost of the processing done by your analyzer. Using a
TermPositionVector incurs extra disk reads to fetch the
list of tokens, whereas using an Analyzer costs extra
CPU to re-process the document text you've already
read from disk.
Both approaches typically need to read the original
document text when highlighting, in order to retain the
stop words that make the highlighted text readable.
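
To illustrate why the original text is still needed, here is a
small sketch in plain Java. The Token class and highlight method
are hypothetical, not the Highlighter's API, and the sketch assumes
the token list omits stop words and is sorted by start offset. The
tokens only say where the hits are; the readable output is cut from
the original text.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the contrib Highlighter.
class OffsetHighlightSketch {

    /** A term plus its character offsets into the original text. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Wraps every token matching queryTerm in <B> tags. Everything between
     * the hits, stop words included, is copied from the original text.
     */
    static String highlight(String originalText, List<Token> tokens, String queryTerm) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : tokens) {
            if (!t.term.equals(queryTerm)) {
                continue;
            }
            out.append(originalText, last, t.start);
            out.append("<B>").append(originalText, t.start, t.end).append("</B>");
            last = t.end;
        }
        out.append(originalText, last, originalText.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "The fox and the hound";
        // Token list as it might look after stop-word removal.
        List<Token> tokens = Arrays.asList(
                new Token("fox", 4, 7),
                new Token("hound", 16, 21));
        System.out.println(highlight(text, tokens, "fox"));
        // prints: The <B>fox</B> and the hound
    }
}
```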
I have noticed in the past that StandardAnalyzer is
quite slow while other Analyzers are much quicker, so
it really can depend on which one you choose.

