lucene-dev mailing list archives

From Chris Hostetter
Subject Re: inconsistency/performance trap of empty terms
Date Thu, 28 Oct 2010 23:59:27 GMT

: Anyway, I think its possible other users might be in this same
: situation, with slow performance, and not even realizing it yet...
: Obviously they can fix this if they go and add LengthFilter, but
: should we be doing something different?

On one level, I think a big improvement might just be to start encouraging 
more use of LengthFilter with min=1 at the end of analyzers by including 
it at the end of more "example" field types -- we should probably end 
every analyzer with that and RemoveDuplicatesTokenFilterFactory as a 
general pattern.
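
As a rough illustration of that "end of chain" pattern, here is a minimal Python sketch. It is not Lucene or Solr code: the (term, position) token tuples and the names length_filter, remove_duplicates and finish_chain are hypothetical stand-ins, and the max_len default of 255 is an arbitrary assumption. It only shows the intended effect of a length filter with min=1 followed by duplicate removal.

```python
from typing import Iterable, Iterator, List, Tuple

# Simplified stand-in for a token: (term text, position). Hypothetical, not a Lucene type.
Token = Tuple[str, int]


def length_filter(tokens: Iterable[Token], min_len: int = 1,
                  max_len: int = 255) -> Iterator[Token]:
    """Keep only tokens whose term length is within [min_len, max_len].

    With min_len=1 this drops empty terms.
    """
    for term, pos in tokens:
        if min_len <= len(term) <= max_len:
            yield term, pos


def remove_duplicates(tokens: Iterable[Token]) -> Iterator[Token]:
    """Drop tokens that repeat an earlier (term, position) pair."""
    seen = set()
    for token in tokens:
        if token not in seen:
            seen.add(token)
            yield token


def finish_chain(tokens: Iterable[Token]) -> List[Token]:
    """Apply the two clean-up steps at the end of an analysis chain."""
    return list(remove_duplicates(length_filter(tokens)))


if __name__ == "__main__":
    raw = [("", 0), ("foo", 1), ("foo", 1), ("bar", 2), ("", 3)]
    print(finish_chain(raw))
```

The sketch ignores details a real filter has to handle, such as position increments when a token is dropped.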

How individual Tokenizers and TokenFilters deal with empty tokens seems 
like something that should be decided case by case -- the Ngram classes 
should allow/create them if the "min" value is 0, the pattern based classes 
should create them if the pattern matches an empty string, etc.
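
To make the two cases concrete, here is a small Python sketch of how empty terms can arise. It is an analogy only, not the behavior of any Lucene class: pattern_tokenize and ngrams are hypothetical helpers, and emitting one zero-length gram per offset is an assumption made for the illustration.

```python
import re
from typing import List


def pattern_tokenize(text: str, pattern: str) -> List[str]:
    """Split text on a regex; adjacent or trailing delimiters yield empty terms."""
    return re.split(pattern, text)


def ngrams(text: str, min_n: int, max_n: int) -> List[str]:
    """Return all character n-grams with min_n <= n <= max_n.

    With min_n == 0 the zero-length gram (the empty string) is included.
    """
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams


if __name__ == "__main__":
    print(pattern_tokenize("a,,b,", ","))  # contains empty strings
    print(ngrams("ab", 0, 1))              # contains empty strings because min_n is 0
    print(ngrams("ab", 1, 2))              # no empty strings
```

In both cases the empty terms follow directly from the configuration, which is the argument for handling them per class rather than with one global rule.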

If individual classes can be made drastically more efficient by ignoring 
them (or specifically: by not needing to have branches based on explicit 
checks for an empty term) then we can definitely offer both 
implementations and carefully document their valid usage.

