lucene-dev mailing list archives

From <>
Subject RE: inconsistency/performance trap of empty terms
Date Fri, 29 Oct 2010 00:35:40 GMT
In database queries, it is often useful to treat an empty value specially, and to be able to search
explicitly for records that have (for instance) no field X, or no value for field X.  I can't
recall offhand all the precise situations in which I've used this, nor claim that they would
apply to a search engine, but it is conceivable that it could be helpful to somebody.  Would
your proposed change preclude current or future support for such null queries?
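
To make the distinction between "no field X" and "no value for field X" concrete, here is a minimal sketch in plain Python. It is not Lucene code: the toy documents, the `build_index` helper, and the field name `title` are all hypothetical, and it only illustrates one way an indexed empty term could back a "no value" query.

```python
from collections import defaultdict

# Toy documents (hypothetical data): doc 2 has the field with an empty
# value, doc 3 lacks the field entirely.
docs = {
    1: {"title": "lucene"},
    2: {"title": ""},
    3: {},
}


def build_index(docs, field):
    """Map each value of `field` (including the empty string) to doc ids."""
    postings = defaultdict(set)
    for doc_id, fields in docs.items():
        if field in fields:
            postings[fields[field]].add(doc_id)
    return postings


index = build_index(docs, "title")

# "No value for field X": look up the empty term directly.
empty_value = index[""]

# "No field X": every doc that appears under no term at all.
has_field = set().union(*index.values())
missing_field = set(docs) - has_field
```

In this sketch, discarding empty terms at index time would make doc 2 indistinguishable from doc 3, which is the concern raised above.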


-----Original Message-----
From: ext Robert Muir [] 
Sent: Thursday, October 28, 2010 8:21 PM
Subject: Re: inconsistency/performance trap of empty terms

On Thu, Oct 28, 2010 at 7:59 PM, Chris Hostetter
<> wrote:
> : Anyway, I think it's possible other users might be in this same
> : situation, with slow performance, and not even realizing it yet...
> : Obviously they can fix this if they go and add LengthFilter, but
> : should we be doing something different?
> On one level, I think a big improvement might just be to start encouraging
> more use of LengthFilter with min=1 at the end of analyzers by including
> it at the end of more "example" field types -- we should probably end
> every analyzer with that and RemoveDuplicatesTokenFilterFactory as a
> general pattern.

why not just discard them completely in, say, the indexer/queryparser?
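
As a rough illustration of the two options in this exchange, the sketch below models a filter chain in plain Python. It does not use Lucene's API: the `(term, position)` tuples and the functions `length_filter`, `remove_duplicates`, and `analyze` are hypothetical stand-ins for `LengthFilter` with min=1 followed by duplicate removal, with duplicates assumed to mean the same term at the same position.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a token: (term text, position).
Token = Tuple[str, int]


def length_filter(tokens: Iterable[Token], min_len: int = 1) -> Iterator[Token]:
    """Drop tokens shorter than min_len; min_len=1 removes empty terms."""
    for term, pos in tokens:
        if len(term) >= min_len:
            yield term, pos


def remove_duplicates(tokens: Iterable[Token]) -> Iterator[Token]:
    """Drop tokens whose term and position have already been seen."""
    seen = set()
    for term, pos in tokens:
        if (term, pos) not in seen:
            seen.add((term, pos))
            yield term, pos


def analyze(tokens: Iterable[Token]) -> List[Token]:
    """End-of-chain pattern: length filter first, then duplicate removal."""
    return list(remove_duplicates(length_filter(tokens, min_len=1)))


raw = [("foo", 0), ("", 1), ("bar", 2), ("bar", 2), ("", 3)]
cleaned = analyze(raw)
```

The same `length_filter` step could equally sit inside the indexer or query parser rather than at the end of each analyzer; the difference is only whether each field type has to opt in.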

> How individual Tokenizers and TokenFilters deal with empty tokens seems
> like something that should be case by case -- the Ngram classes should
> allow/create them if the "min" value is 0, the pattern based classes
> should create them if the pattern matches an empty string, etc....

why should they create them? is there some use case for the empty term
that you have found (because i can't think of a use case, except
making your search engine slower!)
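
For readers wondering how empty terms arise in the first place, here is a small plain-Python sketch of the two cases mentioned in the quoted text. It does not reproduce the behaviour of Lucene's n-gram or pattern classes; `pattern_tokenize` and `ngrams` are hypothetical helpers that only show the general mechanism.

```python
import re


def pattern_tokenize(text, pattern):
    """Split text on a delimiter pattern; adjacent delimiters yield ''."""
    return re.split(pattern, text)


def ngrams(text, min_n, max_n):
    """All character n-grams with min_n <= n <= max_n; n=0 yields ''."""
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


# Two consecutive commas leave an empty token between them.
split_tokens = pattern_tokenize("a,,b", ",")

# A minimum gram size of 0 emits one empty gram per offset.
zero_grams = ngrams("ab", 0, 1)
```

Unless something downstream removes them, each of those empty strings would be indexed as a term.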
