lucene-dev mailing list archives

From Earwin Burrfoot
Subject Re: inconsistency/performance trap of empty terms
Date Sat, 30 Oct 2010 11:19:01 GMT
I'd say support them everywhere, and slip LengthFilter into all the
standard Analyzers, so people won't hit empty terms unless they opt in
to them.
This is the most consistent approach.
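
To make the opt-in idea concrete, here is a minimal sketch in plain Java. It deliberately does not use the real Lucene TokenFilter/LengthFilter API; the class and method names are hypothetical and only illustrate the behaviour: a default minimum length of 1 drops empty terms, and a caller who wants them passes 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; NOT Lucene's LengthFilter or TokenStream API.
public class EmptyTermFilterSketch {

    // Keep tokens whose length lies in [minLen, maxLen].
    // With minLen = 1 (the proposed default) empty terms are dropped;
    // with minLen = 0 the caller opts in to empty terms.
    public static List<String> filterByLength(List<String> tokens, int minLen, int maxLen) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            int len = token.length();
            if (len >= minLen && len <= maxLen) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("lucene", "", "index", "");
        // Default behaviour: empty terms never reach the index.
        System.out.println(filterByLength(tokens, 1, Integer.MAX_VALUE)); // prints [lucene, index]
        // Opt-in behaviour: empty terms are kept.
        System.out.println(filterByLength(tokens, 0, Integer.MAX_VALUE).size());
    }
}
```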

On Sat, Oct 30, 2010 at 15:06, Robert Muir wrote:
> On Sat, Oct 30, 2010 at 7:01 AM, Earwin Burrfoot wrote:
>> Mathematically an inverted index is keyed by strings. Any strings.
>> Empty term is just a case of a string of length 0.
>> So, for consistency, Lucene should support them. Seeking to ""
>> should position you at the very beginning of the terms list, etc.
>> If you drop the support, you have to check zero length damn
>> eeeeverywhere in the API where you accept terms. Or, thoroughly
>> document unpredictable erratic behaviour :)
> well, we are checking this already, in a lot of the analyzers.
> as i said originally, the biggest problems that we *must* solve are:
> 1. try to prevent the performance trap i mentioned, where people
> create the empty term as a mega-stopword without realizing it.
> 2. fix the analyzers to be consistent with regards to the empty
> term... for example, if we decide the empty term is supported, then
> they shouldn't be arbitrarily removing empty-term tokens.
> as far as TermsEnum, i myself have already had to special-case the
> empty term in TermsEnum implementations before... and I'm pretty
> fucking sure that we have long-standing bugs if you have an empty-term
> anywhere in your index (e.g. FuzzyQuery will divide by 0 to scale the
> boost, and you will get a strange exception from your collector
> because it will then have NaN/Inf/some sentinel value).
> just saying, it's problematic today; doing nothing and leaving it in the
> messy, ambiguous situation it is in now is not an option.
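
The divide-by-zero hazard Robert describes is easy to reproduce outside Lucene. The sketch below is an assumption-laden illustration, not FuzzyQuery's actual code: it uses a hypothetical length-normalized similarity (1 minus edit distance over the shorter length) to show how a zero-length term turns a float division into NaN or an infinity without throwing, and what a special-cased guard could look like.

```java
// Hypothetical illustration; the formula and names are assumptions,
// not taken from Lucene's FuzzyQuery implementation.
public class EmptyTermSimilaritySketch {

    // Length-normalized similarity with no guard for empty strings.
    // Float division by zero does not throw in Java: it yields NaN or +/-Infinity.
    public static float similarity(int editDistance, int queryLen, int termLen) {
        int minLen = Math.min(queryLen, termLen);
        return 1.0f - ((float) editDistance / (float) minLen);
    }

    // Guarded variant: special-case the empty term instead of dividing by zero.
    public static float safeSimilarity(int editDistance, int queryLen, int termLen) {
        int minLen = Math.min(queryLen, termLen);
        if (minLen == 0) {
            // Two empty strings are identical; otherwise treat as no similarity.
            return (queryLen == termLen) ? 1.0f : 0.0f;
        }
        return 1.0f - ((float) editDistance / (float) minLen);
    }

    public static void main(String[] args) {
        System.out.println(Float.isNaN(similarity(0, 0, 0)));      // 0/0 case
        System.out.println(Float.isInfinite(similarity(3, 3, 0))); // 3/0 case
        System.out.println(safeSimilarity(3, 3, 0));
    }
}
```

A score like this can flow through scoring unnoticed until something downstream, such as a collector, rejects the non-finite value, which matches the "strange exception" described above.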

Kirill Zakharenko/Кирилл Захаренко
Phone: +7 (495) 683-567-4
ICQ: 104465785
