lucene-dev mailing list archives

From Robert Muir <>
Subject Re: inconsistency/performance trap of empty terms
Date Fri, 29 Oct 2010 17:50:33 GMT
On Fri, Oct 29, 2010 at 1:44 PM, Chris Hostetter
<> wrote:

> In the IndexWriter: it seems like a bad idea.

it doesn't have to be there. but we have to do something.
Currently we have the worst of both worlds:
1. the empty term causes performance problems for people who don't
realize what's going on (especially with their analyzers etc)
2. the empty term isn't reliable and doesn't work anyway; lots of
analysis components throw it away.
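
As a rough illustration of point 1, here is a minimal sketch of how empty
terms can slip in without anyone noticing. It is not Lucene code: the class
and method names (EmptyTokenDemo, naiveTokens, docsWithEmptyTerm) are made
up for this example, and the "analyzer" is just a split on single spaces,
which is an assumption chosen because it is easy to reason about.

```java
import java.util.List;

/** Toy example, not Lucene code; all names are hypothetical. */
final class EmptyTokenDemo {
    /** Naive tokenizer: splits on single spaces, so doubled spaces yield "" tokens. */
    static List<String> naiveTokens(String text) {
        return List.of(text.split(" "));
    }

    /** Counts the documents that would land in the posting list of the empty term. */
    static int docsWithEmptyTerm(List<String> docs) {
        int count = 0;
        for (String doc : docs) {
            if (naiveTokens(doc).contains("")) {
                count++;
            }
        }
        return count;
    }
}
```

With inputs such as "hello  world" (two spaces), naiveTokens returns an
empty string between the two words, so every document containing a doubled
space would be added to the postings of the empty term.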

We should decide whether we:
A. support the empty term.
B. do not support the empty term.

I was suggesting that mathematically, the empty term makes no sense in
an inverted index, and we shouldn't allow it.
It's one solution.

The other solution is to fully support it everywhere... but we have to
decide, it can't be the ambiguous situation it is today.
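
To make the two choices concrete, here is a small sketch of a toy inverted
index where the decision is explicit instead of ambiguous. This is only an
illustration under stated assumptions: ToyIndex, EmptyTermPolicy, addTerm
and docsFor are hypothetical names, not IndexWriter API, and the sketch
assumes document ids arrive in increasing order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index, not Lucene code; all names are hypothetical. */
final class ToyIndex {
    /** Option A (SUPPORT) or option B (REJECT) from the list above. */
    enum EmptyTermPolicy { SUPPORT, REJECT }

    private final EmptyTermPolicy policy;
    // term -> ids of the documents that contain it, in insertion order
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    ToyIndex(EmptyTermPolicy policy) {
        this.policy = policy;
    }

    /** Records that docId contains term; assumes docIds arrive in increasing order. */
    void addTerm(int docId, String term) {
        if (term.isEmpty() && policy == EmptyTermPolicy.REJECT) {
            // Option B: fail loudly instead of silently indexing or dropping it.
            throw new IllegalArgumentException("empty term in doc " + docId);
        }
        List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
        // Skip the duplicate when the same document repeats a term.
        if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
            docs.add(docId);
        }
    }

    /** Returns the posting list for term, or an empty list if the term is unknown. */
    List<Integer> docsFor(String term) {
        return postings.getOrDefault(term, List.of());
    }
}
```

Under SUPPORT the empty term gets a posting list like any other term;
under REJECT the caller finds out immediately. Either behavior is
consistent, which is the point being argued here.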

> Low level Lucene really shouldn't be making any assumptions about *how*
> the client code is using the library -- you and I may not have any good
> reasons for wanting an empty term, but we shouldn't put that as a
> hardcoded assumption in the low level code.

I think we can make a safe assumption here: the empty term makes *no
sense* for an inverted index.
I could argue that every document has this!

> It's essentially the converse issue of IndexWriter.maxFieldLength --
> which was deliberately changed to default to Integer.MAX_VALUE precisely
> because of this "don't assume we know how people are using the library"
> issue -- but we could certainly make it configurable in the same way.

No, it's not, because we are talking about the empty *term*, not an empty *field*.
An empty term makes no sense.
