lucene-dev mailing list archives

From Chris Hostetter
Subject Re: inconsistency/performance trap of empty terms
Date Fri, 29 Oct 2010 17:44:43 GMT

: why not just discard them completely in say, indexer/queryparser ?

In QueryParser: maybe, that's a high level API with assumptions about 
"human" interaction and text.

In the IndexWriter: it seems like a bad idea.

Low level Lucene really shouldn't be making any assumptions about *how* 
the client code is using the library -- you and i may not have any good 
reasons for wanting an empty term, but we shouldn't put that as a 
hardcoded assumption in the low level code.

It's essentially the converse issue of IndexWriter.maxFieldLength -- 
which was deliberately changed to default to Integer.MAX_VALUE precisely 
because of this "don't assume we know how people are using the library" 
issue -- but we could certainly make it configurable in the same way.

(I see now that IndexWriter.maxFieldLength got deprecated in favor of 
IndexWriterConfig.maxFieldLength ... i thought i remembered that it had 
been deprecated in favor of a TokenFilter that did the limiting, hence my 
suggestion that we use the same pattern for "min term length" -- it 
could easily be an IndexWriterConfig option as well, but the 
TokenFilter approach seems more useful since it can be applied per field.)
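
To make the "per field" point concrete, here is a rough sketch in plain Java. It is not Lucene API: the class MinTermLengthSketch, the method filterTerms, and the field-to-minimum map are all hypothetical names made up for illustration. In real Lucene the same logic would live in a TokenFilter placed in that field's analysis chain (the existing LengthFilter is built on a similar idea), while the low-level indexing code stays free of assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java stand-in (hypothetical, not Lucene API) for a per-field
 * "min term length" filter. Filtering is opt-in per field; nothing is
 * dropped unless the client code asks for it.
 */
final class MinTermLengthSketch {

    private MinTermLengthSketch() {}

    /**
     * Returns the terms whose length is at least the minimum configured
     * for the given field. A field with no entry in the map defaults to a
     * minimum of 0, so every term is kept, including the empty term.
     */
    static List<String> filterTerms(String field,
                                    List<String> terms,
                                    Map<String, Integer> minLengthByField) {
        int min = minLengthByField.getOrDefault(field, 0);
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (term.length() >= min) {
                kept.add(term);
            }
        }
        return kept;
    }
}
```

With a minimum of 1 configured only for a "body" field, the empty term is dropped from "body" but still passes through untouched for any other field, such as an "id" field where the client may have its own reasons for wanting it.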

