lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: inconsistency/performance trap of empty terms
Date Fri, 29 Oct 2010 17:44:43 GMT

: why not just discard them completely in say, indexer/queryparser ?

In QueryParser: maybe. That's a high-level API that already makes
assumptions about "human" interaction and text.

In the IndexWriter: it seems like a bad idea.

Low-level Lucene really shouldn't be making any assumptions about *how*
the client code is using the library. You and I may not have any good
reasons for wanting an empty term, but we shouldn't hardcode that
assumption into the low-level code.

It's essentially the converse of the IndexWriter.maxFieldLength issue:
that default was deliberately changed to Integer.MAX_VALUE precisely
because of this "don't assume we know how people are using the library"
principle. But we could certainly make empty-term handling configurable
in the same way.

(I see now that IndexWriter.maxFieldLength got deprecated in favor of
IndexWriterConfig.maxFieldLength. I thought I remembered it had been
deprecated in favor of a TokenFilter that did the limiting, hence my
suggestion that we use the same pattern for "min term length". It could
easily be an IndexWriterConfig option as well, but the TokenFilter
approach seems more useful since it can be applied per field.)


-Hoss


