lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: inconsistency/performance trap of empty terms
Date Fri, 29 Oct 2010 17:49:15 GMT
I am in favour of the TokenFilter approach. maxFieldLength is still to be
deprecated in favour of the TokenFilter.

The TokenFilter is very easy to write: just loop over incrementToken() until
it returns false or the current term has termLength > 0.
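
To make the loop concrete, here is a rough, dependency-free sketch of the
idea. Note the assumptions: SimpleStream, ListStream, SkipEmptyFilter and
collect are hypothetical stand-ins invented for this illustration, not
Lucene's TokenStream/TokenFilter classes; only the incrementToken() and
termLength names mirror what is discussed above. A real filter would extend
TokenFilter and read the term attribute instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SkipEmptyTermsSketch {

    /** Hypothetical stand-in for a token stream; NOT Lucene's TokenStream. */
    public interface SimpleStream {
        boolean incrementToken(); // advance to the next token; false when exhausted
        int termLength();         // length of the current term
        String term();            // text of the current term
    }

    /** Hypothetical source stream backed by a list, used only for the demo. */
    public static final class ListStream implements SimpleStream {
        private final Iterator<String> it;
        private String current = "";

        public ListStream(List<String> terms) {
            this.it = terms.iterator();
        }

        public boolean incrementToken() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }

        public int termLength() { return current.length(); }

        public String term() { return current; }
    }

    /** Wraps another stream and silently drops zero-length terms. */
    public static final class SkipEmptyFilter implements SimpleStream {
        private final SimpleStream input;

        public SkipEmptyFilter(SimpleStream input) {
            this.input = input;
        }

        public boolean incrementToken() {
            // Keep pulling from the wrapped stream until it is exhausted
            // or it yields a term with termLength > 0.
            while (input.incrementToken()) {
                if (input.termLength() > 0) {
                    return true;
                }
            }
            return false;
        }

        public int termLength() { return input.termLength(); }

        public String term() { return input.term(); }
    }

    /** Drains a stream into a list so the result is easy to inspect. */
    public static List<String> collect(SimpleStream stream) {
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleStream source = new ListStream(Arrays.asList("foo", "", "bar", ""));
        System.out.println(collect(new SkipEmptyFilter(source))); // prints [foo, bar]
    }
}
```

Because the filter wraps a single stream, it can be added to the analysis
chain of just the fields that need it, which is the per-field advantage
mentioned in the quoted message below.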

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> Sent: Friday, October 29, 2010 7:45 PM
> To: Lucene Dev
> Subject: Re: inconsistency/performance trap of empty terms
> 
> 
> : why not just discard them completely in say, indexer/queryparser ?
> 
> In QueryParser: maybe, that's a high level API with assumptions about
> "human" interaction and text.
> 
> In the IndexWriter: it seems like a bad idea.
> 
> Low level Lucene really shouldn't be making any assumptions about *how* the
> client code is using the library -- you and i may not have any good reasons
> for wanting an empty term, but we shouldn't put that as a hardcoded
> assumption in the low level code.
> 
> It's essentially the converse issue of IndexWriter.maxFieldLength -- which
> was deliberately changed to default to Integer.MAX_VALUE precisely because
> of this "don't assume we know how people are using the library"
> issue -- but we could certainly make it configurable in the same way.
> 
> (I see now that IndexWriter.maxFieldLength got deprecated in favor of
> IndexWriterConfig.maxFieldLength ... i thought i remembered that had been
> deprecated in favor of a TokenFilter that did the limiting, hence my
> suggestion that we use the same pattern for "min term length" -- it could
> easily be an IndexWriterConfig option as well, but using the TokenFilter
> approach seems more useful since it can be per field)
> 
> 
> -Hoss
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
> commands, e-mail: dev-help@lucene.apache.org


