lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: inconsistency/performance trap of empty terms
Date Fri, 29 Oct 2010 17:53:13 GMT
We don't even need a new TF (TokenFilter); LengthFilter already does the job
easily: just set the minimum term length to 1 and the maximum to
Integer.MAX_VALUE.
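For readers unfamiliar with the token-stream contract, here is a minimal, self-contained Java sketch of what a length filter configured this way does. It deliberately does not depend on Lucene: `SimpleTokenStream`, `ListTokenStream` and `LengthFilterSketch` are hypothetical stand-ins that only mimic the `incrementToken()` convention (return false once the stream is exhausted). With min = 1 and max = Integer.MAX_VALUE, the only tokens removed are empty ones. The real `LengthFilter` constructor and package differ between Lucene versions, so check the Javadoc of the release in use.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified stand-in for Lucene's TokenStream (not the real API). */
interface SimpleTokenStream {
    /** Advances to the next token; returns false when the stream is exhausted. */
    boolean incrementToken();

    /** Text of the current token. */
    String term();
}

/** Demo source stream over a fixed list of tokens. */
class ListTokenStream implements SimpleTokenStream {
    private final Iterator<String> it;
    private String current;

    ListTokenStream(List<String> tokens) {
        this.it = tokens.iterator();
    }

    public boolean incrementToken() {
        if (!it.hasNext()) return false;
        current = it.next();
        return true;
    }

    public String term() { return current; }
}

/** Keeps only tokens whose length lies in [min, max], mimicking a length filter. */
class LengthFilterSketch implements SimpleTokenStream {
    private final SimpleTokenStream input;
    private final int min;
    private final int max;

    LengthFilterSketch(SimpleTokenStream input, int min, int max) {
        this.input = input;
        this.min = min;
        this.max = max;
    }

    public boolean incrementToken() {
        // Pull from the wrapped stream until a token of acceptable length appears.
        while (input.incrementToken()) {
            int len = input.term().length();
            if (len >= min && len <= max) return true;
        }
        return false; // wrapped stream exhausted
    }

    public String term() { return input.term(); }

    /** Drains a stream into a list (helper for the example). */
    static List<String> drain(SimpleTokenStream ts) {
        List<String> out = new ArrayList<>();
        while (ts.incrementToken()) out.add(ts.term());
        return out;
    }
}
```

For example, draining `new LengthFilterSketch(new ListTokenStream(List.of("foo", "", "bar")), 1, Integer.MAX_VALUE)` yields `foo` and `bar`; the empty term never reaches the consumer.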

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Friday, October 29, 2010 7:49 PM
> To: dev@lucene.apache.org
> Subject: RE: inconsistency/performance trap of empty terms
> 
> I am for the TokenFilter approach. Max Field Length is still to be deprecated
> in favour of the TokenFilter.
> 
> A TF for this is very easy to write: just loop over incrementToken() until it
> returns false or termLength > 0.
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
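The dedicated filter described above (loop until `incrementToken()` returns false or the term is non-empty) can be sketched as follows. This is an illustration under assumptions, not Lucene code: `TermStream`, `ArrayTermStream` and `SkipEmptyTermsFilter` are hypothetical, simplified stand-ins for a `TokenStream` and its term attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified stand-in for a Lucene TokenStream (not the real API). */
interface TermStream {
    boolean incrementToken(); // advances; false once the stream is exhausted
    String term();            // text of the current term

    default int termLength() { return term().length(); }
}

/** Demo source stream over a fixed array of terms. */
class ArrayTermStream implements TermStream {
    private final Iterator<String> it;
    private String current;

    ArrayTermStream(String... terms) {
        this.it = Arrays.asList(terms).iterator();
    }

    public boolean incrementToken() {
        if (!it.hasNext()) return false;
        current = it.next();
        return true;
    }

    public String term() { return current; }
}

/** Skips zero-length terms, as described in the message above. */
class SkipEmptyTermsFilter implements TermStream {
    private final TermStream input;

    SkipEmptyTermsFilter(TermStream input) { this.input = input; }

    public boolean incrementToken() {
        // Keep pulling until the input is exhausted or yields a non-empty term.
        while (input.incrementToken()) {
            if (input.termLength() > 0) return true;
        }
        return false;
    }

    public String term() { return input.term(); }

    /** Collects all remaining terms (helper for the example). */
    static List<String> drain(TermStream ts) {
        List<String> out = new ArrayList<>();
        while (ts.incrementToken()) out.add(ts.term());
        return out;
    }
}
```

One design point this sketch glosses over: it simply skips empty tokens without recording any position gap. A real Lucene filter may also need to decide how the position increments of removed tokens are handled.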
> 
> 
> > -----Original Message-----
> > From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> > Sent: Friday, October 29, 2010 7:45 PM
> > To: Lucene Dev
> > Subject: Re: inconsistency/performance trap of empty terms
> >
> >
> > : why not just discard them completely in, say, the indexer/QueryParser?
> >
> > In QueryParser: maybe; that's a high-level API with assumptions about
> > "human" interaction and text.
> >
> > In the IndexWriter: it seems like a bad idea.
> >
> > Low-level Lucene really shouldn't be making any assumptions about *how* the
> > client code is using the library: you and I may not have any good reasons
> > for wanting an empty term, but we shouldn't put that as a hardcoded
> > assumption in the low-level code.
> >
> > It's essentially the converse issue of IndexWriter.maxFieldLength, which was
> > deliberately changed to default to Integer.MAX_VALUE precisely because of
> > this "don't assume we know how people are using the library" issue; but we
> > could certainly make it configurable in the same way.
> >
> > (I see now that IndexWriter.maxFieldLength got deprecated in favor of
> > IndexWriterConfig.maxFieldLength ... I thought I remembered that it had
> > been deprecated in favor of a TokenFilter that did the limiting, hence my
> > suggestion that we use the same pattern for "min term length". It could
> > easily be an IndexWriterConfig option as well, but the TokenFilter approach
> > seems more useful since it can be applied per field.)
> >
> >
> > -Hoss
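To illustrate why the TokenFilter approach is attractive, here is a sketch of per-field configuration in plain Java, assuming nothing from Lucene's API (`PerFieldFilters` and its methods are hypothetical). Each field gets its own chain, so dropping empty terms (min length 1) and a maxFieldLength-style cap on the number of tokens can be enabled for one field and left off for another. For brevity, the filters operate on whole token lists rather than lazily on a stream, as real TokenFilters do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of per-field token filtering (not Lucene's API).
 * Each field maps to its own chain of list-to-list filters.
 */
class PerFieldFilters {
    /** Drops terms shorter than minLength (minLength = 1 removes empty terms). */
    static UnaryOperator<List<String>> minLength(int minLength) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                if (t.length() >= minLength) out.add(t);
            }
            return out;
        };
    }

    /** Keeps at most maxTokens terms, like a maxFieldLength limit applied as a filter. */
    static UnaryOperator<List<String>> maxTokens(int maxTokens) {
        return tokens -> new ArrayList<>(tokens.subList(0, Math.min(maxTokens, tokens.size())));
    }

    private final Map<String, List<UnaryOperator<List<String>>>> chains;

    PerFieldFilters(Map<String, List<UnaryOperator<List<String>>>> chains) {
        this.chains = chains;
    }

    /** Applies the chain configured for the field; unconfigured fields pass through unchanged. */
    List<String> analyze(String field, List<String> tokens) {
        List<String> result = tokens;
        for (UnaryOperator<List<String>> f : chains.getOrDefault(field, List.of())) {
            result = f.apply(result);
        }
        return result;
    }
}
```

The order of a chain matters: placing `minLength(1)` before `maxTokens(n)` means empty terms do not count against the cap.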
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
> > additional commands, e-mail: dev-help@lucene.apache.org
> 
> 
> 




