lucene-dev mailing list archives

From Ryan McKinley <ryan...@gmail.com>
Subject Re: inconsistency/performance trap of empty terms
Date Fri, 29 Oct 2010 20:41:38 GMT
Reading the discussion here...  I think Robert's main point is that
Lucene lets you make empty terms, but has inconsistent behavior with
them.  We should either remove support, or make the behavior consistent.

Sure, there are lots of options to avoid the problems.  But should it
be necessary to work to avoid them?


On Fri, Oct 29, 2010 at 1:53 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> We don't even need the TF; LengthFilter does the job very easily, just set
> minTermLength to 1 and maxTermLength to Integer.MAX_VALUE.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> Sent: Friday, October 29, 2010 7:49 PM
>> To: dev@lucene.apache.org
>> Subject: RE: inconsistency/performance trap of empty terms
>>
>> I am for the tokenfilter approach. Max Field Length is still to be
>> deprecated in favour of the TokenFilter.
>>
>> TF is very easy, just loop over incrementToken() until it returns false
>> or a termLength>0
>>
>>
>>
>> > -----Original Message-----
>> > From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
>> > Sent: Friday, October 29, 2010 7:45 PM
>> > To: Lucene Dev
>> > Subject: Re: inconsistency/performance trap of empty terms
>> >
>> >
>> > : why not just discard them completely in say, indexer/queryparser ?
>> >
>> > In QueryParser: maybe, that's a high level API with assumptions about
>> > "human" interaction and text.
>> >
>> > In the IndexWriter: it seems like a bad idea.
>> >
>> > Low level Lucene really shouldn't be making any assumptions about
>> > *how* the client code is using the library -- you and i may not have
>> > any good reasons for wanting an empty term, but we shouldn't put that
>> > as a hardcoded assumption in the low level code.
>> >
>> > It's essentially the converse issue of IndexWriter.maxFieldLength --
>> > which was deliberately changed to default to Integer.MAX_VALUE
>> > precisely because of this "don't assume we know how people are using
>> > the library" issue -- but we could certainly make it configurable in
>> > the same way.
>> >
>> > (I see now that IndexWriter.maxFieldLength got deprecated in favor of
>> > IndexWriterConfig.maxFieldLength ... i thought i remembered that had
>> > been deprecated in favor of a TokenFilter that did the limiting, hence
>> > my suggestion that we use the same pattern for "min term length" -- it
>> > could easily be an IndexWriterConfig option as well, but using the
>> > TokenFilter approach seems more useful since it can be per field)
>> >
>> >
>> > -Hoss
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: dev-help@lucene.apache.org
>
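
For anyone skimming the thread, the filter idea quoted above (loop over
incrementToken() until it returns false or a term with length > 0 shows
up, which is the same as a length filter with a minimum of 1 and a
maximum of Integer.MAX_VALUE) can be sketched roughly as below.  This is
only an illustration in plain Java: SimpleTokenStream, ListTokenStream,
MinMaxLengthFilter and drain are hypothetical stand-ins made up for the
example, not Lucene's TokenStream/TokenFilter/LengthFilter classes, and
position increments are ignored entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class EmptyTermFilterSketch {

    /** Hypothetical stand-in for a token stream; NOT Lucene's actual API. */
    interface SimpleTokenStream {
        /** Advances to the next token; returns false when the stream is exhausted. */
        boolean incrementToken();

        /** Text of the current token; only valid after incrementToken() returned true. */
        String term();
    }

    /** Hypothetical source stream backed by a fixed list of terms. */
    static final class ListTokenStream implements SimpleTokenStream {
        private final Iterator<String> it;
        private String current;

        ListTokenStream(List<String> tokens) {
            this.it = tokens.iterator();
        }

        @Override
        public boolean incrementToken() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }

        @Override
        public String term() {
            return current;
        }
    }

    /** Hypothetical filter that drops tokens whose length is outside [min, max]. */
    static final class MinMaxLengthFilter implements SimpleTokenStream {
        private final SimpleTokenStream input;
        private final int min;
        private final int max;

        MinMaxLengthFilter(SimpleTokenStream input, int min, int max) {
            this.input = input;
            this.min = min;
            this.max = max;
        }

        @Override
        public boolean incrementToken() {
            // Keep pulling from the wrapped stream until it is exhausted
            // or it yields a token of acceptable length.
            while (input.incrementToken()) {
                int len = input.term().length();
                if (len >= min && len <= max) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public String term() {
            return input.term();
        }
    }

    /** Collects every remaining token of a stream into a list. */
    static List<String> drain(SimpleTokenStream ts) {
        List<String> out = new ArrayList<>();
        while (ts.incrementToken()) {
            out.add(ts.term());
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleTokenStream source =
                new ListTokenStream(Arrays.asList("foo", "", "bar", ""));
        // min = 1 drops empty terms; max = Integer.MAX_VALUE keeps everything else.
        SimpleTokenStream filtered =
                new MinMaxLengthFilter(source, 1, Integer.MAX_VALUE);
        System.out.println(drain(filtered)); // prints [foo, bar]
    }
}
```

Because the filter wraps a single stream, it can be applied to one field's
analysis chain and left off another, which is the per-field advantage
mentioned in the quoted discussion.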
