lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: inconsistency/performance trap of empty terms
Date Sat, 30 Oct 2010 11:01:33 GMT
On Fri, Oct 29, 2010 at 21:50, Robert Muir <rcmuir@gmail.com> wrote:
> I was suggesting that mathematically, the empty term makes no sense in
> an inverted index, and we shouldn't allow it.
> It's one solution.

Mathematically, an inverted index is keyed by strings. Any strings.
The empty term is just the case of a string of length 0.
So, for consistency, Lucene should support it. TermsEnum.seek("")
should position you at the very beginning of the terms list, etc.
If you drop the support, you have to check for zero length damn
eeeeverywhere in the API where you accept terms. Or thoroughly
document the unpredictable, erratic behaviour :)
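
A minimal sketch of the consistency argument, using a plain
java.util.TreeMap as a stand-in for the terms dictionary. This is not
Lucene's API: the class name, buildIndex and seekCeil are hypothetical
helpers, and the postings are made-up document ids.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class EmptyTermIndexSketch {
    // Toy inverted index: term -> list of document ids (illustrative data).
    public static TreeMap<String, List<Integer>> buildIndex() {
        TreeMap<String, List<Integer>> index = new TreeMap<>();
        index.put("", Arrays.asList(1, 3));      // the empty term is just another key
        index.put("apple", Arrays.asList(1));
        index.put("banana", Arrays.asList(2, 3));
        return index;
    }

    // Hypothetical stand-in for a seek: smallest term >= target, or null if none.
    public static String seekCeil(TreeMap<String, List<Integer>> index, String target) {
        return index.ceilingKey(target);
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> index = buildIndex();

        // Seeking "" lands on the first entry, which here is the empty term itself.
        System.out.println("[" + seekCeil(index, "") + "] -> " + index.get(""));

        // Without an empty term in the index, the same seek still lands on
        // the very beginning of the terms list.
        index.remove("");
        System.out.println(seekCeil(index, ""));
    }
}
```

No special-casing of length 0 is needed anywhere in the sketch: the
empty string simply sorts before every other key.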

A possible use case for empty terms in an analyzer stream is slipping in
various metadata: paragraph/sentence delimiters, whatever. Nothing
prevents you from using "##PAR#BEGIN##" kind of things, but you may
want to leave the term text alone and exploit other attributes.
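
A rough sketch of that idea, with a hypothetical Token class that
carries a separate "type" attribute next to the term text. None of
these names come from Lucene; they only illustrate marking paragraph
starts with empty-text tokens instead of a magic string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MarkerTokenSketch {
    // Hypothetical token: term text plus a separate type attribute.
    public static final class Token {
        public final String text;
        public final String type;
        public Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Emits one empty-text marker per paragraph, followed by its words.
    public static List<Token> tokenize(List<String> paragraphs) {
        List<Token> out = new ArrayList<>();
        for (String paragraph : paragraphs) {
            out.add(new Token("", "PAR_BEGIN"));   // metadata lives in the type, not the text
            for (String word : paragraph.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(new Token(word, "WORD"));
                }
            }
        }
        return out;
    }

    // Counts paragraph markers by looking at the type attribute only.
    public static int countParagraphs(List<Token> tokens) {
        int count = 0;
        for (Token t : tokens) {
            if (t.type.equals("PAR_BEGIN")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize(Arrays.asList("first paragraph", "second one here"));
        System.out.println(tokens.size() + " tokens, " + countParagraphs(tokens) + " paragraphs");
    }
}
```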

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785

