lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: share some numbers for range queries
Date Mon, 16 Nov 2009 09:07:57 GMT
From: Jake Mannix [mailto:jake.mannix@gmail.com]
> On Sun, Nov 15, 2009 at 11:02 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> 
> 
> > the second approach is slower, when deleted docs
> > are involved and 0 is inside the range (need to consult TermDocs).
> >
> 
> This is a good point (and should be mentioned in your blog, John) - for
> while custom FieldCache-like implementations (ie bobo-browse, which
> additionally isn't restricted to single-valued fields) need not have this
> deficiency, for they can choose to map empty values to MAX_INT or
> something like that, the FCRF in its raw form can really bite you,
> performance-wise, if you didn't notice that sometimes your queries ran
> across zero and there were lots of deletes.

I think both approaches have their place. FCRF (FieldCacheRangeFilter) is
very, very fast if you have no deletions, 0 is not included in your range,
and you have exactly one value per document (with the Lucene defaults). If
you have documents without a value, you have to index a marker value
instead of leaving the field empty (e.g. Float.NaN for floats, which never
falls inside a range). As soon as you use other FieldCache implementations
that allow multiple values per document (like bobo), I do not think they
will be much faster than NumericRangeQuery (there is additional work to
iterate over the terms of each doc, more comparisons, ...); they may even
be slower. Also do not forget the possibly large overhead of populating the
FieldCache if you do not sort on the field anyway.
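
To make the "0 inside the range" problem concrete, here is a minimal sketch
in plain Java. It is not Lucene code: the class CachedRangeSketch, its
methods and the noValue bit set are hypothetical stand-ins for a
FieldCache-style array with one slot per document, assuming that a slot
without an indexed value simply keeps the array default of 0.

```java
import java.util.BitSet;

// Hypothetical stand-in for a FieldCache-style array (one slot per doc).
// Illustrative only; none of these names exist in Lucene.
class CachedRangeSketch {

    // Fast path: two comparisons per document, straight off the array.
    static BitSet rangeOverArray(int[] values, int lower, int upper) {
        BitSet hits = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] >= lower && values[doc] <= upper) {
                hits.set(doc);
            }
        }
        return hits;
    }

    // Slow path: needs a second structure that knows which docs are
    // deleted or never had a value, so their default 0 does not match.
    static BitSet rangeExcluding(int[] values, int lower, int upper,
                                 BitSet noValue) {
        BitSet hits = rangeOverArray(values, lower, upper);
        hits.andNot(noValue);
        return hits;
    }

    // A NaN marker fails every comparison, so it never falls in a range.
    static boolean inFloatRange(float v, float lower, float upper) {
        return v >= lower && v <= upper;
    }

    public static void main(String[] args) {
        int[] values = {5, 0, 12, 0, -3};   // docs 1 and 3 have no value
        BitSet noValue = new BitSet();
        noValue.set(1);
        noValue.set(3);

        // Range contains 0: the valueless docs match by accident.
        System.out.println(rangeOverArray(values, -5, 10));          // {0, 1, 3, 4}
        System.out.println(rangeExcluding(values, -5, 10, noValue)); // {0, 4}
        // Range does not contain 0: the fast path is already correct.
        System.out.println(rangeOverArray(values, 1, 10));           // {0}
        System.out.println(inFloatRange(Float.NaN, -1f, 1f));        // false
    }
}
```

The extra andNot step plays the role of the additional lookup that is only
needed when the range spans 0; indexing a marker value such as Float.NaN
avoids it for documents that never had a value.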

For ease of use, NumericRangeQuery is preferable for all users, even if it
is a little bit slower; FCRF is only faster in very optimized cases (no
deletions, ...). Everybody should weigh the pros and cons and not look only
at performance.

You can speed up FieldCache population, too: if you index the field with
NumericField (with precStep=MAX_VALUE), parsing is faster, because
Integer.parseInt() is more complex than NumericUtils.prefixCodedToInt().
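
As a rough illustration of why decoding a bit-packed term can be cheaper
than parsing decimal text, here is a simplified sketch. It is NOT Lucene's
actual prefix-coded format and does not call NumericUtils; the class
PackedIntSketch and its 5-character layout are assumptions made up for this
example. The point is only that decoding needs a few shifts and ORs, while
Integer.parseInt() has to handle sign, radix, digit validation and
overflow.

```java
// Hypothetical, simplified packed encoding: 5 chars carrying 7 bits each.
// Illustrative only; this is not the format Lucene writes to the index.
class PackedIntSketch {

    // Flip the sign bit so that unsigned (string) order equals signed
    // int order, then emit 7 bits per char, most significant first.
    static String encode(int value) {
        int sortable = value ^ 0x80000000;
        char[] out = new char[5];
        for (int i = 4; i >= 0; i--) {
            out[i] = (char) (sortable & 0x7f);
            sortable >>>= 7;
        }
        return new String(out);
    }

    // Decoding is one shift and one OR per char, then undo the sign flip.
    static int decode(String term) {
        int sortable = 0;
        for (int i = 0; i < term.length(); i++) {
            sortable = (sortable << 7) | term.charAt(i);
        }
        return sortable ^ 0x80000000;
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(42)));   // 42
        System.out.println(decode(encode(-7)));   // -7
        // Encoded terms sort like the ints they represent.
        System.out.println(encode(-1).compareTo(encode(0)) < 0);  // true
    }
}
```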

For string term ranges, FieldCacheRangeFilter is much faster, because it
uses the StringIndex cache.

Uwe

