lucene-java-user mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: RangeFilter performance problem using MultiReader
Date Sat, 11 Apr 2009 21:21:54 GMT
OK, I think this will improve the situation:
https://issues.apache.org/jira/browse/LUCENE-1596

-Yonik
http://www.lucidimagination.com


On Fri, Apr 10, 2009 at 1:47 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> We never fully explained it, but we have some ideas...
>
> It's only if you iterate each term, and do a TermDocs.seek for each,
> that Multi*Reader seems to show the problem.  Just iterating the terms
> seems OK (I have a 51 segment index, and I can iterate ~ 10M unique
> terms in ~8 seconds).
>
> But loading the FieldCache, or running e.g. a RangeQuery, also does a
> MultiTermDocs.seek on each term, which in turn calls
> SegmentTermDocs.seek for each of the sub-readers in sequence.  I
> *think* maybe for highly unique terms, where typically only one
> segment actually has the term, the cost of invoking seek on all the
> other segments, which lack the term, is high.  Really, we want to call
> seek only on those segments that have the term, which we already know
> from the pqueue that merges the sub-readers' terms...
>
> Mike
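The cost Mike describes can be illustrated with a minimal, self-contained Java sketch. It assumes a hypothetical in-memory `Segment` (a sorted term array) whose `seek` stands in for a per-segment `SegmentTermDocs.seek`. Neither is a Lucene class, and this is not the change made in LUCENE-1596. A priority queue merges the segments' sorted term lists the way a multi-segment term enumeration does. For each unique term, the cursors popped from the queue are exactly the segments that hold that term. The sketch counts seek calls in two cases: seeking every segment for each term, or seeking only the matching segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class MergedSeekSketch {

    /** Hypothetical stand-in for one index segment: a sorted, de-duplicated term dictionary. */
    static final class Segment {
        final String[] terms;
        int seekCalls = 0; // number of seek() invocations on this segment

        Segment(String... terms) {
            this.terms = Arrays.stream(terms).distinct().sorted().toArray(String[]::new);
        }

        /** Hypothetical stand-in for a per-segment TermDocs.seek: binary search in the dictionary. */
        boolean seek(String term) {
            seekCalls++;
            return Arrays.binarySearch(terms, term) >= 0;
        }
    }

    /** Position within one segment's term list; the pqueue orders cursors by current term. */
    static final class Cursor {
        final Segment segment;
        int pos = 0;

        Cursor(Segment segment) { this.segment = segment; }

        String term() { return segment.terms[pos]; }

        boolean advance() { return ++pos < segment.terms.length; }
    }

    /**
     * Walks the merged term list in sorted order. If seekOnlyMatching is false, every
     * segment is sought for every term; if true, only the segments whose cursors were
     * popped for that term are sought. Returns the total number of seek calls.
     */
    static int enumerateAndSeek(List<Segment> segments, boolean seekOnlyMatching) {
        PriorityQueue<Cursor> pq = new PriorityQueue<>((a, b) -> a.term().compareTo(b.term()));
        for (Segment s : segments) {
            s.seekCalls = 0;
            if (s.terms.length > 0) pq.add(new Cursor(s));
        }
        while (!pq.isEmpty()) {
            String term = pq.peek().term();
            // Pop every cursor positioned on this term: these segments contain it.
            List<Cursor> matching = new ArrayList<>();
            while (!pq.isEmpty() && pq.peek().term().equals(term)) {
                matching.add(pq.poll());
            }
            if (seekOnlyMatching) {
                for (Cursor c : matching) c.segment.seek(term);
            } else {
                for (Segment s : segments) s.seek(term);
            }
            // Advance the popped cursors and re-queue those with terms left.
            for (Cursor c : matching) {
                if (c.advance()) pq.add(c);
            }
        }
        int total = 0;
        for (Segment s : segments) total += s.seekCalls;
        return total;
    }

    public static void main(String[] args) {
        // Highly unique terms: each term occurs in exactly one of three segments.
        List<Segment> segments = List.of(
                new Segment("apple", "delta"),
                new Segment("banana", "echo"),
                new Segment("cherry", "fox"));
        System.out.println("seek all segments:      " + enumerateAndSeek(segments, false));
        System.out.println("seek matching segments: " + enumerateAndSeek(segments, true));
    }
}
```

With three segments and six terms that each occur in only one segment, seeking every segment costs 18 seek calls, while seeking only the matching segments costs 6. In general, the first strategy grows as (unique terms) × (segments). The second grows only with the number of (term, segment) pairs that actually exist.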


