lucene-java-user mailing list archives

From Vitaly Funstein
Subject Re: search performance
Date Tue, 03 Jun 2014 10:29:06 GMT

What if you forgot about pagination for a moment and always capped your
search at 1000 results, for testing purposes only? This is just to try to
pinpoint the bottleneck. If the search latency stays roughly the same
regardless of the query parameters, and well below 5 min, you have your
answer: the problem is your naive implementation of pagination, which makes
the number of collected results and the search time snowball the closer you
get to the end of the results range. Otherwise, I would focus on your query
and filter next.
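
To make the snowballing concrete, here is a minimal, self-contained sketch.
It is not Lucene code: Hit, topN and naivePage are hypothetical stand-ins I
made up for illustration, and the bounded priority queue only approximates
how a top-N collector behaves. The point is that asking for start + length
results forces the queue to hold start + length hits, while a cursor-style
call (in the spirit of IndexSearcher.searchAfter(ScoreDoc, Query, int))
only ever holds one page and needs just the last hit of the previous page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

public class DeepPagingSketch {

    /** Hypothetical stand-in for a scored hit: a doc id plus its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Best hit first: higher score wins, ties are broken by lower doc id.
    static final Comparator<Hit> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    /**
     * Returns the best n hits that rank strictly after 'after'
     * (or from the very top when 'after' is null). The queue never
     * holds more than n hits, so its cost is driven by n.
     */
    static List<Hit> topN(List<Hit> matches, int n, Hit after) {
        // Reversed order puts the worst retained hit at the head of the queue.
        PriorityQueue<Hit> queue = new PriorityQueue<>(BEST_FIRST.reversed());
        for (Hit hit : matches) {
            if (after != null && BEST_FIRST.compare(hit, after) <= 0) {
                continue; // ranks at or before the cursor, already served
            }
            queue.add(hit);
            if (queue.size() > n) {
                queue.poll(); // evict the worst hit to stay within n
            }
        }
        List<Hit> page = new ArrayList<>(queue);
        page.sort(BEST_FIRST);
        return page;
    }

    /** Naive paging: collect start + length hits, then discard the first 'start'. */
    static List<Hit> naivePage(List<Hit> matches, int start, int length) {
        List<Hit> top = topN(matches, start + length, null);
        return new ArrayList<>(top.subList(Math.min(start, top.size()), top.size()));
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        List<Hit> matches = new ArrayList<>();
        for (int doc = 0; doc < 100_000; doc++) {
            matches.add(new Hit(doc, random.nextFloat()));
        }
        int start = 50_000;
        int length = 1000;

        // Queue grows to start + length = 51,000 hits.
        List<Hit> naive = naivePage(matches, start, length);

        // For the demo only, the cursor is found the expensive way; a real
        // caller would remember the last hit of the page it just showed.
        Hit cursor = topN(matches, start, null).get(start - 1);

        // Queue never exceeds 'length' = 1000 hits.
        List<Hit> next = topN(matches, length, cursor);

        System.out.println("pages start at same doc: "
                + (naive.get(0).doc == next.get(0).doc));
    }
}
```

The cursor variant needs only one remembered hit per page request, not the
whole result set, but it only supports moving forward page by page; jumping
to an arbitrary offset still needs something like the naive version.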

On Tue, Jun 3, 2014 at 3:21 AM, Jamie wrote:

> Vitaly
> See below:
> On 2014/06/03, 12:09 PM, Vitaly Funstein wrote:
>> A couple of questions.
>> 1. What are you trying to achieve by setting the current thread's priority
>> to max possible value? Is it grabbing as much CPU time as possible? In my
>> experience, mucking with thread priorities like this is at best futile,
>> and at worst quite detrimental to responsiveness and overall performance
>> of the system as a whole. I would remove that line.
> Yes, you are right to be worried about this, especially since thread
> priorities behave differently on different platforms.
>> 2. This seems suspicious:
>> if (getPagination()) {
>>     max = start + length;
>> } else {
>>     max = getMaxResults();
>> }
>> If start is at 100M, and length is 1000 - what do you think Lucene will
>> try and do when you pass this max to the collector?
> I don't see the problem here. The collector will start from zero to max
> results. I agree that from a performance perspective, it's not ideal to
> return all results from the beginning of the search, but the Lucene API
> leaves us with no choice. I simply do not know the ScoreDoc to start from.
> If I did keep a record of it, then I would need to store all ScoreDocs for
> the entire result set. When there are 60M+ results, this can be problematic
> in terms of memory consumption. It would be far nicer if there was a
> searchAfter function that took a position as an integer.
> Regards
> Jamie
