lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jake Mannix (JIRA)" <>
Subject [jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
Date Wed, 28 Oct 2009 14:26:59 GMT


Jake Mannix commented on LUCENE-1997:

bq. search segment 2 into queue B (with possible short circuiting by the smallest value in

Well, we're not doing the short circuit trick on multiPQ right now, are we?  It would certainly
speed things up, but requires the API have the convert() method available, which was the big
savings on the API side to multiPQ.  If it was available, I think multiPQ (either with N or
2 queues) would perform *strictly* better than singlePQ, but I didn't suggest this because
it seems to negate the cleanliness of the API.

One thing John mentioned offhand is that perhaps the convert() method could be optional? 
If you don't implement it, you don't get to short-circuit using knowledge of previous segments,
but if you do, you get maximum performance in the cases where multiPQ performs worse (mid-range
hitCount, high numResultsToReturn, and in the numeric sorting case).

I think maybe combining this idea with 2 queues could be the best of all worlds, with best
overall speed, only twice the memory of singlePQ, and the simplest API with the addition of
one new *optional* method?

> Explore performance of multi-PQ vs single-PQ sorting API
> --------------------------------------------------------
>                 Key: LUCENE-1997
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch,
LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch
> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
> where a simpler (non-segment-based) comparator API is proposed that
> gathers results into multiple PQs (one per segment) and then merges
> them in the end.
> I started from John's multi-PQ code and worked it into
> contrib/benchmark so that we could run perf tests.  Then I generified
> the Python script I use for running search benchmarks (in
> contrib/benchmark/
> The script first creates indexes with 1M docs (based on
> SortableSingleDocSource, and based on wikipedia, if available).  Then
> it runs various combinations:
>   * Index with 20 balanced segments vs index with the "normal" log
>     segment size
>   * Queries with different numbers of hits (only for wikipedia index)
>   * Different top N
>   * Different sorts (by title, for wikipedia, and by random string,
>     random int, and country for the random index)
> For each test, 7 search rounds are run and the best QPS is kept.  The
> script runs singlePQ then multiPQ, and records the resulting best QPS
> for each and produces table (in Jira format) as output.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message