lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
Date Fri, 23 Oct 2009 16:34:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769287#action_12769287 ]

Michael McCandless commented on LUCENE-1997:
--------------------------------------------

Env:

JAVA:
java version "1.5.0_19"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-b02, mixed mode)


OS:
SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris


Results:

||Source||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|wiki|log|1|318481|title|10|98.47|104.60|{color:green}6.2%{color}|
|wiki|log|1|318481|title|25|97.90|103.63|{color:green}5.9%{color}|
|wiki|log|1|318481|title|50|105.12|101.50|{color:red}-3.4%{color}|
|wiki|log|1|318481|title|100|102.30|108.59|{color:green}6.1%{color}|
|wiki|log|1|318481|title|500|89.43|79.40|{color:red}-11.2%{color}|
|wiki|log|1|318481|title|1000|82.83|63.75|{color:red}-23.0%{color}|
|wiki|log|<all>|1000000|title|10|152.56|157.40|{color:green}3.2%{color}|
|wiki|log|<all>|1000000|title|25|151.95|148.52|{color:red}-2.3%{color}|
|wiki|log|<all>|1000000|title|50|148.52|142.90|{color:red}-3.8%{color}|
|wiki|log|<all>|1000000|title|100|127.70|138.72|{color:green}8.6%{color}|
|wiki|log|<all>|1000000|title|500|104.30|90.30|{color:red}-13.4%{color}|
|wiki|log|<all>|1000000|title|1000|99.10|66.05|{color:red}-33.4%{color}|
|random|log|<all>|1000000|rand string|10|153.13|157.74|{color:green}3.0%{color}|
|random|log|<all>|1000000|rand string|25|128.79|150.62|{color:green}17.0%{color}|
|random|log|<all>|1000000|rand string|50|122.46|153.95|{color:green}25.7%{color}|
|random|log|<all>|1000000|rand string|100|116.26|141.43|{color:green}21.6%{color}|
|random|log|<all>|1000000|rand string|500|98.24|96.17|{color:red}-2.1%{color}|
|random|log|<all>|1000000|rand string|1000|86.38|71.95|{color:red}-16.7%{color}|
|random|log|<all>|1000000|country|10|148.65|153.23|{color:green}3.1%{color}|
|random|log|<all>|1000000|country|25|148.52|152.69|{color:green}2.8%{color}|
|random|log|<all>|1000000|country|50|122.01|149.52|{color:green}22.5%{color}|
|random|log|<all>|1000000|country|100|120.39|145.99|{color:green}21.3%{color}|
|random|log|<all>|1000000|country|500|99.70|95.65|{color:red}-4.1%{color}|
|random|log|<all>|1000000|country|1000|90.18|69.46|{color:red}-23.0%{color}|
|random|log|<all>|1000000|rand int|10|150.85|171.22|{color:green}13.5%{color}|
|random|log|<all>|1000000|rand int|25|151.13|167.94|{color:green}11.1%{color}|
|random|log|<all>|1000000|rand int|50|152.51|162.23|{color:green}6.4%{color}|
|random|log|<all>|1000000|rand int|100|130.54|145.04|{color:green}11.1%{color}|
|random|log|<all>|1000000|rand int|500|108.38|43.74|{color:red}-59.6%{color}|
|random|log|<all>|1000000|rand int|1000|98.27|63.56|{color:red}-35.3%{color}|
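
For anyone re-deriving the last column: the "Pct change" values are consistent with the relative change of "QPS new" against "QPS old". Below is a minimal sketch of that arithmetic and of the Jira cell coloring. The function names are hypothetical, and this is not taken from sortBench.py, whose formatting code is not shown here.

```python
def pct_change(qps_old: float, qps_new: float) -> float:
    """Relative change of the new QPS versus the old QPS, in percent."""
    return (qps_new - qps_old) / qps_old * 100.0


def jira_cell(pct: float) -> str:
    """Format a percentage as a Jira table cell: green for gains, red for losses.

    Assumption: the coloring rule (>= 0 is green) is inferred from the table,
    not from the benchmark script itself.
    """
    color = "green" if pct >= 0 else "red"
    return "{color:%s}%.1f%%{color}" % (color, pct)


if __name__ == "__main__":
    # First row of the table: wiki, title sort, top 10.
    print(jira_cell(pct_change(98.47, 104.60)))
    # Sixth row of the table: wiki, title sort, top 1000.
    print(jira_cell(pct_change(82.83, 63.75)))
```

Feeding in the two QPS values from any row should reproduce that row's last cell, up to rounding.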


> Explore performance of multi-PQ vs single-PQ sorting API
> --------------------------------------------------------
>
>                 Key: LUCENE-1997
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1997
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch
>
>
> Spinoff from the recent "lucene 2.9 sorting algorithm" thread on java-dev,
> where a simpler (non-segment-based) comparator API is proposed that
> gathers results into multiple PQs (one per segment) and then merges
> them at the end.
> I started from John's multi-PQ code and worked it into
> contrib/benchmark so that we could run perf tests.  Then I generified
> the Python script I use for running search benchmarks (in
> contrib/benchmark/sortBench.py).
> The script first creates indexes with 1M docs (based on
> SortableSingleDocSource, and based on wikipedia, if available).  Then
> it runs various combinations:
>   * Index with 20 balanced segments vs index with the "normal" log
>     segment size
>   * Queries with different numbers of hits (only for wikipedia index)
>   * Different top N
>   * Different sorts (by title, for wikipedia, and by random string,
>     random int, and country for the random index)
> For each test, 7 search rounds are run and the best QPS is kept.  The
> script runs singlePQ then multiPQ, records the best QPS for each, and
> produces a table (in Jira format) as output.
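
To make the two strategies in the description concrete, here is a rough sketch of single-PQ versus multi-PQ top-N collection, plus a "best of 7 rounds" QPS measurement as described above. It is a toy model, not Lucene's comparator API and not the code in the attached patch: segments are plain Python lists of (sort value, doc id) pairs, and every name in it is hypothetical.

```python
import heapq
import itertools
import random
import time


def top_n_single_pq(segments, n):
    """Single PQ: one bounded queue sees every hit from every segment."""
    all_hits = (hit for segment in segments for hit in segment)
    return heapq.nsmallest(n, all_hits)


def top_n_multi_pq(segments, n):
    """Multi PQ: one bounded queue per segment, merged at the end."""
    # Each segment keeps only its own top n, already sorted ascending.
    per_segment = [heapq.nsmallest(n, segment) for segment in segments]
    # Merge the sorted per-segment results and keep the global top n.
    return list(itertools.islice(heapq.merge(*per_segment), n))


def best_qps(search, rounds=7, queries_per_round=20):
    """Run several rounds of the same search and keep the best QPS."""
    best = 0.0
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(queries_per_round):
            search()
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, queries_per_round / elapsed)
    return best


def make_segments(num_segments=20, hits_per_segment=1000, seed=0):
    """Toy index: balanced segments of (random int sort value, doc id) hits."""
    rng = random.Random(seed)
    return [
        [(rng.randrange(1_000_000), seg * hits_per_segment + i)
         for i in range(hits_per_segment)]
        for seg in range(num_segments)
    ]


if __name__ == "__main__":
    segments = make_segments()
    for n in (10, 100, 1000):
        # Both strategies must return the same hits in the same order.
        assert top_n_single_pq(segments, n) == top_n_multi_pq(segments, n)
        single = best_qps(lambda: top_n_single_pq(segments, n))
        multi = best_qps(lambda: top_n_multi_pq(segments, n))
        print("top %d: singlePQ %.1f QPS, multiPQ %.1f QPS" % (n, single, multi))
```

Timings from this toy say nothing about the Lucene numbers in the table. It only shows the shape of the two approaches: the multi-PQ variant keeps up to N entries per segment and pays for a final merge, and both return the same hits.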
