lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance
Date Wed, 18 Nov 2009 10:14:39 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779387#action_12779387 ]

Michael McCandless commented on LUCENE-2061:
--------------------------------------------

OK, I modified nrtBench.py to take advantage of some of the features in
LUCENE-2079:

  * The reopen thread runs at priority +2 relative to normal, indexing
    threads at +1, and search threads at normal priority

  * I now compute the mean and stddev of the reopen time, and added
    both to the tables
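A minimal sketch of how the two new columns can be derived, assuming the per-reopen durations (in ms) have already been collected from the benchmark output. `reopen_stats` is a hypothetical helper, and using the population standard deviation is an assumption; the comment does not say which estimator nrtBench.py uses. The sample durations are made up for illustration, not taken from the runs below.

```python
import statistics


def reopen_stats(reopen_times_ms):
    """Return (mean, stddev) of per-reopen durations in milliseconds.

    Assumption: population stddev (statistics.pstdev); the real
    nrtBench.py may use a different estimator.
    """
    if not reopen_times_ms:
        raise ValueError("no reopen times recorded")
    return statistics.mean(reopen_times_ms), statistics.pstdev(reopen_times_ms)


# Hypothetical sample of reopen durations (ms), for illustration only.
mean_ms, stddev_ms = reopen_stats([12, 10, 14, 12])
print(f"Reopen mean {mean_ms:.1f} ms, stddev {stddev_ms:.1f} ms")
```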

I made some other small changes, e.g. -report now creates separate
'add only' and 'delete + add' tables.
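A sketch of how a report step might split runs into the two tables, assuming a hypothetical `Run` record per benchmark run (the record and its field names are illustrative, not nrtBench.py's). Rows are rendered in the same JIRA table markup used in the results below, with negative % diffs wrapped in `{color:red}`; each run is assumed to already carry its % diff.

```python
from collections import defaultdict
from typing import NamedTuple


class Run(NamedTuple):
    # Hypothetical record for one benchmark run; field names are illustrative.
    mode: str                # "add" or "update" (delete + add)
    docs_per_sec: float
    reopen_sec: float
    reopen_mean_ms: float
    reopen_stddev_ms: float
    qps: float
    pct_diff: float          # % change in QPS vs. the baseline run


HEADER = ("||Docs/sec||Reopen every (sec)||Reopen mean (ms)"
          "||Reopen stddev (ms)||QPS||% diff||")
TITLES = {"add": "Add only:", "update": "Delete + add:"}


def jira_row(r: Run) -> str:
    """Render one run as a JIRA table row."""
    pct = f"{r.pct_diff:.1f}%"
    if r.pct_diff < 0:
        pct = "{color:red}" + pct + "{color}"  # negative diffs shown in red
    return (f"|{r.docs_per_sec:.1f}|{r.reopen_sec:.1f}|{r.reopen_mean_ms:.1f}"
            f"|{r.reopen_stddev_ms:.1f}|{r.qps}|{pct}|")


def report(runs) -> str:
    """Emit one JIRA table per mode instead of a single mixed table."""
    by_mode = defaultdict(list)
    for r in runs:
        by_mode[r.mode].append(r)
    out = []
    for mode in ("add", "update"):
        if not by_mode[mode]:
            continue
        out.append(TITLES[mode])
        out.append(HEADER)
        for run in sorted(by_mode[mode],
                          key=lambda run: (run.docs_per_sec, run.reopen_sec)):
            out.append(jira_row(run))
        out.append("")
    return "\n".join(out)


print(report([
    Run("add", 10.0, 0.1, 0.0, 1.0, 132.11, -8.4),
    Run("update", 10.0, 0.1, 1.0, 1.7, 132.82, -7.9),
]))
```

Feeding it the first add-only result reproduces that table's first row, `|10.0|0.1|0.0|1.0|132.11|{color:red}-8.4%{color}|`.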

Finally, I switched to a non-optimized 5M Wikipedia index (12
segments) with 1% deletions.  I think this is more typical of the index
an app would have after running NRT for a while.

New results:

JAVA:
java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)


OS:
SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris


Baseline QPS: 144.24


Add only:
||Docs/sec||Reopen every (sec)||Reopen mean (ms)||Reopen stddev (ms)||QPS||% diff||
|10.0|0.1|0.0|1.0|132.11|{color:red}-8.4%{color}|
|10.0|1.0|3.0|0.0|132.79|{color:red}-7.9%{color}|
|10.0|5.0|9.0|2.0|121.31|{color:red}-15.9%{color}|
|10.0|10.0|14.0|2.0|134.7|{color:red}-6.6%{color}|
|10.0|33.3|30.0|3.7|133.57|{color:red}-7.4%{color}|
|100.0|0.1|2.0|0.0|142.02|{color:red}-1.5%{color}|
|100.0|1.0|12.0|1.4|125.9|{color:red}-12.7%{color}|
|100.0|5.0|41.0|2.8|105.46|{color:red}-26.9%{color}|
|100.0|10.0|61.0|4.2|126.09|{color:red}-12.6%{color}|
|100.0|33.3|128.0|5.8|141.46|{color:red}-1.9%{color}|
|1000.0|0.1|15.0|168.8|102.14|{color:red}-29.2%{color}|
|1000.0|1.0|62.0|5.1|117.06|{color:red}-18.8%{color}|
|1000.0|5.0|192.0|7.4|123.7|{color:red}-14.2%{color}|
|1000.0|10.0|166.0|10.3|97.57|{color:red}-32.4%{color}|
|1000.0|33.3|162.0|12.1|101.52|{color:red}-29.6%{color}|

Delete + add:
||Docs/sec||Reopen every (sec)||Reopen mean (ms)||Reopen stddev (ms)||QPS||% diff||
|10.0|0.1|1.0|1.7|132.82|{color:red}-7.9%{color}|
|10.0|1.0|6.0|1.0|134.57|{color:red}-6.7%{color}|
|10.0|5.0|21.0|8.8|119.37|{color:red}-17.2%{color}|
|10.0|10.0|38.0|17.4|129.19|{color:red}-10.4%{color}|
|10.0|33.3|82.0|11.1|135.14|{color:red}-6.3%{color}|
|100.0|0.1|6.0|1.0|127.01|{color:red}-11.9%{color}|
|100.0|1.0|34.0|6.8|141.1|{color:red}-2.2%{color}|
|100.0|5.0|126.0|17.9|105.43|{color:red}-26.9%{color}|
|100.0|10.0|203.0|29.3|117.16|{color:red}-18.8%{color}|
|100.0|33.3|538.0|77.5|132.26|{color:red}-8.3%{color}|
|1000.0|0.1|45.0|187.8|96.84|{color:red}-32.9%{color}|
|998.9|1.0|246.0|41.0|95.32|{color:red}-33.9%{color}|
|996.6|5.0|941.0|154.4|102.17|{color:red}-29.2%{color}|
|999.5|10.0|1680.0|549.1|90.69|{color:red}-37.1%{color}|
|990.2|33.3|4587.0|2660.9|90.89|{color:red}-37.0%{color}|
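The % diff column appears to be the relative change in QPS against the baseline of 144.24 QPS. A quick check, assuming that formula; `pct_diff` is a hypothetical helper, and the three QPS values are taken from the tables above.

```python
def pct_diff(qps: float, baseline_qps: float) -> float:
    """Relative change in QPS vs. the baseline run, in percent."""
    return (qps - baseline_qps) / baseline_qps * 100.0


BASELINE_QPS = 144.24

# QPS values from the tables: add-only 10 docs/sec @ 0.1s,
# add-only 100 docs/sec @ 5s, delete+add ~1000 docs/sec @ 10s.
for qps in (132.11, 105.46, 90.69):
    print(f"{qps:7.2f} QPS -> {pct_diff(qps, BASELINE_QPS):+.1f}%")
```

Under this formula the three values come out at -8.4%, -26.9% and -37.1%, matching the corresponding table entries.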


Observations:

  * Something odd is still going on: e.g. at 100 docs/sec, reopening
    every 33.3 sec costs only a fairly small QPS hit in both the add
    (-1.9%) and delete+add (-8.3%) cases, vs reopening more often.
    Reopening every 5 seconds is by far the worst (-26.9% in both
    cases).  Strange.

  * Right off the bat, even at 10 docs/sec, we take a QPS hit (roughly
    6% to 17%) in both the add and delete+add cases.

  * The delete+add case generally (though not always) has worse QPS
    than the add-only case.

  * Curiously, reopening less frequently often seems to hurt QPS more
    (though not always).  I would have expected better overall QPS,
    even though each reopen takes longer to turn around when it
    happens; strange.

  * Delete+add clearly takes longer to turn around the new reader, but
    the times remain reasonable even up to 1000 docs/sec.  The more
    often you reopen your reader, the less time each reopen takes,
    since there are fewer delete+adds to process (e.g. at 1000
    docs/sec, 45 ms mean when reopening every 0.1 sec vs 4587 ms every
    33.3 sec).
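A rough way to see the last point, assuming reopen cost is driven by the number of delete+adds buffered since the previous reopen, approximated as indexing rate × reopen interval (this model is an assumption, not something the benchmark measures). `pending_updates` is a hypothetical helper; the rows are the ~1000 docs/sec delete+add results from the table above.

```python
# (docs/sec, reopen every (sec), reopen mean (ms)) from the ~1000 docs/sec
# rows of the "Delete + add" table.
ROWS = [
    (1000.0, 0.1, 45.0),
    (998.9, 1.0, 246.0),
    (996.6, 5.0, 941.0),
    (999.5, 10.0, 1680.0),
    (990.2, 33.3, 4587.0),
]


def pending_updates(docs_per_sec: float, reopen_sec: float) -> float:
    """Approximate number of delete+adds buffered between two reopens
    (assumed model: constant indexing rate times reopen interval)."""
    return docs_per_sec * reopen_sec


for rate, interval, mean_ms in ROWS:
    pending = pending_updates(rate, interval)
    per_1000 = mean_ms / pending * 1000  # reopen ms per 1000 pending updates
    print(f"every {interval:5.1f}s: ~{pending:8.0f} pending updates, "
          f"reopen {mean_ms:6.0f} ms ({per_1000:5.1f} ms per 1000)")
```

On these rows the estimated backlog grows from about 100 to about 33,000 updates while the mean reopen time grows from 45 ms to 4587 ms. The per-update cost actually falls (about 450 ms vs 139 ms per 1000 updates), so it is the size of the backlog, not the per-update cost, that makes infrequent reopens slow.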


> Create benchmark & approach for testing Lucene's near real-time performance
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-2061
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2061
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-2061.patch, LUCENE-2061.patch
>
>
> With the improvements to contrib/benchmark in LUCENE-2050, it's now
> possible to create compelling algs to test indexing & searching
> throughput against a periodically reopened near-real-time reader from
> the IndexWriter.
> Coming out of the discussions in LUCENE-1526, I think to properly
> characterize NRT, we should measure net search throughput as a
> function of both reopen rate (i.e. how often you get a new NRT reader
> from the writer) and indexing rate.  We should also separately measure
> pure adds vs updates (deletes + adds); the latter is much more work
> for Lucene.
> This can help apps make capacity decisions... and can help us test
> performance of pending improvements for NRT (e.g. LUCENE-1313,
> LUCENE-2047).
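The measurement plan quoted above amounts to sweeping a grid of indexing rate × reopen interval × operation type. A minimal sketch of that grid, using the values that appear in the result tables; the constant names and `benchmark_grid` are hypothetical, not part of contrib/benchmark or nrtBench.py.

```python
from itertools import product

# Parameter values read off the result tables; the grid construction
# itself is a hypothetical sketch, not nrtBench.py's code.
INDEX_RATES = (10.0, 100.0, 1000.0)              # target docs/sec
REOPEN_INTERVALS = (0.1, 1.0, 5.0, 10.0, 33.3)   # seconds between reopens
MODES = ("add", "update")                        # update = delete + add


def benchmark_grid():
    """Yield one (mode, docs/sec, reopen interval) combination per run."""
    for mode, rate, interval in product(MODES, INDEX_RATES, REOPEN_INTERVALS):
        yield mode, rate, interval


runs = list(benchmark_grid())
print(len(runs))  # 2 modes x 3 rates x 5 intervals = 30 runs
```

Each combination corresponds to one row of the "Add only" or "Delete + add" tables (15 rows each).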

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

