lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance
Date Thu, 19 Nov 2009 10:33:40 GMT


Michael McCandless commented on LUCENE-2061:

I was baffled by the sporadic QPS differences I see across reopen
rates, so I ran another test, this time always flushing after 100
buffered docs:

java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris

Baseline QPS 146.74

Add only:
||Docs/sec||Reopen every (sec)||Reopen mean (ms)||Reopen stddev(ms)||QPS||% diff||

Baseline QPS 146.3

Delete + add:
||Docs/sec||Reopen every (sec)||Reopen mean (ms)||Reopen stddev(ms)||QPS||% diff||

Very strangely, when flushing every 100 docs (ie once per second, even
if you're reopening at a slower rate), the QPS is much more reasonable:
pretty much unaffected by the ongoing indexing, whether add only or
delete + add.  I don't know how to explain this...

Also, note that reopen times are still longer for delete+add.  This is
because the deletes are still only being resolved when it's time to
reopen (or, time to merge), not after every 100 docs.  This also
explains why going from reopening every 10 sec to every 30 sec didn't
change the reopen time: after 10 seconds (= 10 new segments), a merge
kicks off, which always resolves the deletes.
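
The plateau between 10 and 30 seconds can be illustrated with a toy model. This is only a sketch, not Lucene code: the class and method names are hypothetical, and it assumes one flush per second, one buffered delete per updated doc, a merge starting after every 10th new segment, and reopen cost roughly proportional to the number of deletes still unresolved at reopen time.

```java
/** Toy model (not Lucene code) of buffered deletes awaiting resolution. */
final class DeleteBacklogModel {

    /**
     * Returns the largest number of unresolved deletes seen at any reopen.
     * One flush happens per simulated second; every mergeFactor-th flush
     * triggers a merge, which resolves all pending deletes.
     */
    static int maxPendingAtReopen(int reopenEverySec, int docsPerFlush,
                                  int mergeFactor, int totalSec) {
        int pending = 0;      // deletes buffered but not yet resolved
        int newSegments = 0;  // segments flushed since the last merge
        int maxAtReopen = 0;

        for (int t = 1; t <= totalSec; t++) {
            // Flush: one new segment; one delete per updated doc is buffered.
            pending += docsPerFlush;
            newSegments++;

            // Reopen: the reader must resolve whatever deletes are pending.
            if (t % reopenEverySec == 0) {
                maxAtReopen = Math.max(maxAtReopen, pending);
                pending = 0;
            }

            // Merge: assumed to start once mergeFactor segments accumulate,
            // and to resolve any deletes still pending.
            if (newSegments == mergeFactor) {
                pending = 0;
                newSegments = 0;
            }
        }
        return maxAtReopen;
    }
}
```

Under these assumptions, with 100 docs per flush and a merge every 10 segments, the backlog a reopen has to resolve grows with the reopen interval up to 10 seconds and then stops growing, so reopening every 10 sec and every 30 sec face the same worst-case backlog.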

So I think this is good news, in that it brings QPS back up to nearly
the baseline, but bad news in that I have no idea why...

> Create benchmark & approach for testing Lucene's near real-time performance
> ---------------------------------------------------------------------------
>                 Key: LUCENE-2061
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-2061.patch, LUCENE-2061.patch
> With the improvements to contrib/benchmark in LUCENE-2050, it's now
> possible to create compelling algs to test indexing & searching
> throughput against a periodically reopened near-real-time reader from
> the IndexWriter.
> Coming out of the discussions in LUCENE-1526, I think to properly
> characterize NRT, we should measure net search throughput as a
> function of both reopen rate (ie how often you get a new NRT reader
> from the writer) and indexing rate.  We should also separately measure
> pure adds vs updates (deletes + adds); the latter is much more work
> for Lucene.
> This can help apps make capacity decisions... and can help us test
> performance of pending improvements for NRT (eg LUCENE-1313,
> LUCENE-2047).
