lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance
Date Fri, 27 Nov 2009 10:49:39 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783093#action_12783093 ]

Michael McCandless commented on LUCENE-2061:
--------------------------------------------

BTW, based on the last results I posted here, the rough conclusion
seems to be that so long as you set up IW to flush every N docs (I
still don't understand why that's necessary), ongoing indexing &
reopening does not hurt QPS substantially compared to the "pure
searching" baseline.

This is an important result.  It means all the other optimizations
we're pursuing for NRT are not really necessary (at least on the env
I tested).  I think it must be that the OS is quite efficient at
creating smallish files and turning them around for reading (i.e.,
its file write cache is "effectively" emulating a RAMDirectory).


> Create benchmark & approach for testing Lucene's near real-time performance
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-2061
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2061
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-2061.patch, LUCENE-2061.patch, LUCENE-2061.patch
>
>
> With the improvements to contrib/benchmark in LUCENE-2050, it's now
> possible to create compelling algs to test indexing & searching
> throughput against a periodically reopened near-real-time reader from
> the IndexWriter.
> Coming out of the discussions in LUCENE-1526, I think to properly
> characterize NRT, we should measure net search throughput as a
> function of both reopen rate (ie how often you get a new NRT reader
> from the writer) and indexing rate.  We should also separately measure
> pure adds vs updates (deletes + adds); the latter is much more work
> for Lucene.
> This can help apps make capacity decisions... and can help us test
> performance of pending improvements for NRT (eg LUCENE-1313,
> LUCENE-2047).
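
The measurement plan in the description above can be sketched as a
parameter sweep.  This is only an illustration, not the attached patch
or a contrib/benchmark alg: the class names (NrtRun, NrtSweep), the
method names, and the rate values are all hypothetical.  It enumerates
reopen rate x indexing rate x add/update mode, and expresses a measured
search QPS as a percentage of the "pure searching" baseline.

```java
import java.util.ArrayList;
import java.util.List;

/** One point in the sweep. All names in this sketch are hypothetical. */
final class NrtRun {
    final double reopensPerSec; // how often a new NRT reader is requested
    final double docsPerSec;    // indexing rate
    final boolean updates;      // true = delete + add, false = pure add

    NrtRun(double reopensPerSec, double docsPerSec, boolean updates) {
        this.reopensPerSec = reopensPerSec;
        this.docsPerSec = docsPerSec;
        this.updates = updates;
    }
}

final class NrtSweep {
    /** Cross product of reopen rates, indexing rates, and add/update mode. */
    static List<NrtRun> grid(double[] reopenRates, double[] indexRates) {
        List<NrtRun> runs = new ArrayList<>();
        for (double reopen : reopenRates) {
            for (double docs : indexRates) {
                for (boolean upd : new boolean[] {false, true}) {
                    runs.add(new NrtRun(reopen, docs, upd));
                }
            }
        }
        return runs;
    }

    /** Search QPS as a percentage of the pure-searching baseline QPS. */
    static double percentOfBaseline(double qps, double baselineQps) {
        if (baselineQps <= 0) {
            throw new IllegalArgumentException("baseline QPS must be positive");
        }
        return 100.0 * qps / baselineQps;
    }

    public static void main(String[] args) {
        // Placeholder rates for illustration only; these are not measurements.
        double[] reopenRates = {0.1, 1, 10};
        double[] indexRates = {10, 100, 1000};
        for (NrtRun run : grid(reopenRates, indexRates)) {
            System.out.println(run.reopensPerSec + " reopens/sec, "
                + run.docsPerSec + " docs/sec, "
                + (run.updates ? "updates" : "adds"));
        }
    }
}
```

Each run would be executed against a real index and its QPS passed to
percentOfBaseline together with the QPS of a searching-only run on the
same machine, so results from different settings can be compared.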


