lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance
Date Sun, 29 Nov 2009 09:54:20 GMT


Michael McCandless commented on LUCENE-2061:

bq. Can you post the queries file you've used?

I only used TermQuery "1", sorting by score.  I'd generally like to focus on worst-case query
latency rather than QPS of "easy" queries.  Maybe we should switch to harder queries (phrase, etc.).

Though one thing I haven't yet focused on testing (which your work on LUCENE-1785 would improve)
is queries that hit the FieldCache -- we should test that as well.
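To make "worst-case query latency" concrete, here is a minimal, self-contained sketch of the measurement side. The query execution is stubbed out as a Runnable; in the real benchmark it would be an IndexSearcher.search() call. The class and method names below are illustrative, not part of contrib/benchmark:

```java
public class LatencyProbe {
    /**
     * Runs the query 'iterations' times and returns {worstNanos, avgNanos}.
     * QPS alone hides latency spikes; tracking the worst case exposes them.
     */
    public static long[] measure(Runnable query, int iterations) {
        long worst = 0, total = 0;
        for (int i = 0; i < iterations; i++) {
            long t0 = System.nanoTime();
            query.run();  // stand-in for e.g. searcher.search(q, 10)
            long elapsed = System.nanoTime() - t0;
            worst = Math.max(worst, elapsed);
            total += elapsed;
        }
        return new long[] { worst, total / iterations };
    }

    public static void main(String[] args) {
        long[] r = measure(() -> { /* stand-in for a search */ }, 1000);
        System.out.println("worst=" + r[0] + "ns avg=" + r[1] + "ns");
    }
}
```

Recording both numbers per run lets the benchmark report latency percentiles separately from raw throughput.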

{quote}
I haven't seen the same results with regard to the OS managing
small files, and I suspect that users in general will choose a
variety of parameters (e.g., 1 max buffered doc) that make
writing to disk inherently slow. Logically the OS should work as
a write cache; in practice, however, a variety of users
have reported otherwise. Maybe 100 docs works, but that
feels like a fairly narrow guideline for users of NRT.
{quote}

Yeah, we need to explore this case (when the OS doesn't do effective write-caching) in practice.

{quote}
The latest LUCENE-1313 is a step in a direction that doesn't
change IW internals too much.
{quote}
I do like this simplification -- basically IW is internally managing how best to use RAM in
NRT mode -- but I think we need to scrutinize (through benchmarking, here) whether this is
really needed (ie, whether we can't simply rely on the OS to behave, with its IO cache).

> Create benchmark & approach for testing Lucene's near real-time performance
> ---------------------------------------------------------------------------
>                 Key: LUCENE-2061
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-2061.patch, LUCENE-2061.patch, LUCENE-2061.patch
> With the improvements to contrib/benchmark in LUCENE-2050, it's now
> possible to create compelling algs to test indexing & searching
> throughput against a periodically reopened near-real-time reader from
> the IndexWriter.
> Coming out of the discussions in LUCENE-1526, I think to properly
> characterize NRT, we should measure net search throughput as a
> function of both reopen rate (ie how often you get a new NRT reader
> from the writer) and indexing rate.  We should also separately measure
> pure adds vs updates (deletes + adds); the latter is much more work
> for Lucene.
> This can help apps make capacity decisions... and can help us test
> performance of pending improvements for NRT (eg LUCENE-1313,
> LUCENE-2047).
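The "search throughput as a function of reopen rate" idea in the description can be illustrated with a toy model (assumed for illustration only; the real benchmark measures this empirically rather than computing it): each NRT reopen stalls searching for some cost, so net QPS degrades linearly with reopen frequency.

```java
public class NrtBenchSketch {
    /**
     * Net searches/sec when each reader reopen stalls searching for
     * reopenCostMs. Toy model: capacity lost per second is
     * reopensPerSec * reopenCostMs / 1000, capped at 100%.
     */
    public static double netSearchQps(double baselineQps,
                                      double reopensPerSec,
                                      double reopenCostMs) {
        double busyFraction = Math.min(1.0, reopensPerSec * reopenCostMs / 1000.0);
        return baselineQps * (1.0 - busyFraction);
    }

    public static void main(String[] args) {
        // e.g. 500 QPS baseline, reopening 10x/sec at 20 ms per reopen
        System.out.println(netSearchQps(500, 10, 20));  // → 400.0
    }
}
```

Sweeping reopensPerSec and the indexing rate in a real run would produce the throughput surface the description proposes to measure; updates (deletes + adds) would show a higher effective reopen cost than pure adds.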

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

