lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter
Date Fri, 27 Mar 2009 19:44:50 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1516:
---------------------------------------

    Attachment: magnetic.png
                ssd.png

OK I ran a basic initial test of the latency when opening a near
real-time reader.

Using contrib/benchmark, I index Wikipedia docs as usual, but I added
a NearRealTimeReaderTask, which runs a background thread that gets a
new reader from the writer once every N seconds (I used N=3).

Then it does a simple search for term "1" in the body, and sorts by
the docdate field.

I measured the milliseconds taken to reopen and to run the search, and
plotted those as a function of index size.
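
For anyone who wants to try something similar, the reopen-and-search
loop described above could be sketched roughly as follows. This is not
the actual NearRealTimeReaderTask; ReopenLoop and ReopenSample are
hypothetical names, the Supplier stands in for the writer handing out a
new reader, and the search is whatever callback the caller passes in.
It uses only the JDK, so it runs without Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** One measurement: time to get the new reader and time to run the search. */
final class ReopenSample {
    final long reopenMsec;
    final long searchMsec;

    ReopenSample(long reopenMsec, long searchMsec) {
        this.reopenMsec = reopenMsec;
        this.searchMsec = searchMsec;
    }
}

/**
 * Hypothetical background loop: every N time units, get a new reader,
 * run one search against it, and record how long each step took.
 * The reader type R is generic; the Supplier is an assumed stand-in
 * for the writer.
 */
final class ReopenLoop<R> {
    private final Supplier<R> readerSource;
    private final Consumer<R> search;
    private final List<ReopenSample> samples =
        Collections.synchronizedList(new ArrayList<>());
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    ReopenLoop(Supplier<R> readerSource, Consumer<R> search) {
        this.readerSource = readerSource;
        this.search = search;
    }

    /** Performs one reopen followed by one search and records both timings. */
    ReopenSample runOnce() {
        long t0 = System.nanoTime();
        R reader = readerSource.get();
        long t1 = System.nanoTime();
        // A real reader would also need closing; that is left to the callback.
        search.accept(reader);
        long t2 = System.nanoTime();
        ReopenSample sample = new ReopenSample(
            TimeUnit.NANOSECONDS.toMillis(t1 - t0),
            TimeUnit.NANOSECONDS.toMillis(t2 - t1));
        samples.add(sample);
        return sample;
    }

    /** Starts the background thread; one failed round does not stop the loop. */
    void start(long period, TimeUnit unit) {
        timer.scheduleWithFixedDelay(() -> {
            try {
                runOnce();
            } catch (RuntimeException e) {
                e.printStackTrace();
            }
        }, period, period, unit);
    }

    /** Stops scheduling new rounds and waits briefly for a running one. */
    void stop() throws InterruptedException {
        timer.shutdown();
        timer.awaitTermination(10, TimeUnit.SECONDS);
    }

    /** Returns a copy of the samples recorded so far. */
    List<ReopenSample> samples() {
        synchronized (samples) {
            return new ArrayList<>(samples);
        }
    }
}
```

With N=3 this would be started as loop.start(3, TimeUnit.SECONDS) while
indexing runs in another thread, and the recorded samples can then be
plotted against index size.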

I ran two tests.  The first (attached as ssd.png) stores the index on
an Intel X25M solid-state disk; the second (attached as magnetic.png)
on a WD Velociraptor.

Notes:

  * The reopen time is ~700 msec in both cases, and doesn't change
    much as the index grows (which is nice).

  * It is quite noisy, likely due to merges committing.

  * I logged (but did not graph) the flush time vs actual reopen time,
    and it's the flush time that dominates.  This is good because with
    a slower indexing rate, this flush time would go way down.  My
    guess is flushing is CPU bound not IO bound.

  * The search time ramps up linearly (expected), and also shows
    spikes due to merging.  There's one massive spike at the end of
    the SSD one that's odd (did not correspond to reopening after a
    merge, though perhaps during a merge).

  * This is a somewhat overly stressful test because I'm indexing docs
    at full speed, whereas for the typical near real-time search app
    I'd expect the docs to trickle in more slowly and a reopen to
    happen after just a few docs.

  * SSD and magnetic look pretty darn similar, though magnetic shows
    more noise and maybe is more affected by merges.

  * I'm doing no deletions in this test, but the typical near
    real-time app presumably would.
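
A rough sketch of how the logged timings could be summarized, in case
it is useful: the flush share of the total reopen latency, plus a
median, which is less sensitive than the mean to the merge-related
spikes. ReopenStats is a hypothetical helper, not part of the patch or
of contrib/benchmark, and it assumes the flush and open times were
logged as two parallel arrays of milliseconds.

```java
import java.util.Arrays;

/** Hypothetical helper for summarizing per-reopen timings. */
final class ReopenStats {
    private ReopenStats() {}

    /**
     * Fraction of the total reopen latency spent flushing, in [0, 1].
     * Returns 0 when there is no data or all timings are zero.
     */
    static double flushShare(long[] flushMsec, long[] openMsec) {
        if (flushMsec.length != openMsec.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        long flush = 0;
        long total = 0;
        for (int i = 0; i < flushMsec.length; i++) {
            flush += flushMsec[i];
            total += flushMsec[i] + openMsec[i];
        }
        return total == 0 ? 0.0 : (double) flush / total;
    }

    /** Median of the samples; the input array is not modified. */
    static double median(long[] msec) {
        if (msec.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = msec.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

A flush share close to 1 would match the observation above that
flushing dominates the reopen time.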


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
>                      LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
>                      LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
>                      LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
>                      LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
>                      magnetic.png, ssd.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enable
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



