lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter
Date Sun, 29 Mar 2009 13:57:50 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1516:
---------------------------------------

    Attachment: ssd2.png


OK using the last patch, I ran another near real-time test, using this
alg:

{code}

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer

doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker

merge.policy=org.apache.lucene.index.LogDocMergePolicy

docs.file=/Volumes/External/lucene/wiki.txt
doc.stored = false
doc.term.vector = false
doc.add.log.step=10
max.field.length=2147483647

directory=FSDirectory
autocommit=false
compound=false
merge.factor = 10
ram.flush.mb = 128
doc.maker.forever = false
doc.random.id.limit = 3204040

work.dir=/lucene/work

{ "BuildIndex"
  - OpenIndex
  - NearRealtimeReader(1)
   { "UpdateDocs" UpdateDoc > : 100000 : 50/sec
  - CloseIndex
}

RepSumByPrefRound BuildIndex
{code}

It opens a full (3.2M docs, previously built) wikipedia index, then
randomly selects a doc and updates it (deletes old, adds new) at the
rate of 50 docs/sec.  Then, once per second, I open a new reader and
run the same search (term "1", sorted by date).
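
For a rough sense of scale, the settings in the alg above can be turned
into a back-of-the-envelope budget.  This is only a sketch: the class and
method names are made up for illustration, it assumes the argument to
NearRealtimeReader is the reopen interval in seconds (matching the "once
per second" above), and it treats doc.random.id.limit as the index size.

```java
public class NrtTestBudget {
    // Seconds needed to apply `updates` at a fixed rate of `ratePerSec`.
    static long durationSeconds(long updates, long ratePerSec) {
        return updates / ratePerSec;
    }

    // Number of reader reopens if one happens every `reopenEverySec` seconds.
    static long reopenCount(long durationSec, long reopenEverySec) {
        return durationSec / reopenEverySec;
    }

    // Updates that accumulate between two consecutive reopens.
    static long updatesPerReopen(long ratePerSec, long reopenEverySec) {
        return ratePerSec * reopenEverySec;
    }

    // Upper bound on the fraction of the index that gets replaced; the
    // true figure is lower because random ids can repeat.
    static double maxFractionUpdated(long updates, long indexDocs) {
        return (double) updates / indexDocs;
    }

    public static void main(String[] args) {
        long updates = 100000;     // "UpdateDocs" repeat count in the alg
        long rate = 50;            // 50/sec in the alg
        long reopenEvery = 1;      // assumed: NearRealtimeReader(1) = 1 second
        long indexDocs = 3204040;  // doc.random.id.limit in the alg

        long secs = durationSeconds(updates, rate);
        System.out.println("duration (sec): " + secs);
        System.out.println("reopens: " + reopenCount(secs, reopenEvery));
        System.out.println("updates per reopen: " + updatesPerReopen(rate, reopenEvery));
        System.out.println("max fraction updated: " + maxFractionUpdated(updates, indexDocs));
    }
}
```

Under those assumptions the run lasts about 2000 seconds (roughly 33
minutes), each reopen picks up about 50 updates, and at most about 3% of
the docs in the index are replaced.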

I attached another graph (ssd2.png) with the results, showing reopen &
search time as a function of how many updates have been done; rough
comments:

  * Search time is pretty constant at ~35 msec, except for occasional
    glitches where it goes as high as ~340 msec.  Net/net very
    reasonable I think.

  * Search time is remarkably non-noisy, except for occasional
    spikes.

  * Reopen time is also fast (~ 40 msec) but is more noisy.

  * It's not clear the merges are really impacting things that much.
    It could simply be that I didn't run the test long enough for a
    big merge to run.  Also, this index has neither stored fields nor
    term vectors, so if we added those, merges would get slower.

  * This is a better test than the last one, since it's doing some
    deletes.

  * Since I open the writer with autoCommit false, and near-realtime
    carries all pending deletes in RAM, no *.del file ever gets
    written to the index.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enable
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

