lucene-dev mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: Concurrent merge
Date Thu, 22 Feb 2007 02:20:03 GMT
> > The downside is another complexity increase though.

I think complexity can be divided in two:
(1) more complex synchronization and data-manipulation/accounting
(2) multi-threading.

Making multi-threading part of, and the responsibility of, Lucene
seems like quite a change to me. Lucene being single-threaded is a
simplifying factor, an advantage in my opinion.

So how about, alternatively (perhaps optionally, probably in a
subclass), just reducing the synchronization level of IndexWriter,
so one could call addDocument(), deleteDocument(), optimize(), etc.
from more than one thread, in parallel.

The critical sections would be similar to those in the proposal
below, and the delicate synchronization details would need to be
looked after. In fact, synchronization-wise this may be more
challenging than the proposal in which Lucene launches the threads,
because there can be more than two threads.

But this way Lucene itself remains single threaded. It is the
application decision/responsibility to launch and manage these
threads.
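To make the pattern concrete, here is a minimal sketch of what
application-managed threading could look like. ConcurrentWriter is a
hypothetical stand-in for a reduced-synchronization IndexWriter
subclass -- the class name and its internals are illustrative, not
Lucene API; only the internal accounting is synchronized, and the
application launches and manages the threads itself:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an IndexWriter whose addDocument() is safe
// to call from several threads at once. In the proposal, only the small
// critical sections inside addDocument() would be synchronized, not the
// whole call.
class ConcurrentWriter {
    private final AtomicInteger docCount = new AtomicInteger();

    void addDocument(String doc) {
        // Accounting is the critical section; analysis of the document
        // itself could proceed in parallel across threads.
        docCount.incrementAndGet();
    }

    int numDocs() { return docCount.get(); }
}

public class AppManagedIndexing {
    public static void main(String[] args) throws Exception {
        ConcurrentWriter writer = new ConcurrentWriter();

        // The application, not Lucene, decides how many threads to run.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 1000; i++) {
                    writer.addDocument("doc");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println(writer.numDocs()); // 4000
    }
}
```

Lucene itself stays single-threaded here; the parallelism lives
entirely in the calling application.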

Just a thought.

robert engels <rengels@ix.netcom.com> wrote on 21/02/2007 15:29:56:

> I think when you start discussing background threads you need to
> think server environment.
>
> It is fairly trivial there. I have pushed to move Lucene in that
> direction, rather than the multiple client accessing a shared
> resource via a network filesystem. No decent server product works
> this way.
>
> On Feb 21, 2007, at 5:23 PM, Yonik Seeley wrote:
>
> > On 2/21/07, Doron Cohen <DORONC@il.ibm.com> wrote:
> >> Ning Li wrote:
> >>
> >> > There are three main challenges in enabling concurrent merge:
> >> >   1 a robust merge policy
> >> >   2 detect when merge lags document additions/deletions
> >> >   3 how to slow down document additions/deletions (and amortize
> >> >     the cost) when merge falls behind
> >>
> >> I wonder what it means for current API semantics -
> >>
> >> - An application today can set max-buffered-docs to N, and after
> >> the Nth (or N+1th?) call to addDoc returns, a newly opened searcher
> >> would see these docs. With merges in a background thread this
> >> might not hold.
> >>
> >> - Today, after add(), an application can call flush() or close(),
> >> but with a background merge thread these calls would be blocked.
> >> Mmm... this is probably not a behavior change, because today
> >> these operations can trigger a merge that would take a long(er) time.
> >
> > We shouldn't advertise or guarantee that behavior.  This wasn't even
> > true before the new merge policy was implemented.
> >
> >> - numRamDocs() and ramSizeInBytes() - not sure what they mean
> >> once a background merge thread had started.
> >
> > IMO, for the current "batch" of documents being buffered.
> > The "old" buffered documents should be flushed to disk ASAP.
> >
> >> Still, having non blocking adds is compelling.
> >
> > Somewhat... It would result in some performance increase...
> > overlapping analysis of new documents with merging of other segments,
> > resulting in a higher CPU utilization (esp on multi-processor
> > systems).  The larger the maxBufferedDocs, the better.
> >
> > The downside is another complexity increase though.
> >
> > -Yonik


