lucene-java-user mailing list archives

From Shai Erera <>
Subject Re: Problems with homebrew ParallelWriter
Date Thu, 24 Jun 2010 03:48:22 GMT
How do you add documents to the index? Is it synchronized (such that
basically only one thread can add documents at a time)?
The same goes for removing documents as well.

Also, did you encounter any exceptions during the run - if say an addDoc
fails on one of the slices, then you need to revert that addDoc in all
previous slices ...
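
To make the two points above concrete, here is a minimal sketch of a synchronized add that undoes itself on failure. It is only an illustration under stated assumptions: Slice, MemorySlice, ParallelAdder and removeLast() are hypothetical names, not Lucene classes or part of the LUCENE-1879 patch. With real writers, the undo step would probably have to be a delete by some unique key.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one index slice (e.g. a wrapper around a writer). */
interface Slice {
    void addDoc(String doc) throws Exception;
    void removeLast() throws Exception; // hypothetical undo of the most recent add
    int numDocs();
}

/** In-memory slice, present only to make the sketch runnable. */
class MemorySlice implements Slice {
    private final List<String> docs = new ArrayList<>();
    private final boolean failOnAdd;

    MemorySlice(boolean failOnAdd) { this.failOnAdd = failOnAdd; }

    public void addDoc(String doc) throws Exception {
        if (failOnAdd) throw new Exception("simulated addDoc failure");
        docs.add(doc);
    }

    public void removeLast() { docs.remove(docs.size() - 1); }

    public int numDocs() { return docs.size(); }
}

class ParallelAdder {
    private final List<Slice> slices;

    ParallelAdder(List<Slice> slices) { this.slices = slices; }

    /** Only one thread at a time may add a document to the slices. */
    public synchronized void addDoc(String doc) throws Exception {
        int done = 0; // number of slices that already accepted the document
        try {
            for (Slice s : slices) {
                s.addDoc(doc);
                done++;
            }
        } catch (Exception e) {
            // Revert the add on every slice that succeeded before the failure.
            for (int i = done - 1; i >= 0; i--) {
                slices.get(i).removeLast();
            }
            throw e;
        }
    }
}
```

A remove method would need the same treatment: synchronized on the same object, and applied to all slices or to none.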

I remember running into such an exception when working on the Parallel Index
stuff, but I don't remember what caused it ...

About merging, note that if you use LogDocMergePolicy, then you can guarantee
that all slices will be in sync, but some merges could still happen on some
slices at times you did not intend. For example, a merge could run during a
flush triggered by an addDoc on one of the slices, before the addDoc calls on
the other slices have finished. But if you didn't see any exceptions and
didn't terminate the process mid-action, then this should not happen ...

I hope this helps. Unfortunately I had to shift focus from LUCENE-1879.
Perhaps I'll get back to it one day. But if you advanced on PI somehow,
perhaps you can diff the patch that's there and your code, and if you've
made progress, upload another patch?


On Thu, Jun 24, 2010 at 1:44 AM, Justin <> wrote:

> Hi all,
> We've been waiting for LUCENE-1879 and LUCENE-2425 and have written our own
> ParallelWriter class in the meantime.  Apparently our indexes are falling
> out of sync (I suspect my colleague is seeing error messages come from
> ParallelReader stating that the number of documents must be the same).
> Here's a code snippet from our ParallelWriter, which extends Object:
>
>     writer1 = new IndexWriter(dir, analyzer, create,
>                               new IndexWriter.MaxFieldLength(MFL));
>     writer1.setMergePolicy(new LogDocMergePolicy());
>     writer1.setMergeScheduler(new SerialMergeScheduler());
>     writer1.setMaxBufferedDocs(MBD);
>     writer1.setRAMBufferSizeMB(IndexWriter.DISABLE_AUTO_FLUSH);
>
> My colleague suspects that merging or flushing is being triggered on
> something other than the doc count which leads to the writers' different
> behaviors.  I suspect our next step is to scatter breakpoints around Lucene
> source (we've got trunk@926791 to take advantage of latest NRT readers).
> Does anyone have ideas on how the indexes would get out of sync?  Closing
> the process, committing, optimizing, ... should these all work okay?
> Thanks,
> Justin
