lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Problems with homebrew ParallelWriter
Date Thu, 24 Jun 2010 09:04:47 GMT
I agree w/ Shai -- from your description it looks like your docs
should be in sync (assuming no exceptions, and a serial doc/del stream
going in).

If you turn on infoStream for all the writers & post the results, we
can look for where they diverge...
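
For example (just a sketch -- writer2 and the log file names are assumptions
about your setup), with the 3.x setter API each writer can log to its own
file so the streams are easy to diff:

    // Sketch: give each slice writer its own infoStream log.
    writer1.setInfoStream(new java.io.PrintStream(
        new java.io.FileOutputStream("writer1-infostream.log"), true));
    writer2.setInfoStream(new java.io.PrintStream(
        new java.io.FileOutputStream("writer2-infostream.log"), true));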

Mike

On Wed, Jun 23, 2010 at 11:48 PM, Shai Erera <serera@gmail.com> wrote:
> How do you add documents to the index? Is it synchronized (so that only one
> thread can add documents at a time)?
> The same goes for removing documents.
>
> Also, did you encounter any exceptions during the run? If, say, an addDoc
> fails on one of the slices, then you need to revert that addDoc in all the
> previous slices ...
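
Something along these lines, perhaps (purely a sketch, not the actual
LUCENE-1879 code; sliceWriters is an assumed field holding one IndexWriter
per slice):

    // Sketch: one lock around the whole add, so every slice sees the
    // documents in exactly the same order.
    public synchronized void addDocument(Document[] sliceDocs) throws IOException {
      for (int i = 0; i < sliceWriters.length; i++) {
        try {
          sliceWriters[i].addDocument(sliceDocs[i]);
        } catch (IOException e) {
          // This slice failed, so the slices before it are now one doc ahead.
          // They have to be brought back in step (e.g. rolled back to the last
          // shared commit) before indexing continues, or the counts diverge.
          throw e;
        }
      }
    }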
>
> I remember running into such an exception when working on the Parallel Index
> stuff, but I don't remember what caused it ...
>
> About merging, note that if you use LogDocMP you can guarantee that all
> slices will stay in sync, but some merges could still happen on some slices
> at times you didn't intend. For example, during the flush triggered by an
> addDoc on one of the slices, before the other slices' addDoc calls have
> finished. But if you didn't see any exceptions and didn't terminate the
> process mid-action, this should not happen ...
>
> I hope this helps. Unfortunately I had to shift focus away from LUCENE-1879.
> Perhaps I'll get back to it one day. But if you've advanced PI somehow,
> perhaps you can diff the patch that's there against your code and, if you've
> made progress, upload another patch?
>
> Shai
>
> On Thu, Jun 24, 2010 at 1:44 AM, Justin <crynax@yahoo.com> wrote:
>
>> Hi all,
>>
>> We've been waiting for LUCENE-1879 and LUCENE-2425 and have written our own
>> ParallelWriter class in the meantime.  Apparently our indexes are falling
>> out of sync (I suspect my colleague is seeing error messages from
>> ParallelReader stating that the number of documents must be the same).
>>
>> Here's a code snippet from our ParallelWriter, which extends Object:
>>
>>     writer1 = new IndexWriter(dir, analyzer, create,
>>                               new IndexWriter.MaxFieldLength(MFL));
>>     writer1.setMergePolicy(new LogDocMergePolicy());
>>     writer1.setMergeScheduler(new SerialMergeScheduler());
>>     writer1.setMaxBufferedDocs(MBD);
>>     writer1.setRAMBufferSizeMB(IndexWriter.DISABLE_AUTO_FLUSH);
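
(Assuming the other slice writers are meant to get exactly the same settings,
a small helper -- sketch only -- keeps them from drifting, since a flush can
then only be triggered by the shared doc count on every slice:)

    // Sketch: apply identical flush/merge settings to every slice writer.
    private void configure(IndexWriter w) {
      w.setMergePolicy(new LogDocMergePolicy());
      w.setMergeScheduler(new SerialMergeScheduler());
      w.setMaxBufferedDocs(MBD);
      w.setRAMBufferSizeMB(IndexWriter.DISABLE_AUTO_FLUSH);
    }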
>>
>> My colleague suspects that merging or flushing is being triggered by
>> something other than the doc count, which leads to the writers behaving
>> differently.  I suspect our next step is to scatter breakpoints around the
>> Lucene source (we're on trunk@926791 to take advantage of the latest NRT
>> readers).
>>
>> Does anyone have ideas on how the indexes could get out of sync?  Process
>> close, committing, optimizing, ... should they all work okay?
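
(A sketch of what doing those in lockstep across the slices might look like;
sliceWriters is again an assumed field:)

    // Sketch: commit/optimize/close every slice under the same lock and in the
    // same order, so no slice can commit a different doc count than the others.
    public synchronized void commit() throws IOException {
      for (IndexWriter w : sliceWriters) w.commit();
    }
    public synchronized void optimize() throws IOException {
      for (IndexWriter w : sliceWriters) w.optimize();
    }
    public synchronized void close() throws IOException {
      for (IndexWriter w : sliceWriters) w.close();
    }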
>>
>> Thanks,
>> Justin
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

