From: Justin
To: java-user@lucene.apache.org
Date: Thu, 24 Jun 2010 14:23:57 -0700 (PDT)
Subject: Re: Problems with homebrew ParallelWriter

Never mind, it is blocking...

    public void optimize() throws CorruptIndexException, IOException {
      optimize(true);
    }

----- Original Message ----
From: Justin
To: java-user@lucene.apache.org
Sent: Thu, June 24, 2010 3:56:17 PM
Subject: Re: Problems with homebrew ParallelWriter

So is IndexWriter::optimize() non-blocking, even with SerialMergeScheduler? That might explain our problem in trying to use optimize() to make maxDoc() match between the two indexes before adding readers to ParallelReader. I see that we could call optimize(true).

----- Original Message ----
From: Justin
To: java-user@lucene.apache.org
Sent: Thu, June 24, 2010 12:12:57 PM
Subject: Re: Problems with homebrew ParallelWriter

Hi Shai,

> Is it synchronized

    public synchronized void addDocument(Document document)
        throws CorruptIndexException, IOException {
      Document document2 = new Document();
      document2.add(...);
      writer1.addDocument(document);
      writer2.addDocument(document2);
    }

> did you encounter any exceptions

I haven't seen the machines firsthand, but I assume my colleague looked for obvious exceptions that would lead to an imbalance. All exceptions appear to be logged, so we would see something.

> merges could happen on some slices not when you intended

    public synchronized ParallelReader getParallelReader()
        throws IOException, CorruptIndexException {
      IndexReader reader1 = writer1.getReader();
      IndexReader reader2 = writer2.getReader();
      if (reader1.maxDoc() != reader2.maxDoc()) {
        reader1.close();
        reader2.close();
        writer1.optimize(); // force merge for consistent maxDoc
        writer2.optimize(); // force merge for consistent maxDoc
        reader1 = writer1.getReader();
        reader2 = writer2.getReader();
      }
      ParallelReader reader = new ParallelReader();
      reader.add(reader1);
      reader.add(reader2);
      return reader;
    }

As you can see above, my colleague optimizes the indexes to account for merges that have occurred out-of-sync.

> if you've made progress, upload another patch?

If we make a revelation with regards to ParallelWriter, I'll be happy to share. Thanks for giving us some places to look.

Justin

----- Original Message ----
From: Shai Erera
To: java-user@lucene.apache.org
Sent: Wed, June 23, 2010 10:48:22 PM
Subject: Re: Problems with homebrew ParallelWriter

How do you add documents to the index? Is it synchronized (such that basically only one thread can add documents at a time)? The same goes for removing documents as well.

Also, did you encounter any exceptions during the run - if say an addDoc fails on one of the slices, then you need to revert that addDoc in all previous slices ... I remember running into such an exception when working on the Parallel Index stuff, but I don't remember what caused it ...

About merging, note that if you use LogDocMP, then you can guarantee that all slices will be in sync, but still some merges could happen on some slices not when you intended them to happen. For example, during a flush of one addDoc on one of the slices, before the others' addDoc finished. But if you didn't see any exceptions and didn't terminate the process mid-action, then this should not happen ...

I hope this helps. Unfortunately I had to shift focus from LUCENE-1879. Perhaps I'll get back to it one day. But if you advanced on PI somehow, perhaps you can diff the patch that's there and your code, and if you've made progress, upload another patch?

Shai

On Thu, Jun 24, 2010 at 1:44 AM, Justin wrote:

> Hi all,
>
> We've been waiting for LUCENE-1879 and LUCENE-2425 and have written our own
> ParallelWriter class in the meantime. Apparently our indexes are falling
> out of sync (I suspect my colleague is seeing error messages come from
> ParallelReader stating that the number of documents must be the same).
>
> Here's a code snippet from our ParallelWriter which extends Object:
>
>     writer1 = new IndexWriter(dir, analyzer, create,
>         new IndexWriter.MaxFieldLength(MFL));
>     writer1.setMergePolicy(new LogDocMergePolicy());
>     writer1.setMergeScheduler(new SerialMergeScheduler());
>     writer1.setMaxBufferedDocs(MBD);
>     writer1.setRAMBufferSizeMB(IndexWriter.DISABLE_AUTO_FLUSH);
>
> My colleague suspects that merging or flushing is being triggered on
> something other than the doc count which leads to the writers' different
> behaviors. I suspect our next step is to scatter breakpoints around Lucene
> source (we've got trunk@926791 to take advantage of latest NRT readers).
>
> Does anyone have ideas on how the indexes would get out of sync? Process
> close, committing, optimizing,... they all should work okay?
>
> Thanks,
> Justin
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
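
Shai's point in this thread, that an addDoc which fails on one slice has to be reverted on the slices already written, can be sketched roughly as below. This is only an illustration in plain Java under stated assumptions: Slice, InMemorySlice and AllOrNothingWriter are hypothetical names, not Lucene classes, string ids stand in for documents, and each slice is assumed to be able to undo an add by id. In Lucene itself a delete only marks the document, so maxDoc() keeps counting it until a merge drops it; the sketch models the bookkeeping only, not that behavior.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one slice of a parallel index; not a Lucene type. */
interface Slice {
    void add(String docId) throws IOException;
    void remove(String docId) throws IOException;
    int docCount();
}

/** In-memory slice used only to exercise the sketch; it can be told to fail on one id. */
final class InMemorySlice implements Slice {
    private final List<String> docs = new ArrayList<>();
    private final String failOn; // id that triggers a simulated failure, or null

    InMemorySlice(String failOn) { this.failOn = failOn; }

    @Override public void add(String docId) throws IOException {
        if (docId.equals(failOn)) {
            throw new IOException("simulated failure on " + docId);
        }
        docs.add(docId);
    }

    @Override public void remove(String docId) { docs.remove(docId); }

    @Override public int docCount() { return docs.size(); }
}

/** Adds a document to every slice, or to none of them. */
final class AllOrNothingWriter {
    private final List<Slice> slices;

    AllOrNothingWriter(List<Slice> slices) { this.slices = new ArrayList<>(slices); }

    /** One thread at a time, so adds cannot interleave across slices. */
    synchronized void addDocument(String docId) throws IOException {
        int done = 0; // number of slices that already accepted this document
        try {
            for (Slice s : slices) {
                s.add(docId);
                done++;
            }
        } catch (IOException | RuntimeException e) {
            // Undo the add on the slices written so far, newest first.
            for (int i = done - 1; i >= 0; i--) {
                try {
                    slices.get(i).remove(docId);
                } catch (IOException | RuntimeException undo) {
                    e.addSuppressed(undo); // keep the original failure as the primary one
                }
            }
            throw e;
        }
    }

    /** True when every slice reports the same document count. */
    synchronized boolean inSync() {
        int expected = slices.isEmpty() ? 0 : slices.get(0).docCount();
        for (Slice s : slices) {
            if (s.docCount() != expected) return false;
        }
        return true;
    }
}
```

A caller would wrap the two writers as slices and check inSync() before building a reader over them. If an undo itself fails, the slices can still diverge, so the failure should be logged and treated as a signal to rebuild rather than silently ignored.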