Return-Path:
Delivered-To: apmail-lucene-java-dev-archive@www.apache.org
Received: (qmail 39502 invoked from network); 31 Aug 2009 23:38:58 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
    by minotaur.apache.org with SMTP; 31 Aug 2009 23:38:58 -0000
Received: (qmail 66028 invoked by uid 500); 31 Aug 2009 23:38:57 -0000
Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org
Received: (qmail 65942 invoked by uid 500); 31 Aug 2009 23:38:57 -0000
Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: java-dev@lucene.apache.org
Delivered-To: mailing list java-dev@lucene.apache.org
Received: (qmail 65934 invoked by uid 99); 31 Aug 2009 23:38:57 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 31 Aug 2009 23:38:57 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=10.0
    tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 31 Aug 2009 23:38:53 +0000
Received: from brutus (localhost [127.0.0.1])
    by brutus.apache.org (Postfix) with ESMTP id BCC43234C046
    for ; Mon, 31 Aug 2009 16:38:32 -0700 (PDT)
Message-ID: <2139827380.1251761912771.JavaMail.jira@brutus>
Date: Mon, 31 Aug 2009 16:38:32 -0700 (PDT)
From: "Chuck Williams (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-600) ParallelWriter companion to ParallelReader
In-Reply-To: <18441932.1150145249845.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749678#action_12749678 ]

Chuck Williams
commented on LUCENE-600:
---------------------------------------

A given logical Document must have the same doc-id in each subindex. This is maintained by using a merge policy that guarantees consistency across the subindexes, either merge-by-count or merge-by-size as dictated by the size-dominant subindex. I just read your wiki page and it looks like your MasterMergePolicy is the same for the merge-by-size case, right?

We've been using parallel incremental indexing in production apps for a long time now, along with the efficient update mechanism described in the patent app. The original company I did this work for was acquired by a larger company that now owns the IP. I don't know how they would feel about a contribution of the latest version of ParallelWriter, which works with the current Lucene. I could inquire if you are truly open to it, but it sounds like you may be on your own path to a quite similar thing.

Your wiki page says, "when you need to reindex this field you can simply create a new generation of this parallel index and fill it with the new values". That is the rub of the problem, and the one we created an efficient algorithm and implementation for several years ago. ParallelWriter is the easy part.

> ParallelWriter companion to ParallelReader
> ------------------------------------------
>
>                 Key: LUCENE-600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-600
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Chuck Williams
>            Priority: Minor
>         Attachments: ParallelWriter.patch
>
>
> A new class ParallelWriter is provided that serves as a companion to ParallelReader. ParallelWriter meets all of the doc-id synchronization requirements of ParallelReader, subject to:
> 1. ParallelWriter.addDocument() is synchronized, which might have an adverse effect on performance. The writes to the sub-indexes are, however, done in parallel.
> 2. The application must ensure that the ParallelReader is never reopened inside ParallelWriter.addDocument(), else it might find the sub-indexes out of sync.
> 3. The application must deal with recovery from ParallelWriter.addDocument() exceptions. Recovery must restore the synchronization of doc-ids, e.g. by deleting any trailing document(s) in one sub-index that were not successfully added to all sub-indexes, and then optimizing all sub-indexes.
> A new interface, Writable, is provided to abstract IndexWriter and ParallelWriter. This is in the same spirit as the existing Searchable and Fieldable classes.
> This implementation uses Java 1.5. The patch applies against today's svn head. All tests pass, including the new TestParallelWriter.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
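
The doc-id invariant and the recovery step in point 3 of the quoted issue description can be illustrated with a toy model. This is only a sketch under loud assumptions: it does not use Lucene's API or the attached patch, each sub-index is modelled as an in-memory list, the class ToyParallelWriter and its methods are hypothetical names, a null field value stands in for a failed write, the writes are sequential rather than parallel, and recovery is done inside the writer for brevity, whereas the description leaves recovery to the application.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: each sub-index is an in-memory list, not a Lucene index. */
final class ToyParallelWriter {
    private final List<List<String>> subIndexes = new ArrayList<>();

    ToyParallelWriter(int subIndexCount) {
        for (int i = 0; i < subIndexCount; i++) {
            subIndexes.add(new ArrayList<>());
        }
    }

    /**
     * Adds one value per sub-index and returns the shared doc-id.
     * A null value simulates a write failure (an assumption of this sketch).
     */
    synchronized int addDocument(List<String> valuePerSubIndex) {
        if (valuePerSubIndex.size() != subIndexes.size()) {
            throw new IllegalArgumentException("need one value per sub-index");
        }
        // The next doc-id is the current size, identical in every sub-index.
        int docId = subIndexes.get(0).size();
        try {
            for (int i = 0; i < subIndexes.size(); i++) {
                String value = valuePerSubIndex.get(i);
                if (value == null) {
                    throw new IllegalStateException("simulated failure in sub-index " + i);
                }
                subIndexes.get(i).add(value);
            }
        } catch (RuntimeException e) {
            // Restore synchronization: drop trailing documents that did not
            // reach every sub-index, then rethrow to the caller.
            truncateTo(docId);
            throw e;
        }
        return docId;
    }

    /** Removes trailing entries so every sub-index holds exactly size documents. */
    private void truncateTo(int size) {
        for (List<String> sub : subIndexes) {
            while (sub.size() > size) {
                sub.remove(sub.size() - 1);
            }
        }
    }

    /** True when all sub-indexes hold the same number of documents. */
    synchronized boolean inSync() {
        int expected = subIndexes.get(0).size();
        for (List<String> sub : subIndexes) {
            if (sub.size() != expected) {
                return false;
            }
        }
        return true;
    }

    synchronized int docCount() {
        return subIndexes.get(0).size();
    }
}
```

In this model a failed add leaves every sub-index at its previous length, so the next successful add reuses the doc-id of the failed one. A real index would also need the optimize step mentioned in the description, which the toy lists do not require.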