Date: Fri, 26 Mar 2010 19:42:27 +0000 (UTC)
From: "Grant Ingersoll (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-1879) Parallel incremental indexing

[ https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850322#action_12850322 ]

Grant Ingersoll commented on
LUCENE-1879:
-----------------------------------------

First off, I haven't looked at the code here or at the comments beyond skimming, but this is something I've had in my head for a long time, though I don't have any code for it. When I think about the whole update problem, I keep coming back to the notion of Photoshop layers, which mask the underlying part of the photo without damaging it. The analogy isn't exact here, but nevertheless...

This leads me to wonder whether the solution is best achieved at the index level rather than at the Reader/Writer level.

So, thinking out loud here, and I'm not sure of the best wording for this:

When a document first comes in, it is all in one place, just as it is now. Then, when an update comes in on a particular field, we somehow mark in the index that the document in question has been modified, and we add the new change onto the end of the index (just as we currently do when adding new docs, except this time it's a doc with a single field). Then, at search time, when scoring the affected documents, we would go to a secondary process that knows where to look up the incremental changes. As background merging takes place, these "disjoint" documents would be merged back together. We might even consider a "high update" merge scheduler that handles these incremental merges more frequently. In a sense, the old field for that document is masked by the new field. I think that, given the proper index structure, we _maybe_ could make the marking of the old field fast (maybe it's a pointer to the new field, maybe it's just a bit indicating to go look in the "update" segment).

On the search side, I think performance would still be maintained, because even in high-update environments you usually aren't talking about more than a few thousand changes in a minute or two, and the background merger would be responsible for keeping the total number of disjoint documents low.
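To make the layering idea concrete, here is a minimal in-memory sketch, assuming Strings stand in for indexed fields and ignoring scoring, postings, and on-disk segments entirely. Every name in it (`LayeredFieldStore`, `FieldUpdate`, `updateField`, `merge`, `pendingUpdates`) is hypothetical and is not a Lucene API. It combines both marking options mentioned above: a per-document bit says "look in the update layer", and a per-field pointer names the newest single-field entry that masks the base value.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the "layers" idea. None of these classes or
// methods are Lucene APIs; Strings stand in for indexed fields.
final class LayeredFieldStore {

    // One single-field partial document appended to the "update" layer.
    private static final class FieldUpdate {
        final int docId;
        final String field;
        final String value;

        FieldUpdate(int docId, String field, String value) {
            this.docId = docId;
            this.field = field;
            this.value = value;
        }
    }

    // Base layer: docID -> (field -> value), written once when the doc is added.
    private final List<Map<String, String>> base = new ArrayList<>();
    // One bit per base docID: set means "look in the update layer for this doc".
    private final BitSet modified = new BitSet();
    // Append-only update layer, analogous to adding new docs at the end of the index.
    private final List<FieldUpdate> updates = new ArrayList<>();
    // Pointer from (docID, field) to the newest update entry that masks the base value.
    private final Map<Integer, Map<String, Integer>> latest = new HashMap<>();

    int addDocument(Map<String, String> fields) {
        base.add(new HashMap<>(fields));
        return base.size() - 1;
    }

    // Appends the new field value instead of rewriting the whole document.
    void updateField(int docId, String field, String value) {
        updates.add(new FieldUpdate(docId, field, value));
        latest.computeIfAbsent(docId, k -> new HashMap<>()).put(field, updates.size() - 1);
        modified.set(docId);
    }

    // Read path: unmodified docs cost one bit check; modified ones follow the pointer.
    String get(int docId, String field) {
        if (modified.get(docId)) {
            Integer idx = latest.get(docId).get(field);
            if (idx != null) {
                return updates.get(idx).value;
            }
        }
        return base.get(docId).get(field);
    }

    // "Background merge": fold masking values back into the base layer, then reset.
    void merge() {
        for (int doc = modified.nextSetBit(0); doc >= 0; doc = modified.nextSetBit(doc + 1)) {
            for (Map.Entry<String, Integer> e : latest.get(doc).entrySet()) {
                base.get(doc).put(e.getKey(), updates.get(e.getValue()).value);
            }
        }
        updates.clear();
        latest.clear();
        modified.clear();
    }

    // Number of "disjoint" entries the merger is responsible for keeping small.
    int pendingUpdates() {
        return updates.size();
    }

    public static void main(String[] args) {
        LayeredFieldStore store = new LayeredFieldStore();
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "draft");
        doc.put("body", "unchanged text");
        int id = store.addDocument(doc);

        store.updateField(id, "title", "final");
        // prints "final / unchanged text"
        System.out.println(store.get(id, "title") + " / " + store.get(id, "body"));

        store.merge();
        // prints "0 pending, title=final"
        System.out.println(store.pendingUpdates() + " pending, title=" + store.get(id, "title"));
    }
}
```

Putting the bit check in front of the pointer lookup means documents that were never updated pay only a bit test on the read path, which is the property the comment relies on for search performance. The extra cost grows with the number of pending updates, which is why a more aggressive merge step for update-heavy workloads would matter.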
> Parallel incremental indexing
> -----------------------------
>
>                 Key: LUCENE-1879
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1879
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>         Attachments: parallel_incremental_indexing.tar
>
>
> A new feature that allows building parallel indexes and keeping them in sync on a docID level, independent of the choice of the MergePolicy/MergeScheduler.
> Find details on the wiki page for this feature:
> http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing
> Discussion on java-dev:
> http://markmail.org/thread/ql3oxzkob7aqf3jd