From: Yonik Seeley
To: java-dev@lucene.apache.org
Date: Sun, 30 Aug 2009 09:08:16 -0400
Subject: Re: Parallel incremental indexing
In-Reply-To: <4A9A1AC8.7090805@gmail.com>

Cool stuff!

We should also think about how to do single-document field updates or
field adds, since that is the most common use case. It doesn't need to
be implemented in the first version, but we should keep it in mind so
we don't box ourselves in.

Doug mentioned some ideas he had in passing almost a year ago about how
to add a field to a single document, and it is similar in that it used
a parallel reader. IndexWriter would be modified to maintain the same
structure across parallel indexes, as you note. If one wanted to add a
new field value to document 1000, one would have to index dummy
documents for docs 0-999. Instead of this, the index format should
support gaps. On a segment merge, the IndexWriter could simply merge in
this new segment.

Anyway, updateable documents are fundamental enough that we should also
consider changes to the index format if that makes it easier.

-Yonik
http://www.lucidimagination.com

On Sun, Aug 30, 2009 at 2:23 AM, Michael Busch wrote:
> Hi all,
>
> I just added a wiki page for a new feature I'd like to add to
> Lucene. Please take a look at the link.
> I will add more details and diagrams to the page, but for now it
> should give a rough idea about how to implement it:
>
> http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing
>
> Basically the idea is to allow updating documents partially, e.g. only
> a subset of the fields, without having to reindex the entire
> document. This is a feature that is very often asked for.
>
> We have implemented the solution at IBM and it's working
> great. It is a technology that has already allowed us to add really
> exciting new features to products that weren't easily possible before.
>
> The implementation I can currently contribute has some limitations:
> e.g. multi-threaded indexing is not supported. But let me make clear
> that this is not a limitation of the design described in the wiki - we
> have these limitations because we implemented this on top of Lucene's
> 2.4 APIs. If we decide to add this to Lucene's core we should
> reimplement some parts to overcome those limitations.
>
> In my opinion this will be a great addition to Lucene that many
> people will find very useful. In Solr this is also something users
> often ask for.
>
> Over the last few weeks I worked on getting internal approval for the
> contribution to Lucene, and the good news is that I already have a
> signed software grant ready - so if the community likes this feature
> and decides to add this to Lucene there won't be any delay for legal
> work from IBM's side.
>
> Btw: I will be on vacation from 09/03-09/20 and won't have internet
> access most of the time, so if I stop responding at the end of next
> week you'll know why...
>
> Please let me know what you think!
>
>  Michael
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
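
A minimal sketch of the dummy-document point made in the reply above,
assuming a parallel index can be modelled as a per-field column indexed
by doc ID. It uses plain Java collections only. The class and method
names (ParallelFieldSketch, denseSegment, sparseSegment, mergeInto) are
hypothetical and are not Lucene APIs, and the merge step is just one
way the "merge in this new segment" idea might look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: none of these names are Lucene APIs.
class ParallelFieldSketch {

    // Dense layout: the list index is the doc ID, so storing a value for
    // docId forces one placeholder ("dummy document") per earlier doc.
    static List<String> denseSegment(int docId, String value) {
        List<String> segment = new ArrayList<>();
        for (int i = 0; i < docId; i++) {
            segment.add(null); // dummy entry for a doc with no new value
        }
        segment.add(value);
        return segment;
    }

    // Gap-tolerant layout: only docs that actually carry a value are
    // stored, keyed by doc ID and kept in doc ID order.
    static TreeMap<Integer, String> sparseSegment(int docId, String value) {
        TreeMap<Integer, String> segment = new TreeMap<>();
        segment.put(docId, value);
        return segment;
    }

    // Merge step (an assumption about how a merge might consume the sparse
    // segment): overlay the sparse values onto the main column by doc ID.
    // Docs without an entry keep their existing value. Every doc ID in the
    // sparse segment is assumed to exist in the main column.
    static List<String> mergeInto(List<String> mainColumn,
                                  Map<Integer, String> sparse) {
        List<String> merged = new ArrayList<>(mainColumn);
        for (Map.Entry<Integer, String> e : sparse.entrySet()) {
            merged.set(e.getKey(), e.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Entries needed to attach one value to doc 1000 in each layout.
        System.out.println("dense entries:  " + denseSegment(1000, "v").size());
        System.out.println("sparse entries: " + sparseSegment(1000, "v").size());
    }
}
```

In the dense layout the cost of a single-field update grows with the
doc ID, which is the reason given above for letting the index format
support gaps.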