Date: Wed, 30 Apr 2008 19:52:08 -0700 (PDT)
From: Rajesh parab
Subject: Re: ParalleReader and synchronization between indexes
To: java-user@lucene.apache.org
In-Reply-To: <50424.12811.qm@web50303.mail.re2.yahoo.com>
Message-ID: <567106.78534.qm@web52006.mail.re2.yahoo.com>

My apologies for the quick follow-ups, and thanks for the
pointers/suggestions, Grant and Otis.

I did check various threads on the Java user forum around this topic,
but could not find a solution. These are the most relevant threads, and
they end with the same question I currently have:

http://www.gossamer-threads.com/lists/lucene/java-user/15063?search_string=parallelreader;#15063
http://www.gossamer-threads.com/lists/lucene/java-user/31435?search_string=parallelreader;#31435
http://www.gossamer-threads.com/lists/lucene/java-user/50164?search_string=parallelreader;#50164

Otis,

During incremental indexing, the option of re-creating the second index
entirely will not work well in our case, as we will be dealing with
millions of documents. I am sorry for creating confusion by referring to
the index as the "small" index. I should have referred to it as the index
with fewer fields, which change very often.

So, if the first index, with a large number of fields, is not changing,
and the second index, with a small set of fields, requires constant
updates due to frequent changes, is there a way to keep the document ids
of both indexes in sync without either re-creating the second index
entirely or modifying both indexes? Can we somehow keep the internal
document id the same after updating (i.e. deleting and re-inserting) an
index document?

Regards,
Rajesh

--- Otis Gospodnetic wrote:

> Bravo Grant!
>
> Rajesh, I believe the following will work:
> - delete your small index
> - optimize your big index (needed? Not 100% sure, but I think it is)
> - loop through the docs in your "big" index
> - for each document in the big index, add a document to the small index
>
> When you are done you have big+small with docIDs in sync.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Grant Ingersoll
> > To: java-user@lucene.apache.org
> > Sent: Wednesday, April 30, 2008 5:48:33 PM
> > Subject: Re: ParalleReader and synchronization between indexes
> >
> > Rajesh,
> >
> > You are asking a fairly complicated question on a seldom-used piece of
> > functionality. Constantly pinging the list is just making it less
> > likely that someone will respond with an answer. The likelihood that
> > the one person who understands that code (and trust me, it really is
> > likely very few people who know how to practically employ it) enough
> > to give practical advice has read it in the time period you have
> > allotted us to respond is next to nil. We are all volunteers with day
> > jobs.
> >
> > Have you bothered to search the dev and user mailing lists for
> > information on the class in question? I would look for threads from
> > Doug or Chuck Williams.
> >
> > -Grant
> >
> >
> > On Apr 30, 2008, at 5:00 PM, Rajesh parab wrote:
> >
> > > Hi Guys,
> > >
> > > Any comments on this?
> > >
> > > I was looking into the Lucene archive and came across this
> > > thread that asks the same question.
> > >
> > > http://www.gossamer-threads.com/lists/lucene/java-user/50477?search_string=parallelreader;#50477
> > >
> > > Any pointers will be helpful.
> > >
> > > Regards,
> > > Rajesh
> > >
> > > --- Rajesh parab wrote:
> > >
> > >> Hi All,
> > >>
> > >> Any suggestions/comments on my questions in this
> > >> thread will be really helpful.
> > >>
> > >> We are planning to use Lucene indexes throughout the
> > >> application and exploring possibilities of partitioning
> > >> data between multiple indexes.
> > >>
> > >> Regards,
> > >> Rajesh
> > >>
> > >> --- Rajesh parab wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> This is from javadoc of ParallelReader:
> > >>>
> > >>> ======================================================
> > >>>
> > >>> An IndexReader which reads multiple, parallel indexes.
> > >>> Each index added must have the same number of
> > >>> documents, but typically each contains different
> > >>> fields. Each document contains the union of the fields
> > >>> of all documents with the same document number. When
> > >>> searching, matches for a query term are from the first
> > >>> index added that has the field.
> > >>>
> > >>> This is useful, e.g., with collections that have large
> > >>> fields which change rarely and small fields that
> > >>> change more frequently. The smaller fields may be
> > >>> re-indexed in a new index and both indexes may be
> > >>> searched together.
> > >>>
> > >>> ======================================================
> > >>>
> > >>> I have a similar use case as mentioned above and hence
> > >>> would like to use ParallelReader to search across
> > >>> multiple indexes.
> > >>>
> > >>> I have an object that has 50 fields. Out of these 50
> > >>> fields, 45 are relatively static and the other 5 are
> > >>> modified very often. So, I am planning to partition
> > >>> this object's data into 2 indexes such that the 45 static
> > >>> fields will be part of one index and the remaining 5
> > >>> dynamic fields will constitute the second index. While
> > >>> generating the index for the first time, I can make
> > >>> sure that the document order for documents inside both
> > >>> these indexes is the same and hence ParallelReader will
> > >>> work properly with it.
> > >>>
> > >>> The question is -
> > >>> What if the data inside the second (smaller) index
> > >>> changes? In order to update an index document, I will
> > >>> have to delete it and re-insert it again as Lucene
> > >>> does not support document update. This action (of
> > >>> delete and re-insert) will change the internal document
> > >>> id for the updated document inside the second index, and
> > >>> in order to sync it with the first index, I will have to
> > >>> also modify the first (relatively big and static) index.
> > >>> If we will have to update both the indexes, how is it
> > >>> different from having a single index with all the
> > >>> fields? What is the use case in which ParallelReader
> > >>> will get used? As per the documentation, I was thinking
> > >>> that it will apply for my use case, but synchronizing
> > >>> the indexes seems to be a problem.
> > >>>
> > >>> Please help.
> > >>>
> > >>> Regards,
> > >>> Rajesh
> > >>>
> > >>
>
=== message truncated ===
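
The docID drift asked about in this thread, and the rebuild Otis suggests, can be illustrated with a toy model. The sketch below is not Lucene code and calls no Lucene API: a list position stands in for an internal document number, and the class name, method names, and "doc-A" style keys are all hypothetical. It also assumes the state after the deleted slot has been reclaimed (in Lucene a deleted document keeps its slot until segments are merged), so treat it as an illustration of the ordering problem only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only, NOT the Lucene API: a list position plays the role of a document number. */
class DocIdSyncSketch {

    /** True when every position holds the same application key in both "indexes". */
    static boolean aligned(List<String> bigKeys, List<String> smallKeys) {
        return bigKeys.equals(smallKeys);
    }

    /**
     * Models "delete and re-insert" once the deleted slot has been reclaimed:
     * the old copy disappears and the new copy lands at the end.
     */
    static List<String> deleteAndReinsert(List<String> keys, String key) {
        List<String> result = new ArrayList<>(keys);
        result.remove(key); // remove(Object): drop the old copy, later entries shift down
        result.add(key);    // the re-inserted copy takes the next free position
        return result;
    }

    /** Models the suggested rebuild: walk the big index in order, add one small document per entry. */
    static List<String> rebuildSmall(List<String> bigKeys) {
        List<String> rebuilt = new ArrayList<>();
        for (String key : bigKeys) {
            rebuilt.add(key); // same order as the big index, so positions match again
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        List<String> big = Arrays.asList("doc-A", "doc-B", "doc-C");
        List<String> small = new ArrayList<>(big);
        System.out.println("initially aligned: " + aligned(big, small));        // true

        List<String> updated = deleteAndReinsert(small, "doc-B");
        System.out.println("small after update: " + updated);                   // [doc-A, doc-C, doc-B]
        System.out.println("aligned after update: " + aligned(big, updated));   // false

        List<String> rebuilt = rebuildSmall(big);
        System.out.println("aligned after rebuild: " + aligned(big, rebuilt));  // true
    }
}
```

The rebuild restores alignment only because it re-adds every document in the big index's order, which is exactly the cost Rajesh is worried about with millions of documents.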