From: "Robert Engels"
Reply-To: java-dev@lucene.apache.org
Subject: RE: Faking index merge by modifying segments file?
Date: Wed, 2 Nov 2005 08:35:26 -0600
In-Reply-To: <20051102111403.79601.qmail@web50306.mail.yahoo.com>

They only need to be sorted if segA and segB were combined, so in your case this is not needed.

I am not sure that what you are describing is any different from how MultiReader works, and it does not need to perform any file copying or linking.

Just create the new index, write the documents, and open all indexes using a MultiReader?

Maybe I am missing something, but I see that as a simple way of doing what you are trying to do.

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Wednesday, November 02, 2005 5:14 AM
To: java-dev@lucene.apache.org
Subject: RE: Faking index merge by modifying segments file?

Hello,

--- Robert Engels wrote:
> Problem is the terms need to be sorted in a single segment.

Are you referring to Term Dictionary (.tis and .tii files as described at http://lucene.apache.org/java/docs/fileformats.html )? If so, is that really true? I don't have an optimized Lucene multi-file index handy to look at, but .tis and .tii files are "per segment" files, so wouldn't a set of .tis and .tii files from multiple indices be equivalent to a set of .tis and .tii files from multiple segments of a single index?

For example, if we have two indices, A and B, both optimized, we have:

A:
  segA.tis (this may contain terms bar and foo)
  segA.tii
  ...
  segments (this would list segA)

B:
  segB.tis (this may contain terms piggy and bank)
  segB.tii
  ...
  segments (this would list segB)

Wouldn't that be the same as a single index, say index C:

C:
  segA.tis (this may contain terms bar and foo)
  segA.tii
  segB.tis (this may contain terms piggy and bank)
  segB.tii
  ...
  segments (this would list segments segA and segB)

That is really what I am talking about: take all index files of index A and all index files of index B and stick them in a new index dir for a new index C. Then open the segments files of index A and index B, pull out segment names and other information from there, and write a new segments file with that information in the index dir for that new index C.

This sounds like it should be possible, except for docId clashes: if index A had a document with Id 100 and index B also has a document with Id 100, after my index file copying, index C will end up having 2 documents with Id 100, and that won't work. So, documents in C would have to be renumbered (re-assigned Ids), as they get renumbered during optimization, but without rewriting all index files in index C.

Does this sound right?

Also, I may not need to actually copy/move files around, if I just make use of sym/hard links.

Thanks,
Otis

> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Tuesday, November 01, 2005 1:52 AM
> To: java-dev@lucene.apache.org
> Subject: Faking index merge by modifying segments file?
>
> Hello,
>
> I spent most of today talking to some people about Lucene, and one of
> them said how they would really like to have an "instantaneous index
> merge", and how he is thinking he could achieve that by simply opening
> segments file of one index, and adding segment names of the other
> index/indices, plus adjusting the segment size (SegSize in
> fileformats.html), thus creating a single (but unoptimized) index.
>
> Any reactions to that?
>
> I imagine this isn't quite that simple to implement, as one would have
> to renumber all documents, in order to avoid having multiple documents
> with the same document id.
>
> Can anyone think of any other problems with this approach, or perhaps
> offer ideas for possible document renumbering?
>
> Thanks,
> Otis
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
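
The docId renumbering discussed in this thread can be illustrated with a small sketch. It assumes that document numbers are stored relative to each segment and that a reader spanning several segments (or several indexes, as with MultiReader) gives each one a base offset equal to the total document count of the segments before it. The function names below are hypothetical and are not part of the Lucene API; this only shows the arithmetic, not how Lucene implements it.

```python
from bisect import bisect_right
from itertools import accumulate


def segment_bases(doc_counts):
    """Return the first global doc number of each segment.

    doc_counts[i] is the number of documents in segment i.
    """
    # Running totals give the end of each segment; shift by one to get starts.
    return [0] + list(accumulate(doc_counts))[:-1]


def to_global(bases, segment, local_doc):
    """Map a segment-local doc number to a number unique across all segments."""
    return bases[segment] + local_doc


def to_local(bases, global_doc):
    """Map a global doc number back to (segment index, segment-local doc number)."""
    # The owning segment is the last one whose base is <= global_doc.
    segment = bisect_right(bases, global_doc) - 1
    return segment, global_doc - bases[segment]


# Hypothetical sizes: segA holds 150 documents, segB holds 200.
bases = segment_bases([150, 200])
print(to_global(bases, 0, 100))  # doc 100 of segA stays 100
print(to_global(bases, 1, 100))  # doc 100 of segB becomes 250
```

Under that assumption, two documents that both carry local Id 100 do not clash once each segment has its own base, and no per-segment file has to be rewritten to achieve it.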