From: Erik Hatcher
Subject: Re: addIndexes() Size
Date: Mon, 6 Dec 2004 12:27:17 -0500
To: "Lucene Users List"

There was a bug in 1.4 (and maybe 1.4.1?) that kept some index files
around that were not used. Are you using Lucene 1.4.3? If not, try that
and see if it helps.

	Erik

On Dec 6, 2004, at 12:17 PM, Garrett Heaver wrote:

> No, there are no duplicate copies - I've the correct number when I view
> through Luke and I don't overlap - the temporary index is destroyed
> after it is added to the main index - I'm currently at index version
> 159 and it seems that all of my .prx files come in at around 1435 megs
> (ouch)
>
> Thanks
> Garrett
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: 06 December 2004 17:12
> To: Lucene Users List
> Subject: Re: addIndexes() Size
>
> If I were you, I would first use Luke to peek at the index. You may
> find something obvious there, like multiple copies of the same
> Document. Does your temp index 'overlap' with A index in terms of
> Documents? If so, you will end up with multiple copies, as the
> addIndexes method doesn't detect and remove duplicate Documents.
>
> Otis
>
> --- Garrett Heaver wrote:
>
>> Hi.
>>
>> It's probably really simple to explain this, but since I'm not up to
>> speed on the way Lucene stores the data I'm a little confused.
>>
>> I'm building an Index, which resides on Server A, with the Lucene
>> Service running on Server B. Now not to bore you with the details,
>> but because of the network transfer rate etc. I'm running the actual
>> index on \\ServerA\idx and building a temp Index at
>> \\ServerB\idx\temp (obviously because the Local FS is much faster
>> for the service) and then calling addIndexes to import the temp index
>> to the ServerA index before destroying the ServerB index, holding for
>> a bit and then checking for new documents.
>>
>> All works grand BUT the size of the resultant index on ServerA is
>> HUGE in comparison to one I'd build from start to finish (i.e. a
>> simple addDocument Index) - 38gig for 220,000 Unstored Items cannot
>> be right (to give you an idea of how mad this seems, the backed up
>> version of the database from which the data is pulled is only 2gigs)
>>
>> I've considered it being perhaps the number of Items that had to be
>> integrated each time addIndexes was called - right now I'm adding
>> around 10,000 at a time (I had done 1000 at a time but this looked
>> like it was going to end up even larger still)
>>
>> I'm holding off twiddling the minMergeDocs and mergeFactor until I
>> can get a better understanding of what's going on here.
>>
>> Many thanks for any replies
>>
>> Garrett

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
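
Since the thread turns on which files are taking up the space (the .prx total, and possibly files left behind that are no longer used), one rough way to see where the bytes go is to total the file sizes in the index directory by extension. The following is only a minimal sketch under stated assumptions: it uses the Java standard library alone (Java 8 or later), not Lucene, and the class name IndexSizeByExtension and the method sizeByExtension are made up for illustration. It cannot tell which files Lucene still references; it only reports sizes.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: totals file sizes per extension.
public class IndexSizeByExtension {

    // Sum the sizes (in bytes) of the regular files directly inside
    // indexDir, grouped by file extension (".prx", ".frq", ...).
    // Files without a dot in their name are grouped under "(none)".
    static Map<String, Long> sizeByExtension(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot >= 0 ? name.substring(dot) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // The index directory is passed as the first argument;
        // defaults to the current directory if none is given.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        sizeByExtension(indexDir).forEach((ext, bytes) ->
                System.out.printf("%-8s %,d bytes%n", ext, bytes));
    }
}
```

Running it against the main index directory before and after an addIndexes call, and comparing the totals, may help show whether the growth comes from one file type or from files accumulating across merges.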