From: Michael McCandless
To: java-user@lucene.apache.org
Date: Wed, 9 Feb 2011 18:14:45 -0500
Subject: Re: index size doubling / optimization (Lucene 3.0.3)

This is not expected. Did the last IW exit "gracefully"? If so, it should delete the old segments after swapping in the optimized one.
Can you post infoStream output after running optimize?

Mike

On Wed, Feb 9, 2011 at 1:58 PM, Phil Herold wrote:
> I know that the size of a Lucene index can double while optimization is
> underway, but it's supposed to eventually settle back down to the original
> size, correct? We have a Lucene index consisting of 100K documents that is
> normally about 12GB in size. It is split across 10 sub-indexes which we
> search using MultiSearcher. It takes our system about 7 hours to traverse
> the file system and update the index, which typically adds, updates or
> deletes anywhere from a dozen to a few hundred documents. We optimize each
> sub-index at the end (although this is configurable). The system seems to
> run fine for several days, with the total size of the index staying fairly
> consistent, then all of a sudden the index size doubles to about 25GB, and
> stays there. I'm assuming this is happening after an optimization; there is
> certainly not a doubling of the data that is being added.
>
> Is this expected or known behavior, or a bug of some kind?
>
> I've read various postings on the 'net regarding optimization, and when to
> do it, if at all, and I'm certainly open to other strategies. Search time is
> critical for our users.
>
> FWIW, we have the following tunable parameters configured for our index:
>
> mergeFactor: 5
> maxMergeDocs: 1000
> maxBufferedDocs: 200
> RAMBufferSizeMB: 16
>
> Any advice or help is appreciated.
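
To see which sub-index grows and when, one option is to record the on-disk size of each sub-index directory after every update run and compare the numbers across runs. The following is a minimal sketch using only the JDK, not Lucene API: the class name `IndexDirSize` and the method `totalBytes` are hypothetical, and it assumes each sub-index lives in its own flat directory whose path is passed on the command line.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Lucene): reports the on-disk size of index directories. */
final class IndexDirSize {

    /** Returns the total size in bytes of the regular files directly inside dir (no recursion). */
    static long totalBytes(Path dir) throws IOException {
        long total = 0L;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    total += Files.size(file);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: each argument is the path of one sub-index directory.
        for (String arg : args) {
            Path dir = Paths.get(arg);
            System.out.println(dir + "\t" + totalBytes(dir) + " bytes");
        }
    }
}
```

Run after each update cycle, this gives one line per sub-index. A directory whose size roughly doubles and stays doubled is the one to inspect for leftover segment files.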