From: Stanislav Jordanov
Date: Fri, 01 Dec 2006 15:11:33 +0200
To: java-user@lucene.apache.org
Subject: an alternative to optimize?
Message-ID: <45702A05.2000105@sirma.bg>

Guys,

I've already asked this question, but nobody answered.

Suppose we have a relatively big index that is continuously updated, i.e. new docs get added while some of the old docs get deleted. For pragmatic reasons we restrict maxMergeDocs so that segment files don't get enormously big.

Now consider a segment of maximum size (i.e. one containing maxMergeDocs docs and hence not eligible for a merge). As time passes, more and more of its docs may get deleted. But since the segment is not mergeable, it stays the same size, with lots of "holes" in it, which is bad for performance.

The only way I am aware of to correct this problem is to invoke index optimization, which has two drawbacks:

1. It takes a while to optimize a big index.
2. The optimization process always produces an index consisting of a single (extremely) large segment.

We can live with 1, but 2 is undesirable.

Is there a way to "optimize" an index or a single segment (in the sense of purging its deleted docs) without ending up with a single-segment index?

Best,
Stanislav
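
To make the request concrete, here is a toy model in plain Java of the behaviour being asked for. It is only a sketch under stated assumptions: the Segment class, the purgeDeletes and optimize methods, and the 0.5 threshold are hypothetical names and values invented for illustration, and none of them is Lucene API. The model only tracks document counts; it does not touch a real index.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every name here is hypothetical and not part of Lucene.
final class SegmentPurgeSketch {

    /** A segment reduced to two numbers: total doc slots and deleted slots. */
    static final class Segment {
        final String name;
        final int maxDoc;      // doc slots, including deleted ones
        final int deletedDocs; // slots marked deleted that still occupy space

        Segment(String name, int maxDoc, int deletedDocs) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.deletedDocs = deletedDocs;
        }

        int liveDocs() {
            return maxDoc - deletedDocs;
        }

        double deletedRatio() {
            return maxDoc == 0 ? 0.0 : (double) deletedDocs / maxDoc;
        }
    }

    /**
     * The behaviour asked for: rewrite only the segments whose share of
     * deleted docs exceeds the threshold, and leave all others untouched,
     * so the index keeps several segments.
     */
    static List<Segment> purgeDeletes(List<Segment> segments, double threshold) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : segments) {
            if (s.deletedRatio() > threshold) {
                // Rewrite without the holes; drop the segment if nothing is left.
                if (s.liveDocs() > 0) {
                    result.add(new Segment(s.name + "_purged", s.liveDocs(), 0));
                }
            } else {
                result.add(s);
            }
        }
        return result;
    }

    /** The behaviour to avoid: everything collapses into one large segment. */
    static List<Segment> optimize(List<Segment> segments) {
        int live = 0;
        for (Segment s : segments) {
            live += s.liveDocs();
        }
        List<Segment> result = new ArrayList<>();
        result.add(new Segment("optimized", live, 0));
        return result;
    }

    public static void main(String[] args) {
        List<Segment> index = new ArrayList<>();
        index.add(new Segment("_a", 100000, 60000)); // at the size cap, mostly holes
        index.add(new Segment("_b", 100000, 5000));  // at the size cap, few holes
        index.add(new Segment("_c", 20000, 0));      // small, no deletions

        for (Segment s : purgeDeletes(index, 0.5)) {
            System.out.println(s.name + " maxDoc=" + s.maxDoc
                    + " deleted=" + s.deletedDocs);
        }
    }
}
```

In this model only the first segment is rewritten, while the other two stay as they are, which is the outcome the question is after. A full optimize, by contrast, would return a single segment holding all live docs.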