From: Tim Sturge
Date: Wed, 25 Jul 2007 11:41:33 -0700
To: java-user@lucene.apache.org
Subject: java gc with a frequently changing index?

Hi,

I am indexing a set of constantly changing documents.
The change rate is moderate (about 10 docs/sec over a 10M document collection with a 6G total size), but I want the index to be right up to date (ideally within a second, but within 5 seconds is acceptable).

Right now I have code that adds new documents to the index and deletes old ones using updateDocument() in the 2.1 IndexWriter. In order to see the changes, I need to recreate the IndexReader/IndexSearcher every second or so. I am not calling optimize() on the index in the writer, and the mergeFactor is 10.

The problem I am facing is that Java GC is terrible at collecting the IndexSearchers I am discarding. I usually have a 3 msec query time, but I get GC pauses of 300 msec to 3 sec. (I assume it is collecting the "tenured" generation in these pauses, which is my old IndexSearcher.)

I've tried "-Xincgc", "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC", and calling System.gc() right after I close the old index, without much luck. (I get the pauses down to 1 sec, but get 3x as many. I want < 25 msec pauses.)

So my question is: should I be avoiding reloading my index in this way? Should I keep a separate IndexReader (which only deletes old documents) and one for new documents? Is there a standard technique for a quickly changing index?

Thanks,
Tim