From: "Ning Li"
To: java-dev@lucene.apache.org
Subject: Re: [jira] Resolved: (LUCENE-709) [PATCH] Enable application-level management of IndexWriter.ramDirectory size
Date: Wed, 22 Nov 2006 13:33:06 -0500
In-Reply-To: <22186323.1164163923262.JavaMail.jira@brutus>
References: <26596647.1163138077092.JavaMail.jira@brutus> <22186323.1164163923262.JavaMail.jira@brutus>

I was away so I'm catching up.

If this (occasional large documents consuming too much memory) happens to a few applications, should it be solved in IndexWriter?

A possible design could be:

First, in addDocument(), compute the byte size of a ram segment after the ram segment is created. In the synchronized block, when the newly created segment is added to ramSegmentInfos, also add its byte size to the total byte size of ram segments.

Then, in maybeFlushRamSegments(), either one of two conditions can trigger a flush: the number of ram segments reaching maxBufferedDocs, or the total byte size of ram segments exceeding a threshold.

The overhead is very small in this design.

Of course, IndexWriter would have another configurable parameter. :-) But it would be nice if an application could set a limit on the memory it uses to buffer docs.

Ning

On 11/21/06, Yonik Seeley (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/LUCENE-709?page=all ]
>
> Yonik Seeley resolved LUCENE-709.
> ---------------------------------
>
> Resolution: Fixed
>
> Committed. Thanks for bearing with me through this, Chuck!
>
> > [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> > ----------------------------------------------------------------------------
> >
> > Key: LUCENE-709
> > URL: http://issues.apache.org/jira/browse/LUCENE-709
> > Project: Lucene - Java
> > Issue Type: Improvement
> > Components: Index
> > Affects Versions: 2.0.1
> > Environment: All
> > Reporter: Chuck Williams
> > Attachments: ramdir.patch, ramdir.patch, ramDirSizeManagement.patch, ramDirSizeManagement.patch, ramDirSizeManagement.patch, ramDirSizeManagement.patch
> >
> >
> > IndexWriter currently only supports bounding of the in-memory index cache using maxBufferedDocs, which limits it to a fixed number of documents. When document sizes vary substantially, especially when documents cannot be truncated, this leads either to inefficiencies from a too-small value or OutOfMemoryErrors from a too-large value.
> > This simple patch exposes IndexWriter.flushRamSegments(), and provides access to size information about IndexWriter.ramDirectory so that an application can manage this based on the total number of bytes consumed by the in-memory cache, thereby allowing a larger number of smaller documents or a smaller number of larger documents. This can lead to much better performance while eliminating the possibility of OutOfMemoryErrors.
> > The actual job of managing to a size constraint, or any other constraint, is left up to the application.
> > The addition of synchronized to flushRamSegments() is only for safety of an external call. It has no significant effect on internal calls since they all come from a synchronized caller.
>
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
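
The design Ning proposes at the top of this message (flush when either the number of buffered ram segments reaches maxBufferedDocs or their total byte size exceeds a threshold) can be illustrated with a small stand-alone sketch. This is a toy model, not Lucene code: the class RamSegmentBuffer, the parameter maxBufferedBytes, and all method names below are hypothetical, and the "flush" only clears the in-memory bookkeeping where a real writer would merge the segments to disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a two-trigger flush policy. Hypothetical; not part of Lucene. */
class RamSegmentBuffer {
    private final int maxBufferedDocs;    // count trigger (name borrowed from the thread)
    private final long maxBufferedBytes;  // byte trigger (hypothetical parameter)
    private final List<Long> segmentSizes = new ArrayList<>();
    private long totalBytes = 0;
    private int flushCount = 0;

    RamSegmentBuffer(int maxBufferedDocs, long maxBufferedBytes) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferedBytes = maxBufferedBytes;
    }

    /**
     * Registers one newly created ram segment whose size the caller has
     * already computed, then checks both triggers.
     * Returns true if this call caused a flush.
     */
    synchronized boolean addSegment(long sizeInBytes) {
        segmentSizes.add(sizeInBytes);
        totalBytes += sizeInBytes;  // keep the running total alongside the list
        return maybeFlush();
    }

    private boolean maybeFlush() {
        boolean tooMany = segmentSizes.size() >= maxBufferedDocs;
        boolean tooBig = totalBytes > maxBufferedBytes;
        if (!tooMany && !tooBig) {
            return false;
        }
        // A real writer would merge the buffered segments to disk here.
        segmentSizes.clear();
        totalBytes = 0;
        flushCount++;
        return true;
    }

    synchronized int bufferedSegments() { return segmentSizes.size(); }
    synchronized long bufferedBytes() { return totalBytes; }
    synchronized int flushCount() { return flushCount; }

    public static void main(String[] args) {
        RamSegmentBuffer buf = new RamSegmentBuffer(3, 100);
        System.out.println(buf.addSegment(40));  // below both limits
        System.out.println(buf.addSegment(70));  // total 110 bytes exceeds 100
        System.out.println(buf.flushCount());
    }
}
```

Because the byte size is passed in as an argument, the potentially costly size computation stays outside the synchronized section, and only the addition to the running total happens under the lock. That matches the claim in the message that the overhead of this design is very small.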