Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 5265 invoked from network); 9 Sep 2009 09:22:23 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 9 Sep 2009 09:22:23 -0000 Received: (qmail 95609 invoked by uid 500); 9 Sep 2009 09:22:22 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 95549 invoked by uid 500); 9 Sep 2009 09:22:22 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 95541 invoked by uid 99); 9 Sep 2009 09:22:22 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Sep 2009 09:22:22 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Sep 2009 09:22:19 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 817C5234C044 for ; Wed, 9 Sep 2009 02:21:57 -0700 (PDT) Message-ID: <1955864336.1252488117516.JavaMail.jira@brutus> Date: Wed, 9 Sep 2009 02:21:57 -0700 (PDT) From: "Michael McCandless (JIRA)" To: java-dev@lucene.apache.org Subject: [jira] Commented: (LUCENE-1899) Inefficient growth of OpenBitSet In-Reply-To: <549628201.1252416777873.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/LUCENE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752994#action_12752994 ] Michael McCandless commented on LUCENE-1899: -------------------------------------------- bq. This formula ensures that at worst case, just 6% of the array space is wasted You mean 12.5% right? bq. I just wonder what is the rationale behind the specific formula in this function It's just a standard time/space tradeoff, that favors not wasting too much space. This code was "borrowed" from Python's "listobject.c" sources, ie, it governs how Python over-allocates the storage for its list type. We could explore different constants though I'd be nervous about making this value much higher. Often the consumer of this API will see rapid growth initially, and then the collection stops growing and is re-used for a long time, in which case the long-term wasted RAM is (I think) more important than the one-time short-term CPU cost of finding the "natural" size. > Inefficient growth of OpenBitSet > -------------------------------- > > Key: LUCENE-1899 > URL: https://issues.apache.org/jira/browse/LUCENE-1899 > Project: Lucene - Java > Issue Type: Bug > Components: Store > Affects Versions: 2.9 > Reporter: Nadav Har'El > Priority: Minor > > Hi, I found a potentially serious efficiency problem with OpenBitSet. > One typical (I think) way to build a bit set is to set() the bits one by one - > e.g., have a HitCollector set() the bit for each matching document. > The underlying array of longs needs to grow as more as more bits are set, of > course. > But looking at the code, it appears to me that the array grows very > ineefficiently - in the worst case (when doc ids are sorted, as they would > normally be in the HitCollector case for example), copying the array again > and again for every added bit... The relevant code in OpenBitSet.java is: > public void set(long index) { > int wordNum = expandingWordNum(index); > ... > } > protected int expandingWordNum(long index) { > int wordNum = (int)(index >> 6); > if (wordNum>=wlen) { > ensureCapacity(index+1); > ... > } > public void ensureCapacityWords(int numWords) { > if (bits.length < numWords) { > long[] newBits = new long[numWords]; > System.arraycopy(bits,0,newBits,0,wlen); > bits = newBits; > } > } > As you can see, if the bits array is not long enough, a new one is > allocated at exactly the right size - and in the worst case it can grow > just one word every time... > Shouldn't the growth be more exponential in nature, e.g., grow to the maximum > of index+1 and twice the existing size? > Alternatively, if the growth is so inefficient, this should be documented, > and it should be recommended to use the variant of the constructor with the > correct initial size (e.g., in the HitCollector case, the number of documents > in the index). and the fastSet() method instead of set(). > Thanks, > Nadav. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org