From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Date: Wed, 1 Oct 2008 03:25:44 -0700 (PDT)
Subject: [jira] Commented: (LUCENE-1408) DocumentsWriter.init() doesn't grow fieldDataHash array at same rate as allFieldData array, leading to OOM errors

[
https://issues.apache.org/jira/browse/LUCENE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635991#action_12635991 ]

Michael McCandless commented on LUCENE-1408:
--------------------------------------------

Whoa, good catch! We are growing the hash too quickly and underutilizing it.

This is actually already fixed in 2.4 -- DocumentsWriter.java was refactored into an [internal only] indexing chain. This code moved to DocFieldProcessorPerThread.java:

    https://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/index/DocFieldProcessorPerThread.java

With the refactoring we no longer have 2 arrays (we just have the hash), and the hash "properly" doubles its size and sets mask = size-1 when it needs to grow.

Can you look at the code above and see if it looks right? If so, I'm leaning towards resolving this as WONTFIX (on 2.3.x) since it's already fixed in 2.4.

> DocumentsWriter.init() doesn't grow fieldDataHash array at same rate as allFieldData array, leading to OOM errors
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1408
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1408
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3.2
>         Environment: NA
>            Reporter: David C. Navas
>            Priority: Minor
>
> See DocumentsWriter.init() -- line 787ish
> When a new field is encountered, and arrays need to be resized, the allFieldDataArray is resized to be 50% larger, and the hashArray is resized to be twice as large. Every time. The hashArray grows much faster than the fieldData array.
> In addition, the fieldDataHashMask is set to be one less than the *fieldDataArray* size, rather than the hashArray.
> The latter problem obviously leads to under/bizarre utilization of the hash array, while the former can, under circumstances where you are using an excessive number of field columns, lead to premature OOMs (30k field columns is something like 30 million entry placeholders in the hash array, or about 120M per ThreadState).
> Trivial fix for both would be to change *1.5 to *2, and reset the Mask based on newHashSize, not newSize. Given you are using a mask, it looks like you want a power of two, so you can't use *1.5 everywhere, but you could resize the hash only when needed, rather than each time you resize the data array, though that would be somewhat more difficult.
> I made this Minor as it only affects extreme field use.
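
For readers following the thread, the growth policy discussed here (keep the hash a power of two, double it only when it fills up, and derive the mask from the hash size rather than from some other array) can be sketched roughly as below. This is a minimal illustration and not the Lucene code: the class FieldHashSketch, its methods, and the "half full" growth threshold are all assumptions made for the example.

```java
/** Illustrative sketch only; NOT the Lucene implementation. All names are hypothetical. */
public class FieldHashSketch {
    /** One chained bucket entry per distinct field name. */
    private static final class Entry {
        final String name;
        Entry next;
        Entry(String name, Entry next) { this.name = name; this.next = next; }
    }

    private Entry[] table = new Entry[2];  // length is always a power of two
    private int mask = table.length - 1;   // derived from the hash table's own length
    private int count;

    /** Adds a field name if absent; returns true when a new entry was created. */
    public boolean add(String name) {
        int slot = name.hashCode() & mask;  // non-negative because mask is non-negative
        for (Entry e = table[slot]; e != null; e = e.next) {
            if (e.name.equals(name)) return false;
        }
        table[slot] = new Entry(name, table[slot]);
        count++;
        // Assumed threshold: grow only once the table is half full,
        // not every time some other array is resized.
        if (count >= table.length / 2) rehash();
        return true;
    }

    /** Doubles the table and recomputes the mask from the NEW hash size. */
    private void rehash() {
        int newSize = table.length * 2;
        int newMask = newSize - 1;
        Entry[] newTable = new Entry[newSize];
        for (Entry head : table) {
            Entry e = head;
            while (e != null) {
                Entry next = e.next;
                int slot = e.name.hashCode() & newMask;
                e.next = newTable[slot];
                newTable[slot] = e;
                e = next;
            }
        }
        table = newTable;
        mask = newMask;
    }

    public int size() { return count; }
    public int capacity() { return table.length; }
    public int mask() { return mask; }

    public static void main(String[] args) {
        FieldHashSketch hash = new FieldHashSketch();
        for (int i = 0; i < 30000; i++) hash.add("field" + i);
        System.out.println(hash.size() + " fields, " + hash.capacity()
                + " slots, mask=" + hash.mask());
    }
}
```

Under these assumptions, 30k distinct field names end up in a table of 65,536 slots, compared with the roughly 30 million placeholders described in the report, because the table grows in step with the number of entries instead of doubling on every resize of a second array.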