lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (LUCENE-1408) DocumentsWriter.init() doesn't grow fieldDataHash array at same rate as allFieldData array, leading to OOM errors
Date Sun, 12 Oct 2008 18:16:44 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1408.
----------------------------------------

    Resolution: Won't Fix

Won't fix on 2.3, since it only happens with a very large number of fields, and it's fixed
in 2.4.

> DocumentsWriter.init() doesn't grow fieldDataHash array at same rate as allFieldData array, leading to OOM errors
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1408
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1408
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3.2
>         Environment: NA
>            Reporter: David C. Navas
>            Priority: Minor
>
> See DocumentsWriter.init() -- line 787ish
> When a new field is encountered and the arrays need to be resized, the allFieldDataArray is resized to be 50% larger and the hashArray is resized to be twice as large, every time. The hashArray therefore grows much faster than the fieldData array.
> In addition, the fieldDataHashMask is set to one less than the *fieldDataArray* size, rather than one less than the hashArray size.
> The latter problem obviously leads to under-utilization (or bizarre utilization) of the hash array, while the former can, when you are using an excessive number of field columns, lead to premature OOMs (30k field columns is something like 30 million entry placeholders in the hash array, or about 120M per ThreadState).
> A trivial fix for both would be to change *1.5 to *2 and to set the mask based on newHashSize, not newSize. Given that you are using a mask, it looks like you want a power of two, so you can't use *1.5 everywhere. You could instead resize the hash only when needed, rather than each time you resize the data array, though that would be somewhat more difficult.
> I made this Minor as it only affects extreme field use.
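
The two problems described above can be illustrated with a small standalone sketch. This is not the DocumentsWriter code: the class name, the method names, and the starting capacities (10 data slots, 16 hash slots) are assumptions made for illustration. It only models the growth policy as the report states it (data array *1.5, hash array *2 on every resize) next to the suggested doubling fix, and counts how many hash slots a mask can actually address.

```java
public class FieldHashGrowthSketch {
    // Assumed starting capacities (hypothetical; not taken from the issue text).
    static final int INITIAL_DATA_SIZE = 10;
    static final int INITIAL_HASH_SIZE = 16;

    /**
     * Models the reported policy: each time the data array is full it grows
     * by 1.5x while the hash array doubles. Returns {dataSize, hashSize}
     * once the data array can hold numFields entries.
     */
    static long[] reportedGrowth(int numFields) {
        long dataSize = INITIAL_DATA_SIZE;
        long hashSize = INITIAL_HASH_SIZE;
        while (dataSize < numFields) {
            dataSize = (long) (dataSize * 1.5); // truncates, like an int cast
            hashSize *= 2;
        }
        return new long[] {dataSize, hashSize};
    }

    /**
     * Models the suggested trivial fix: both arrays double together, so the
     * ratio between them stays constant. Returns {dataSize, hashSize}.
     */
    static long[] doublingGrowth(int numFields) {
        long dataSize = INITIAL_DATA_SIZE;
        long hashSize = INITIAL_HASH_SIZE;
        while (dataSize < numFields) {
            dataSize *= 2;
            hashSize *= 2;
        }
        return new long[] {dataSize, hashSize};
    }

    /**
     * Counts the distinct slots that (hash & mask) can produce for hash
     * values in [0, hashSize). A mask of the form 2^k - 1 reaches every
     * slot below 2^k; any other mask skips slots.
     */
    static int reachableSlots(int mask, int hashSize) {
        boolean[] seen = new boolean[hashSize];
        int count = 0;
        for (int hash = 0; hash < hashSize; hash++) {
            int slot = hash & mask; // always <= hash, so it stays in bounds
            if (!seen[slot]) {
                seen[slot] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] reported = reportedGrowth(30000);
        long[] doubled = doublingGrowth(30000);
        System.out.println("reported policy: data=" + reported[0] + " hash=" + reported[1]);
        System.out.println("doubling policy: data=" + doubled[0] + " hash=" + doubled[1]);

        // Mask derived from a data array of size 15 versus a hash array of size 32.
        System.out.println("mask 14 reaches " + reachableSlots(14, 32) + " of 32 slots");
        System.out.println("mask 31 reaches " + reachableSlots(31, 32) + " of 32 slots");
    }
}
```

Under these assumed starting sizes, the hash array in the first model ends up in the tens of millions of slots for 30,000 fields, the same order of magnitude as the reporter's estimate, while the doubling model keeps the hash array within a small constant factor of the data array. The mask comparison shows why a mask taken from a size that is not a power of two leaves part of the hash array unused.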

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



