lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dawid Weiss (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
Date Fri, 23 Mar 2012 21:49:26 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237172#comment-13237172
] 

Dawid Weiss commented on LUCENE-3867:
-------------------------------------

I've been thinking how one can assess the estimation quality of the new code. I cam up with
this:
- I allocate an Object[] half the size of estimated maximum available RAM (just to make sure
all objects will fit without the need to reallocate),
- I precompute shallow sizes for instances of all "wild classes" (classes with random fields,
including arrays).
- I then fill in the "vault" array above with random instances of wild classes, summing up
the estimated size UNTIL I HIT OOM.
- Once I git OOM I know how much we actually allocated vs. how much space we thought we did
allocate.

The results are very accurate on HotSpot if one is using serial GC. For example:
{noformat}
[JVM: Java HotSpot(TM) 64-Bit Server VM, 20.4-b02, Sun Microsystems Inc., Sun Microsystems
Inc., 1.6.0_29]
Max: 483.4 MB, Used: 698.9 KB, Committed: 123.8 MB
Expected free: 240.9 MB, Allocated estimation: 240.8 MB, Difference: -0.05% (113.6 KB)
{noformat}

If one runs with a parallel GC things do get out of hand because the GC is not keeping up
with allocations (although I'm not sure how I should interpret this because we only allocate;
it's not possible to free any space -- maybe there are different GC pools or something):
{noformat}
[JVM: Java HotSpot(TM) 64-Bit Server VM, 20.4-b02, Sun Microsystems Inc., Sun Microsystems
Inc., 1.6.0_29]
Max: 444.5 MB, Used: 655.4 KB, Committed: 122.7 MB
Expected free: 221.5 MB, Allocated estimation: 174.2 MB, Difference: -21.34% (47.3 MB)
{noformat}

JRockit:
{noformat}
[JVM: Oracle JRockit(R), R28.1.4-7-144370-1.6.0_26-20110617-2130-windows-x86_64, Oracle Corporation,
Oracle Corporation, 1.6.0_26]
Max: 500 MB, Used: 3.5 MB, Committed: 64 MB
Expected free: 247.7 MB, Allocated estimation: 249.5 MB, Difference: 0.74% (1.8 MB)
{noformat}

I think we're good. If somebody wishes to experiment, the spike is here:
https://github.com/dweiss/java-sizeof
{noformat}
mvn test
mvn dependency:copy-dependencies
java -cp target\classes:target\test-classes:target\dependency\junit-4.10.jar \
  com.carrotsearch.sizeof.TestEstimationQuality
{noformat}
                
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-3867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3867
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Uwe Schindler
>            Priority: Trivial
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, LUCENE-3867.patch,
LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch,
LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch,
LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch,
LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch
>
>
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like that: NUM_BYTES_OBJECT_HEADER
+ NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included,
at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
> {quote}
> A single-dimension array is a single object. As expected, the array has the usual object
header. However, this object head is 12 bytes to accommodate a four-byte array length. Then
comes the actual array data which, as you might expect, consists of the number of elements
multiplied by the number of bytes required for one element, depending on its type. The memory
usage for one element is 4 bytes for an object reference ...
> {quote}
> While on it, I wrote a sizeOf(String) impl, and I wonder how do people feel about including
such helper methods in RUE, as static, stateless, methods? It's not perfect, there's some
room for improvement I'm sure, here it is:
> {code}
> 	/**
> 	 * Computes the approximate size of a String object. Note that if this object
> 	 * is also referenced by another object, you should add
> 	 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
> 	 * method.
> 	 */
> 	public static int sizeOf(String str) {
> 		return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
> 				+ 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
> 				+ RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
> 				+ RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
> 	}
> {code}
> If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[]
... and String[]).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message