lucene-dev mailing list archives

From "Shai Erera (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
Date Thu, 15 Mar 2012 12:57:38 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3867:
-------------------------------

    Attachment: LUCENE-3867.patch

Thanks Uwe!

I ran the test, and now with both the J9 (IBM) and Oracle JVMs I get this output (without enabling
any flag):

{code}
    [junit] NOTE: running test testReferenceSize
    [junit] NOTE: This JVM is 64bit: true
    [junit] NOTE: Reference size in this JVM: 8
{code}

* I renamed the test to testReferenceSize (it was testCompressedOops).

I wrote this small test to print the differences between sizeOf(String) and estimateRamUsage(String):

{code}
  public void testSizeOfString() throws Exception {
    String s = "abcdefgkjdfkdsjdskljfdskfjdsf";
    String sub = s.substring(0, 4);
    System.out.println("original=" + RamUsageEstimator.sizeOf(s));
    System.out.println("sub=" + RamUsageEstimator.sizeOf(sub));
    System.out.println("checkInterned=true(orig): " + new RamUsageEstimator().estimateRamUsage(s));
    System.out.println("checkInterned=false(orig): " + new RamUsageEstimator(false).estimateRamUsage(s));
    System.out.println("checkInterned=false(sub): " + new RamUsageEstimator(false).estimateRamUsage(sub));
  }
{code}

It prints:
{code}
original=104
sub=56
checkInterned=true(orig): 0
checkInterned=false(orig): 98
checkInterned=false(sub): 98
{code}

So clearly estimateRamUsage also counts the substring's larger char[] (in this JVM, substring()
shares the original String's backing array, so 'sub' reports the same 98 bytes as 'orig'). The
difference in the sizes reported for 'orig' stems from AverageGuessMemoryModel, which hardcodes
the reference size as 4 and the array size as 16. I modified AverageGuess to use the constants
from RUE (which are best guesses themselves). The test still printed a difference, but I think
that was because sizeOf(String) aligns the size to a multiple of 8, while estimateRamUsage
doesn't. I fixed that in size(Object), and now both print the same values.
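A minimal sketch of rounding a size up to a multiple of 8, assuming 8-byte object alignment. The name align mirrors the method mentioned in this message, but the body is illustrative and not the patch's code. Rounding estimateRamUsage's 98 bytes up this way gives 104, which is consistent with the sizeOf(String) value printed above.

```java
/**
 * Hypothetical sketch of 8-byte alignment; the body is illustrative and
 * is not the code from the patch.
 */
public final class AlignSketch {
  // Assumed object alignment in bytes (must be a power of two).
  private static final long ALIGNMENT = 8;

  /** Rounds size up to the nearest multiple of ALIGNMENT. */
  public static long align(long size) {
    // Adding ALIGNMENT - 1 and then clearing the low bits rounds up
    // without a division, because ALIGNMENT is a power of two.
    return (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
  }

  public static void main(String[] args) {
    System.out.println(align(98));  // 104
    System.out.println(align(104)); // 104: already aligned
    System.out.println(align(0));   // 0
  }
}
```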

* I also fixed sizeOfArray: if array.length == 0 it returned 0, but it should return the array
header size, aligned to a multiple of 8 as well.

* I changed sizeOf(String[]) to sizeOf(Object[]), which computes only the array's raw (shallow)
size. I started to add sizeOf(String), fastSizeOf(String) and deepSizeOf(String[]), but
reverted them to avoid the hassle; the documentation confuses even me :).

* Changed all sizeOf() methods to return long, and align() to take and return a long.
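The array-related changes in the list above (an empty array still costs its aligned header, Object[] is measured shallowly, sizes are long) can be sketched as follows. This is a hypothetical illustration, not the patch: the class name is invented, and the header and reference constants are assumed placeholders roughly matching a 64-bit JVM without compressed oops, not values read from the JVM or taken from RamUsageEstimator.

```java
/**
 * Hypothetical sketch of shallow array sizes; constants are assumed
 * placeholders (roughly a 64-bit JVM without compressed oops), not values
 * read from the running JVM or taken from RamUsageEstimator.
 */
public final class ArraySizeSketch {
  public static final long ARRAY_HEADER = 24; // assumed: header incl. length, padded
  public static final long OBJECT_REF = 8;    // assumed: 8-byte references

  /** Rounds up to a multiple of 8. */
  public static long align(long size) {
    return (size + 7) & ~7L;
  }

  /** Shallow size of a byte[]: header plus one byte per element, aligned. */
  public static long sizeOf(byte[] arr) {
    return align(ARRAY_HEADER + (long) arr.length);
  }

  /** Shallow size of a long[]: header plus 8 bytes per element, aligned. */
  public static long sizeOf(long[] arr) {
    return align(ARRAY_HEADER + 8L * arr.length);
  }

  /**
   * Shallow size of an Object[]: header plus one reference per slot.
   * The objects the slots point to are NOT counted.
   */
  public static long sizeOf(Object[] arr) {
    return align(ARRAY_HEADER + OBJECT_REF * arr.length);
  }

  public static void main(String[] args) {
    System.out.println(sizeOf(new long[0]));   // 24: an empty array still has a header
    System.out.println(sizeOf(new byte[5]));   // 32: 29 rounded up to 32
    System.out.println(sizeOf(new String[3])); // 48: three references, not the Strings
  }
}
```

Because String[] is an Object[], the last call resolves to the Object[] overload; counting the Strings themselves would need a separate deep-size method.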

I think this is ready to commit, though I'd appreciate a second look at the MemoryModel and
size(Object) changes.

Also, how about renaming the MemoryModel methods to arrayHeaderSize(), classHeaderSize() and
objReferenceSize() to make them clearer and more accurate? For instance, getArraySize() does not
return the size of an array, but the size of its object header ...
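To make the proposed naming concrete, here is a hypothetical sketch of a model that uses the suggested method names. It is not Lucene's MemoryModel class; the class names, the objectArraySize helper and the example values are all assumptions for illustration. The point is that a size formula written against these names says what it adds up.

```java
/**
 * Hypothetical sketch of the proposed MemoryModel method names; this is
 * not Lucene's MemoryModel class.
 */
public final class MemoryModelNamingSketch {

  public abstract static class Model {
    /** Bytes in an array's header (object header plus length field). */
    public abstract int arrayHeaderSize();

    /** Bytes in the header of a non-array object. */
    public abstract int classHeaderSize();

    /** Bytes in one object reference. */
    public abstract int objReferenceSize();

    /** With these names, the shallow size of an Object[] reads naturally. */
    public long objectArraySize(int length) {
      long raw = arrayHeaderSize() + (long) objReferenceSize() * length;
      return (raw + 7) & ~7L; // align to a multiple of 8
    }
  }

  /** Assumed example values (roughly 64-bit, no compressed oops). */
  public static final class Example64 extends Model {
    @Override public int arrayHeaderSize() { return 24; }
    @Override public int classHeaderSize() { return 16; }
    @Override public int objReferenceSize() { return 8; }
  }

  public static void main(String[] args) {
    Model m = new Example64();
    System.out.println(m.objectArraySize(0)); // 24
    System.out.println(m.objectArraySize(3)); // 48
  }
}
```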
                
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
> -----------------------------------------------------
>
>                 Key: LUCENE-3867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3867
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Trivial
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch
>
>
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
> {quote}
> A single-dimension array is a single object. As expected, the array has the usual object header. However, this object head is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ...
> {quote}
> While at it, I wrote a sizeOf(String) implementation, and I wonder how people feel about including such helper methods in RUE as static, stateless methods. It's not perfect, and I'm sure there's some room for improvement, but here it is:
> {code}
> 	/**
> 	 * Computes the approximate size of a String object. Note that if this object
> 	 * is also referenced by another object, you should add
> 	 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
> 	 * method.
> 	 */
> 	public static int sizeOf(String str) {
> 		return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
> 				+ 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
> 				+ RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
> 				+ RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
> 	}
> {code}
> If nobody objects, I'd also like to add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).
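To see what the reported bug amounts to, here is the arithmetic for both header formulas, using the figures from the quoted javamex page (a 4-byte array length and 4-byte object references) plus an assumed 8-byte object header for a 32-bit JVM. The class and its constants are local stand-ins that reuse RUE's constant names for readability; they are not RamUsageEstimator's actual values.

```java
/**
 * Hypothetical comparison of the two NUM_BYTES_ARRAY_HEADER formulas,
 * using assumed 32-bit values (not RamUsageEstimator's real constants).
 */
public final class ArrayHeaderSketch {
  public static final int NUM_BYTES_OBJECT_HEADER = 8; // assumed 32-bit object header
  public static final int NUM_BYTES_INT = 4;           // array length field
  public static final int NUM_BYTES_OBJECT_REF = 4;    // per the quoted page

  /** The formula the issue reports as incorrect. */
  public static final int OLD_ARRAY_HEADER =
      NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF; // 16

  /** Object header plus the 4-byte length field only. */
  public static final int PROPOSED_ARRAY_HEADER =
      NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT; // 12

  /** Unaligned bytes for an Object[] of the given length. */
  public static int objectArrayBytes(int header, int length) {
    return header + length * NUM_BYTES_OBJECT_REF;
  }

  public static void main(String[] args) {
    System.out.println(OLD_ARRAY_HEADER);                            // 16
    System.out.println(PROPOSED_ARRAY_HEADER);                       // 12
    System.out.println(objectArrayBytes(OLD_ARRAY_HEADER, 10));      // 56
    System.out.println(objectArrayBytes(PROPOSED_ARRAY_HEADER, 10)); // 52
  }
}
```

After rounding up to a multiple of 8 the two results can coincide (52 and 56 both become 56 for a length of 10), but for other lengths, such as 9, the old formula overcounts by a full 8 bytes (56 vs. 48).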


