From "Dawid Weiss (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
Date Sat, 17 Mar 2012 22:08:38 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232079#comment-13232079
] 

Dawid Weiss commented on LUCENE-3867:
-------------------------------------

I've played with the code a bit, trying to find a way to determine empirically how far off
the estimate is from real-life usage. It's not easy, because RUE itself allocates memory
(and not small quantities for complex object graphs!). I left these experiments in
StressRamUsageEstimator; it is a test case, so maybe we should add @Ignore and rename it
to Test*. I'm not sure.

Anyway, the allocation seems to be measured pretty accurately. With TLABs disabled, this is,
for example, the result of allocating small byte arrays:
{noformat}
 committed           max        estimated(allocation)
      2 MB	   48.4 MB	  16 bytes
    1.7 MB	   48.4 MB	  262.4 KB
      2 MB	   48.4 MB	  524.6 KB
    2.2 MB	   48.4 MB	    787 KB
    2.5 MB	   48.4 MB	      1 MB
    2.7 MB	   48.4 MB	    1.3 MB
      3 MB	   48.4 MB	    1.5 MB
    3.3 MB	   48.4 MB	    1.8 MB
....
   46.9 MB	   48.4 MB	   45.6 MB
   47.1 MB	   48.4 MB	   45.9 MB
   47.4 MB	   48.4 MB	   46.1 MB
   47.6 MB	   48.4 MB	   46.4 MB
   47.9 MB	   48.4 MB	   46.6 MB
   48.1 MB	   48.4 MB	   46.9 MB
{noformat}

So it's fairly close to ideal (committed memory covers everything that is committed, so I
assume additional data structures, classes, etc. are counted as well).
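
A minimal sketch of how such a comparison could be set up, for anyone who wants to reproduce the table above. This is not the StressRamUsageEstimator code: the class name HeapVsEstimate, the 64-byte chunk size, the 8 MB budget, and the 16-byte array header (what I'd expect on 64-bit HotSpot with compressed oops) are all assumptions for illustration. It retains small byte arrays, sums a simple aligned estimate, and prints it next to what the heap MX bean reports. The estimate deliberately ignores the list that holds the arrays.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical harness; not the StressRamUsageEstimator test itself. */
final class HeapVsEstimate {
  /** Estimated size of a byte[]: header plus payload, rounded up to 8 bytes. */
  static long estimateByteArray(int length, int arrayHeaderBytes) {
    long raw = (long) arrayHeaderBytes + length;
    return (raw + 7L) & ~7L; // assumes 8-byte object alignment
  }

  public static void main(String[] args) {
    final int arrayHeaderBytes = 16;      // assumption: 64-bit HotSpot, compressed oops
    final long budget = 8L * 1024 * 1024; // assumption: stays below the default heap limit
    List<byte[]> retained = new ArrayList<>(); // keeps the arrays reachable
    long estimated = 0;

    System.out.println("committed\tmax\testimated(allocation)");
    for (int i = 0; estimated < budget; i++) {
      byte[] chunk = new byte[64];
      retained.add(chunk);
      estimated += estimateByteArray(chunk.length, arrayHeaderBytes);
      if (i % 10_000 == 0) {
        // Values are in bytes, as reported by the JVM's heap MX bean.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println(heap.getCommitted() + "\t" + heap.getMax() + "\t" + estimated);
      }
    }
  }
}
```

On HotSpot, running it with -XX:-UseTLAB and a small -Xmx should be the closest match to the setup described above; other JVMs may report committed memory quite differently.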

Unfortunately it's not always so smooth. For example, JRockit's MX beans don't seem to return
the actual memory allocation state (or if they do, I don't understand it):
{noformat}
 committed           max        estimated(allocation)
   29.4 MB	     50 MB	  16 bytes
   29.8 MB	     50 MB	  262.5 KB
   30.2 MB	     50 MB	  524.9 KB
   30.4 MB	     50 MB	  787.3 KB
   30.8 MB	     50 MB	      1 MB
   31.1 MB	     50 MB	    1.3 MB
   31.4 MB	     50 MB	    1.5 MB
   31.7 MB	     50 MB	    1.8 MB
     32 MB	     50 MB	      2 MB
   32.4 MB	     50 MB	    2.3 MB
   32.7 MB	     50 MB	    2.6 MB
   33.1 MB	     50 MB	    2.8 MB
   33.5 MB	     50 MB	    3.1 MB
   33.8 MB	     50 MB	    3.3 MB
   34.2 MB	     50 MB	    3.6 MB
   34.5 MB	     50 MB	    3.8 MB
   34.8 MB	     50 MB	    4.1 MB
   35.2 MB	     50 MB	    4.4 MB
   35.5 MB	     50 MB	    4.6 MB
   35.7 MB	     50 MB	    4.9 MB
   36.2 MB	     50 MB	    5.1 MB
   36.4 MB	     50 MB	    5.4 MB
...
   49.6 MB	     50 MB	   47.6 MB
     50 MB	     50 MB	   47.9 MB
   49.6 MB	     50 MB	   48.2 MB
   49.9 MB	     50 MB	   48.4 MB
{noformat}

A snapshot from 32 bit HotSpot:
{noformat}
...
   25.5 MB	   48.4 MB	   24.7 MB
   25.7 MB	   48.4 MB	   24.9 MB
   25.9 MB	   48.4 MB	   25.1 MB
   26.1 MB	   48.4 MB	   25.3 MB
   26.3 MB	   48.4 MB	   25.5 MB
   26.5 MB	   48.4 MB	   25.7 MB
   26.7 MB	   48.4 MB	   25.9 MB
   26.8 MB	   48.4 MB	   26.1 MB
     27 MB	   48.4 MB	   26.4 MB
   27.2 MB	   48.4 MB	   26.6 MB
   27.4 MB	   48.4 MB	   26.8 MB
   27.7 MB	   48.4 MB	     27 MB
...
{noformat}

I see a few issues that remain, but I don't think they're urgent enough to be addressed now:
- the stack easily overflows if the graph of objects has long chains. This is demonstrated
in the test case (uncomment the ignore annotation).
- there is a fair amount of memory allocation going on in the RUE itself. If one _knows_ the
graph of an object's dependencies is a tree then the memory cost could be decreased to zero
(because we wouldn't need to remember which objects we've seen so far).
- we could make RUE an object again (give up the static methods) and have a cache of classes
and class fields to avoid repeating the same reflective lookups. If one performed estimations
over and over, such a RUE instance would have an initial cost, but would then run more
smoothly.
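
To make the points above concrete, here is a rough sketch of the general technique, not the patch's code. ObjectGraphWalker, Node and countReachable are hypothetical names; it only counts reachable objects rather than estimating their sizes. It uses an explicit stack instead of recursion (so long chains cannot overflow the thread stack), an identity-based "seen" set (the memory cost that could be dropped if the graph is known to be a tree), and a per-instance cache of reflected fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not the RamUsageEstimator API. */
final class ObjectGraphWalker {
  /** Demo node type for building long chains. */
  static final class Node {
    public Node next;
  }

  // Per-instance cache: reference fields of each class, resolved once.
  private final Map<Class<?>, Field[]> fieldCache = new HashMap<>();

  /** Counts distinct objects reachable from root without recursion. */
  long countReachable(Object root) {
    Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
    ArrayDeque<Object> stack = new ArrayDeque<>();
    push(root, seen, stack);
    long count = 0;
    while (!stack.isEmpty()) {
      Object o = stack.pop();
      count++;
      Class<?> c = o.getClass();
      if (c.isArray()) {
        if (!c.getComponentType().isPrimitive()) {
          for (Object e : (Object[]) o) push(e, seen, stack);
        }
        continue;
      }
      for (Field f : referenceFields(c)) {
        try {
          push(f.get(o), seen, stack);
        } catch (IllegalAccessException e) {
          // Unreadable field: skip it in this sketch.
        }
      }
    }
    return count;
  }

  /** Schedules an object for a visit unless it is null or already seen. */
  private static void push(Object o, Set<Object> seen, ArrayDeque<Object> stack) {
    if (o != null && seen.add(o)) stack.push(o);
  }

  /** Returns the cached non-static reference fields of a class and its superclasses. */
  private Field[] referenceFields(Class<?> type) {
    Field[] cached = fieldCache.get(type);
    if (cached != null) return cached;
    List<Field> fields = new ArrayList<>();
    for (Class<?> c = type; c != null; c = c.getSuperclass()) {
      for (Field f : c.getDeclaredFields()) {
        if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) continue;
        try {
          f.setAccessible(true);
          fields.add(f);
        } catch (RuntimeException e) {
          // Not accessible (security manager or module restrictions): skip.
        }
      }
    }
    cached = fields.toArray(new Field[0]);
    fieldCache.put(type, cached);
    return cached;
  }
}
```

Reusing one ObjectGraphWalker instance across calls pays the reflection cost once per class. The seen set still grows with the number of objects; a tree-only variant could drop it, at the price of looping forever on cyclic graphs.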

Having said that, I'm +1 for committing this if you agree with the changes I've made (I
will be a pain in the arse about the naming convention that distinguishes shallow from
deep sizeOf, though :).
                
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-3867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3867
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Shai Erera
>            Priority: Trivial
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch,
LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch,
LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch,
LUCENE-3867.patch, LUCENE-3867.patch
>
>
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like that: NUM_BYTES_OBJECT_HEADER
+ NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included,
at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
> {quote}
> A single-dimension array is a single object. As expected, the array has the usual object
header. However, this object head is 12 bytes to accommodate a four-byte array length. Then
comes the actual array data which, as you might expect, consists of the number of elements
multiplied by the number of bytes required for one element, depending on its type. The memory
usage for one element is 4 bytes for an object reference ...
> {quote}
> While on it, I wrote a sizeOf(String) impl, and I wonder how do people feel about including
such helper methods in RUE, as static, stateless, methods? It's not perfect, there's some
room for improvement I'm sure, here it is:
> {code}
> 	/**
> 	 * Computes the approximate size of a String object. Note that if this object
> 	 * is also referenced by another object, you should add
> 	 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
> 	 * method.
> 	 */
> 	public static int sizeOf(String str) {
> 		return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
> 				+ 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
> 				+ RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
> 				+ RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
> 	}
> {code}
> If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[]
... and String[]).
