hadoop-pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1492) DefaultTuple and DefaultMemory understimate their memory footprint
Date Thu, 15 Jul 2010 03:13:50 GMT

     [ https://issues.apache.org/jira/browse/PIG-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1492:
-------------------------------

    Attachment: PIG-1492.1.patch

This patch updates the memory size calculations. The changes were made so that the estimated
sizes are closer to what is observed in a 32-bit Java HotSpot(TM) Server VM (build 10.0-b19, mixed
mode).

It is based on some of the observations in http://www.javamex.com/tutorials/memory/string_memory_usage.shtm .
The header size of an object is taken to be 8 bytes, and each object's size is rounded up to a
multiple of 8 bytes. Some further adjustments for the minimum size of the array in an ArrayList were
made based on observed size values.
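
The header-plus-rounding rule above can be sketched as follows. This is only an illustration of the arithmetic, not the code in PIG-1492.1.patch; the class name, method names, and the 4-byte reference size (a common figure for a 32-bit VM) are assumptions made for the example.

```java
// Hypothetical sketch of the sizing rule described in the message:
// 8-byte object header, total rounded up to a multiple of 8 bytes.
final class SizeEstimateSketch {
    // Header size stated in the message.
    static final long OBJECT_HEADER_BYTES = 8;
    // Assumption for illustration: object references take 4 bytes on a 32-bit VM.
    static final long REFERENCE_BYTES = 4;

    // Round a byte count up to the next multiple of 8.
    static long roundUpToEight(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Estimated size of an object whose instance fields occupy fieldBytes.
    static long objectSize(long fieldBytes) {
        return roundUpToEight(OBJECT_HEADER_BYTES + fieldBytes);
    }

    // Estimated cost of one column holding a boxed 8-byte value
    // (for example a Long or Double): the object plus the reference to it.
    static long boxedEightByteColumn() {
        return objectSize(8) + REFERENCE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("object with 4 bytes of fields: " + objectSize(4));
        System.out.println("object with 8 bytes of fields: " + objectSize(8));
        System.out.println("per boxed 8-byte column:       " + boxedEightByteColumn());
    }
}
```

Under these assumptions a boxed 8-byte value costs 16 bytes plus a 4-byte reference, about 20 bytes per column, which is in line with the roughly 20,000 bytes the patched estimate reports for 1000 DOUBLE or LONG columns.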

The following tables show the estimated tuple sizes (in bytes) before and after the patch, alongside
the sizes actually observed, for the types whose calculation logic changed:

|| type || num of columns of this type in the tuple || before || patched || observed ||
| BYTEARRAY with 5 bytes | 10 | 254 | 504 | 495 |
| BYTEARRAY with 5 bytes | 1000 | 21044 | 44064 | 44127 |
| DOUBLE | 10 | 364 | 264 | 255 |
| DOUBLE | 1000 | 32044 | 20064 | 20127 |
| LONG | 10 | 284 | 264 | 255 |
| LONG | 1000 | 24044 | 20064 | 20127 |




|| Tuple containing a single - || before || patched || observed ||
| BAG with 10 empty tuples | 524 | 1092 | 1159 |
| BAG with 1000 empty tuples | 48044 | 100092 | 100211 |
| map with 10 integer key-value pairs | 380 | 824 | 775 |
| map with 1000 integer key-value pairs | 32060 | 64184 | 64346 |

> DefaultTuple and DefaultMemory understimate their memory footprint
> ------------------------------------------------------------------
>
>                 Key: PIG-1492
>                 URL: https://issues.apache.org/jira/browse/PIG-1492
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: PIG-1492.1.patch
>
>
> There are several places where we significantly underestimate the memory footprint. For example,
> for map datatypes, we don't account for the per-entry cost of the map container data structures.
> The estimated size of a tuple holding a map with 100 integer key-value entries, as per the current
> version of the code, is 3260 bytes, while the observed size is around 6775 bytes. To verify the
> memory footprint, I checked free memory before and after creating multiple instances of the
> object, using code along the lines of http://www.javaspecialists.eu/archive/Issue029.html .

> In PIG-1443 a similar change was made to fix this for CHARARRAY.
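
The verification approach described in the issue (comparing heap usage before and after creating many instances) can be sketched roughly as below. This is a minimal illustration, not the code used for the numbers in this issue; the class and method names are hypothetical. `System.gc()` is only a hint to the VM, so results are approximate and vary between runs and JVMs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical probe: estimates the average heap cost of an object by
// comparing used memory before and after allocating many instances.
final class FootprintProbe {
    // Heap currently in use, as reported by the runtime.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Ask for garbage collection a few times; this is a request, not a guarantee.
    static void settle() {
        for (int i = 0; i < 4; i++) {
            System.gc();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    static long averageBytesPerInstance(Supplier<Object> factory, int count) {
        // Allocate the holder first so the array itself is not part of the delta.
        Object[] holder = new Object[count];
        settle();
        long before = usedMemory();
        for (int i = 0; i < count; i++) {
            holder[i] = factory.get();
        }
        settle();
        long after = usedMemory();
        // Touch the holder after measuring so the instances stay reachable.
        if (holder[count - 1] == null) {
            throw new IllegalStateException("factory returned null");
        }
        return (after - before) / count;
    }

    public static void main(String[] args) {
        // Example: a map with 100 integer key-value entries, as in the issue text.
        long perMap = averageBytesPerInstance(() -> {
            Map<Integer, Integer> m = new HashMap<>();
            for (int i = 0; i < 100; i++) {
                m.put(i, i);
            }
            return m;
        }, 1000);
        System.out.println("approx. bytes per map: " + perMap);
    }
}
```

Averaging over many instances reduces the effect of allocation noise. Note that small `Integer` values are cached by the JVM, so this particular example measures mostly the map's own per-entry overhead rather than the boxed keys and values.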

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

