hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-793) Improving memory efficiency of Tuple implementation
Date Fri, 11 Sep 2009 00:21:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753916#action_12753916 ]

Olga Natkovich commented on PIG-793:
------------------------------------

Clarification from Alan on the String vs. Text comparison:

The 16/36 and 24/52 numbers noted in the bug are correct.  Let me explain them.  Text has a 16
byte overhead in and of itself, plus 16 bytes for the array that holds the data, plus 20 bytes
for the data.  String has a 24 byte overhead for itself, plus 12 bytes for whatever it holds
the data in, plus 40 bytes for the data.  So overall, it would have been clearer had I said
that Text has a 32 byte overhead and String 36, and that Text stores the data in one byte per
character (assuming ASCII) while String stores it in two (ASCII or not).  There is some
guesswork involved here, since I'm just looking at output from Java memory tools.  We could
retest this with larger strings and make sure the results are consistent.
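
The arithmetic above can be written out as a small sketch.  The class and method names
(FootprintEstimate, textBytes, stringBytes) are made up for illustration and are not part of
Pig or Hadoop, and the constants are only the figures quoted above as read from memory tools,
not values guaranteed by any JVM.  The 20 and 40 byte data sizes imply a 20-character ASCII
test string, which is what the first loop iteration assumes.

```java
// Rough estimate only: the overhead constants are the observed figures quoted
// in the comment above, not guarantees of any particular JVM.
final class FootprintEstimate {
    static final int TEXT_OBJECT_OVERHEAD = 16;   // Text object itself
    static final int TEXT_ARRAY_OVERHEAD = 16;    // byte array holding the data
    static final int STRING_OBJECT_OVERHEAD = 24; // String object itself
    static final int STRING_ARRAY_OVERHEAD = 12;  // whatever String holds the data in

    // Text: fixed overhead plus one byte per character (assuming ASCII).
    static int textBytes(int asciiChars) {
        return TEXT_OBJECT_OVERHEAD + TEXT_ARRAY_OVERHEAD + asciiChars;
    }

    // String: fixed overhead plus two bytes per character (ASCII or not).
    static int stringBytes(int chars) {
        return STRING_OBJECT_OVERHEAD + STRING_ARRAY_OVERHEAD + 2 * chars;
    }

    public static void main(String[] args) {
        // Larger sizes show how the per-character cost dominates the fixed overhead.
        for (int n : new int[] {20, 100, 1000}) {
            System.out.printf("%d chars: Text ~%d bytes, String ~%d bytes%n",
                    n, textBytes(n), stringBytes(n));
        }
    }
}
```

If the retest with larger strings matches these estimates, the fixed overheads (32 vs. 36
bytes) matter little and the difference is almost entirely the one vs. two bytes per character.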


> Improving memory efficiency of Tuple implementation
> ---------------------------------------------------
>
>                 Key: PIG-793
>                 URL: https://issues.apache.org/jira/browse/PIG-793
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>
> Currently, our tuple is a real pig and uses a lot of extra memory. 
> There are several places where we can improve memory efficiency:
> (1) Laying out memory for the fields rather than using Java objects, since each
> object for a numeric field takes 16 bytes
> (2) For the cases where we know the schema, using Java arrays rather than ArrayList.
> There might be more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

