hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-793) Improving memory efficiency of Tuple implementation
Date Sat, 12 Sep 2009 07:08:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754491#action_12754491 ]

Ashutosh Chauhan commented on PIG-793:
--------------------------------------

In addition to String vs. Text, Alan also mentioned using an array instead of ArrayList<Object>.
Did anyone take a look at that? I think that change should also help. When I benchmarked merge
join, nearly 20-30% of CPU time was spent in ArrayList operations, which should improve a lot
if an array is used instead. So, changing to arrays should help both memory usage and CPU time,
at the cost of more expensive appends.
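
To make the trade-off concrete, here is a minimal sketch of an array-backed tuple. The class name ArrayTuple and its methods are hypothetical and are not Pig's actual Tuple API; the sketch only assumes that the field count is usually known up front, so get and set become plain array accesses while append has to copy the whole array.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Pig's real Tuple implementation.
class ArrayTuple {
    // Fields live directly in a plain array, with no ArrayList wrapper
    // and no spare capacity beyond the known field count.
    private Object[] fields;

    ArrayTuple(int size) {
        fields = new Object[size];
    }

    // Direct array read: no extra method indirection through a list.
    Object get(int i) {
        return fields[i];
    }

    // Direct array write.
    void set(int i, Object value) {
        fields[i] = value;
    }

    int size() {
        return fields.length;
    }

    // The expensive case: every append allocates a new array that is
    // one element longer and copies all existing fields into it.
    void append(Object value) {
        fields = Arrays.copyOf(fields, fields.length + 1);
        fields[fields.length - 1] = value;
    }
}
```

An ArrayList amortizes append cost by over-allocating its backing array, which is exactly the extra memory the array version avoids. Whether the copy-on-append cost matters depends on how often tuples grow after construction.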

Also, some small benefits can be gained by very simple changes introduced in https://issues.apache.org/jira/browse/PIG-513

> Improving memory efficiency of Tuple implementation
> ---------------------------------------------------
>
>                 Key: PIG-793
>                 URL: https://issues.apache.org/jira/browse/PIG-793
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>
> Currently, our tuple is a real pig and uses a lot of extra memory. 
> There are several places where we can improve memory efficiency:
> (1) Laying out memory for the fields rather than using Java objects, since each object for a numeric field takes 16 bytes
> (2) For the cases where we know the schema, using Java arrays rather than ArrayList.
> There might be more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

