pig-dev mailing list archives

From "Dmitriy V. Ryaboy (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2359) Support more efficient Tuples when schemas are known
Date Thu, 08 Dec 2011 06:23:40 GMT

    [ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165022#comment-13165022 ]

Dmitriy V. Ryaboy commented on PIG-2359:
----------------------------------------

Found a place where this breaks. InternalCachedBag (and presumably the other cached bags) uses
t.write(out) to spill to disk, and the sequence t = factory.newTuple(); t.readFields(in)
to read the data back. This is a problem because it assumes t writes itself in a format that
the default tuple returned by factory.newTuple() can read. A straightforward fix would seem to
be to use InterSedes for the read, right? Any reason that wouldn't work?
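
To make the failure mode concrete, here is a minimal, self-contained Java sketch. It does not use Pig's classes: SimpleTuple, DefaultTuple, IntPairTuple, the one-byte type tag, and the writeDatum/readDatum helpers are all hypothetical stand-ins, assumed here only to show why reading spilled bytes into a default tuple can fail when the writer used a different layout, and how routing both sides through one type-tagged serializer avoids that.

```java
import java.io.*;

public class SpillSketch {
    // Hypothetical stand-in for a self-serializing tuple (not Pig's Tuple interface).
    interface SimpleTuple {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
        long get(int i);
    }

    // "Default" layout: a field count followed by each field as a long.
    static class DefaultTuple implements SimpleTuple {
        long[] fields;
        DefaultTuple(long... fields) { this.fields = fields; }
        public void write(DataOutput out) throws IOException {
            out.writeInt(fields.length);
            for (long f : fields) out.writeLong(f);
        }
        public void readFields(DataInput in) throws IOException {
            fields = new long[in.readInt()];
            for (int i = 0; i < fields.length; i++) fields[i] = in.readLong();
        }
        public long get(int i) { return fields[i]; }
    }

    // Schema-aware layout: exactly two ints and no field count.
    static class IntPairTuple implements SimpleTuple {
        int a, b;
        IntPairTuple() {}
        IntPairTuple(int a, int b) { this.a = a; this.b = b; }
        public void write(DataOutput out) throws IOException { out.writeInt(a); out.writeInt(b); }
        public void readFields(DataInput in) throws IOException { a = in.readInt(); b = in.readInt(); }
        public long get(int i) { return i == 0 ? a : b; }
    }

    static final byte DEFAULT = 0, INT_PAIR = 1;

    // Hypothetical serializer: prefix each tuple with a tag naming its layout.
    static void writeDatum(DataOutput out, SimpleTuple t) throws IOException {
        out.writeByte(t instanceof IntPairTuple ? INT_PAIR : DEFAULT);
        t.write(out);
    }

    // The reader picks the tuple class from the tag instead of assuming the default.
    static SimpleTuple readDatum(DataInput in) throws IOException {
        SimpleTuple t = in.readByte() == INT_PAIR ? new IntPairTuple() : new DefaultTuple();
        t.readFields(in);
        return t;
    }

    // The pattern from the comment: t.write(out), then read into a default tuple.
    static SimpleTuple naiveRoundTrip(SimpleTuple t) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        t.write(new DataOutputStream(buf));
        SimpleTuple read = new DefaultTuple();
        read.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return read;
    }

    // The proposed direction: write and read through the same serializer.
    static SimpleTuple taggedRoundTrip(SimpleTuple t) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeDatum(new DataOutputStream(buf), t);
        return readDatum(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        SimpleTuple original = new IntPairTuple(7, 9);
        SimpleTuple ok = taggedRoundTrip(original);
        System.out.println("tagged read: " + ok.get(0) + ", " + ok.get(1));
        try {
            naiveRoundTrip(original);
        } catch (EOFException e) {
            System.out.println("naive read failed: " + e);
        }
    }
}
```

In this sketch the naive path fails loudly: the default reader takes the first int (7) as a field count and runs out of bytes. With other values a mismatched read could just as easily return wrong data without any error, which is the more worrying case.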
                
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch, PIG-2359.2.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are Objects.
> When a Tuple only contains primitive fields (ints, longs, etc.), it's possible to avoid
> this overhead, which would result in significant memory savings.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
