pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1875) Keep tuples serialized to limit spilling and speed it when it happens
Date Wed, 02 Mar 2011 01:34:36 GMT

     [ https://issues.apache.org/jira/browse/PIG-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1875:
--------------------------------

    Fix Version/s: 0.10

> Keep tuples serialized to limit spilling and speed it when it happens
> ---------------------------------------------------------------------
>
>                 Key: PIG-1875
>                 URL: https://issues.apache.org/jira/browse/PIG-1875
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Alan Gates
>            Priority: Minor
>             Fix For: 0.10
>
>         Attachments: mrtuple.patch
>
>
> Currently Pig reads records off of the reduce iterator and immediately deserializes them
> into Java objects.  The deserialized objects take up much more memory than the serialized
> versions, so Pig spills sooner than it would if it stored them in serialized form.  Also,
> if it does have to spill, it has to serialize them again, and then deserialize them again
> after reading from the spill file.
> We should explore storing them in memory in serialized form when they are read off of the
> reduce iterator.
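
The idea described above can be illustrated with a small, self-contained sketch. This is not Pig code and does not reflect the attached mrtuple.patch; the class SerializedBag, its methods, and the toy record format (a UTF string followed by an int) are all hypothetical names chosen for illustration. It only shows the general shape of the proposal: hold each record as the bytes it arrived in, deserialize only when a consumer asks for it, and copy the stored bytes straight to the spill file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bag that holds records as raw bytes rather than Java objects. */
class SerializedBag {
    private final List<byte[]> records = new ArrayList<>();
    private long bytesHeld = 0;

    /** Store the record as received; nothing is deserialized here. */
    void addSerialized(byte[] raw) {
        records.add(raw);
        bytesHeld += raw.length;
    }

    int size() {
        return records.size();
    }

    /** Payload bytes currently held in memory (excludes list and array overhead). */
    long bytesHeld() {
        return bytesHeld;
    }

    /** Deserialize a single record only when a consumer asks for it. */
    Object[] get(int index) throws IOException {
        return decode(records.get(index));
    }

    /** Spill by copying the stored bytes; no re-serialization step is needed. */
    void spill(OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        for (byte[] raw : records) {
            data.writeInt(raw.length); // length prefix so records can be read back
            data.write(raw);
        }
        data.flush();
        records.clear();
        bytesHeld = 0;
    }

    /** Toy wire format (an assumption for this sketch): UTF string, then int. */
    static byte[] encode(String key, int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF(key);
        out.writeInt(value);
        out.flush();
        return buffer.toByteArray();
    }

    /** Inverse of encode: turns the raw bytes back into field objects. */
    static Object[] decode(byte[] raw) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        return new Object[] { in.readUTF(), in.readInt() };
    }
}
```

In this sketch the cost of deserialization moves from insertion time to read time, so a record that is spilled before anyone reads it is never turned into Java objects at all. Whether that trade-off pays off in practice depends on how often records are read back, which is what the issue proposes to explore.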

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
