pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1875) Keep tuples serialized to limit spilling and speed it when it happens
Date Tue, 01 Mar 2011 23:28:37 GMT
Keep tuples serialized to limit spilling and speed it when it happens
---------------------------------------------------------------------

                 Key: PIG-1875
                 URL: https://issues.apache.org/jira/browse/PIG-1875
             Project: Pig
          Issue Type: Improvement
          Components: impl
            Reporter: Alan Gates
            Priority: Minor


Currently Pig reads records off of the reduce iterator and immediately deserializes them into
Java objects.  The deserialized objects take up much more memory than the serialized bytes, so
Pig spills sooner than it would if it kept the records in serialized form.  Also, when it does
have to spill, it has to serialize the records again, and then deserialize them once more after
reading them back from the spill file.

We should explore keeping them serialized in memory when they are read off of the reduce iterator,
and deserializing them only when they are actually needed.
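
To make the idea concrete, here is a rough sketch of what holding records in serialized form
could look like.  It is not Pig code: the class SerializedRecordBuffer, its methods, and the toy
length-prefixed encoding of string fields are all hypothetical, invented for illustration, and
they stand in for whatever tuple serialization Pig actually uses.  The point is only that the
buffer stores the raw bytes, spills them with a plain copy, and deserializes on demand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for records kept in serialized form; not a Pig class.
class SerializedRecordBuffer {
    private final List<byte[]> records = new ArrayList<>();
    private long bytesHeld = 0;

    // Store the bytes as they came off the iterator, without deserializing.
    void add(byte[] serialized) {
        records.add(serialized);
        bytesHeld += serialized.length;
    }

    // Exact payload size in bytes, usable for deciding when to spill.
    long bytesHeld() { return bytesHeld; }

    int size() { return records.size(); }

    // Spill by copying each record out with a length prefix.
    // Nothing is re-serialized because the records never left byte form.
    void spill(OutputStream out) {
        try {
            DataOutputStream data = new DataOutputStream(out);
            for (byte[] record : records) {
                data.writeInt(record.length);
                data.write(record);
            }
            data.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        records.clear();
        bytesHeld = 0;
    }

    // Deserialize lazily, only when a consumer asks for the record.
    String[] get(int index) {
        return decode(records.get(index));
    }

    // Toy encoding (an assumption): field count, then each field as UTF.
    static byte[] encode(String... fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(bytes);
            data.writeInt(fields.length);
            for (String field : fields) {
                data.writeUTF(field);
            }
            data.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Inverse of encode(): rebuild the fields from the stored bytes.
    static String[] decode(byte[] serialized) {
        try {
            DataInputStream data =
                new DataInputStream(new ByteArrayInputStream(serialized));
            String[] fields = new String[data.readInt()];
            for (int i = 0; i < fields.length; i++) {
                fields[i] = data.readUTF();
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the memory accounting is just a sum of array lengths, so a spill threshold can be
checked against bytesHeld() directly.  The trade-off is that every get() pays the deserialization
cost, so a record that is read many times may be cheaper to keep as an object.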

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
