hadoop-pig-dev mailing list archives

From "Ying He (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
Date Thu, 24 Sep 2009 20:05:16 GMT
Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
---------------------------------------------------------------------------------------------

                 Key: PIG-975
                 URL: https://issues.apache.org/jira/browse/PIG-975
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.2.0
            Reporter: Ying He
            Assignee: Pradeep Kamath
             Fix For: 0.2.0


Currently, whenever the Combiner is used in Pig, the POPreCombinerLocalRearrange operator in the
map puts the single "value" tuple corresponding to a key into a DataBag and passes it to the
foreach that is being combined. This creates as many bags as there are input records.
Each of these bags holds a single tuple, so they are small and should never need to be spilled
to disk. However, because the bags are created through the BagFactory mechanism, every bag creation
is registered with the SpillableMemoryManager, which stores a weak reference to the bag in
a linked list. This linked list grows very large over time and causes unnecessary garbage collection
runs. This can be avoided with a simple, lightweight implementation of the DataBag interface
that stores the single tuple in a bag. These SingleTupleBags should be created without registering
with the SpillableMemoryManager. Likewise, the bags created in POCombinePackage are supposed
to fit in memory and not spill. Here, too, a NonSpillableDataBag implementation of the DataBag
interface that does not register with the SpillableMemoryManager would help.
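
The proposal above can be illustrated with a minimal, self-contained sketch. It does not use Pig's
real classes: `Bag`, `Registry`, `RegisteredBag`, `NonSpillableBag` and this `SingleTupleBag` are
hypothetical stand-ins for DataBag, SpillableMemoryManager and the bag implementations, and a tuple
is modelled as a plain `List<Object>`. It only shows why skipping registration keeps the list of
weak references from growing.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class SingleTupleBagSketch {

    /** Hypothetical, reduced stand-in for Pig's DataBag interface. */
    interface Bag extends Iterable<List<Object>> {
        long size();
        void add(List<Object> tuple);
    }

    /** Hypothetical stand-in for the memory manager: one weak reference per registered bag. */
    static final class Registry {
        private final List<WeakReference<Bag>> spillables = new LinkedList<>();
        void register(Bag bag) { spillables.add(new WeakReference<>(bag)); }
        int registeredCount() { return spillables.size(); }
    }

    /** Bag that registers itself on creation, as factory-created bags are described to do. */
    static final class RegisteredBag implements Bag {
        private final List<List<Object>> tuples = new ArrayList<>();
        RegisteredBag(Registry registry) { registry.register(this); }
        public long size() { return tuples.size(); }
        public void add(List<Object> tuple) { tuples.add(tuple); }
        public Iterator<List<Object>> iterator() { return tuples.iterator(); }
    }

    /** List-backed bag that is assumed to fit in memory and never registers. */
    static final class NonSpillableBag implements Bag {
        private final List<List<Object>> tuples = new ArrayList<>();
        public long size() { return tuples.size(); }
        public void add(List<Object> tuple) { tuples.add(tuple); }
        public Iterator<List<Object>> iterator() { return tuples.iterator(); }
    }

    /** Holds at most one tuple and never registers; add() replaces the tuple (a choice of this sketch). */
    static final class SingleTupleBag implements Bag {
        private List<Object> tuple;
        SingleTupleBag(List<Object> tuple) { this.tuple = tuple; }
        public long size() { return tuple == null ? 0 : 1; }
        public void add(List<Object> t) { this.tuple = t; }
        public Iterator<List<Object>> iterator() {
            if (tuple == null) {
                return Collections.emptyIterator();
            }
            return Collections.singletonList(tuple).iterator();
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();

        // One registered bag per input record: the registry grows with the input.
        for (int i = 0; i < 1000; i++) {
            Bag bag = new RegisteredBag(registry);
            bag.add(Arrays.<Object>asList("key" + i, i));
        }
        System.out.println("after registered bags: " + registry.registeredCount());

        // One single-tuple bag per input record: the registry is never touched.
        for (int i = 0; i < 1000; i++) {
            Bag bag = new SingleTupleBag(Arrays.<Object>asList("key" + i, i));
            if (bag.size() != 1) {
                throw new IllegalStateException("expected exactly one tuple");
            }
        }
        System.out.println("after single-tuple bags: " + registry.registeredCount());
    }
}
```

In this sketch the registry holds one entry per `RegisteredBag` even after the bags themselves become
unreachable, because only the referents are weakly held, not the list nodes. The two unregistered
variants leave the registry unchanged.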


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

