hadoop-pig-dev mailing list archives

From "Ying He (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
Date Thu, 24 Sep 2009 20:08:16 GMT

     [ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ying He updated PIG-975:
------------------------

    Description: POPackage uses DefaultDataBag during the reduce process to hold data. The bag is registered
with SpillableMemoryManager and is prone to OutOfMemoryException.  It is better to proactively
manage memory usage: the bag fills memory up to a specified amount and dumps
the rest to disk.  The amount of memory used to hold tuples is configurable. This can avoid
out-of-memory errors.  (was: Currently, whenever the Combiner is used in Pig, in the map, the POPrecombinerLocalRearrange
operator puts the single "value" tuple corresponding to a key into a DataBag and passes this
to the foreach which is being combined. This will generate as many bags as there are input
records. These bags all will have a single tuple and hence are small and should not need to
be spilt to disk. However since the bags are created through the BagFactory mechanism, each
bag creation is registered with the SpillableMemoryManager and a weak reference to the bag
is stored in a linked list. This linked list grows very large over time, causing unnecessary
garbage collection runs. This can be avoided by having a simple, lightweight implementation
of the DataBag interface that stores the single tuple in a bag. These SingleTupleBags should also
be created without registering with the SpillableMemoryManager. Likewise, the bags created
in POCombinePackage are supposed to fit in memory and not spill. Again, a NonSpillableDataBag
implementation of the DataBag interface that does not register with the SpillableMemoryManager
would help.
)
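
To illustrate the earlier description above, here is a minimal Java sketch of a single-element bag that is constructed directly and never registered for spilling. The names SpillRegistry and SingleItemBag are hypothetical stand-ins invented for this example; they are not Pig's SpillableMemoryManager, BagFactory, or DataBag API.

```java
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical stand-in for a manager that tracks spillable bags via weak references. */
class SpillRegistry {
    final List<WeakReference<Object>> tracked = new LinkedList<>();

    // Every registered bag adds one entry; with one bag per input record
    // this list grows with the input size.
    void register(Object bag) {
        tracked.add(new WeakReference<>(bag));
    }
}

/** Hypothetical lightweight bag: holds exactly one item and is never registered. */
final class SingleItemBag<T> implements Iterable<T> {
    private final T item;

    SingleItemBag(T item) {
        this.item = item;
    }

    long size() {
        return 1;
    }

    @Override
    public Iterator<T> iterator() {
        // An immutable one-element list provides the iterator.
        return Collections.singletonList(item).iterator();
    }
}
```

Because the constructor never touches the registry, creating one such bag per record leaves the tracked list empty, which is the effect the description asks for.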

> Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-975
>                 URL: https://issues.apache.org/jira/browse/PIG-975
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Ying He
>            Assignee: Pradeep Kamath
>             Fix For: 0.2.0
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. The bag is registered with
SpillableMemoryManager and is prone to OutOfMemoryException.  It is better to proactively manage
memory usage: the bag fills memory up to a specified amount and dumps the rest
to disk.  The amount of memory used to hold tuples is configurable. This can avoid out-of-memory
errors.
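
The proposed behaviour might be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation attached to this issue: the class name ProactiveSpillBag is hypothetical, strings stand in for tuples, the configurable limit is expressed as a record count rather than a memory size, and records are assumed to contain no line breaks.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: keeps up to maxInMemory records in memory, writes the rest to disk. */
class ProactiveSpillBag implements AutoCloseable {
    private final int maxInMemory;                 // assumed configurable limit (record count)
    private final List<String> inMemory = new ArrayList<>();
    private Path spillFile;                        // created lazily on first overflow
    private BufferedWriter spillWriter;
    private long spilledCount = 0;

    ProactiveSpillBag(int maxInMemory) {
        if (maxInMemory < 0) {
            throw new IllegalArgumentException("maxInMemory must be >= 0");
        }
        this.maxInMemory = maxInMemory;
    }

    void add(String record) throws IOException {
        if (inMemory.size() < maxInMemory) {
            inMemory.add(record);                  // still under the limit: keep in memory
            return;
        }
        if (spillWriter == null) {
            spillFile = Files.createTempFile("bag-spill", ".txt");
            spillWriter = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
        }
        spillWriter.write(record);                 // over the limit: go straight to disk
        spillWriter.newLine();
        spilledCount++;
    }

    long size() {
        return inMemory.size() + spilledCount;
    }

    long spilled() {
        return spilledCount;
    }

    /** Visits in-memory records first, then streams spilled records from disk in insertion order. */
    void forEach(Consumer<String> action) throws IOException {
        for (String record : inMemory) {
            action.accept(record);
        }
        if (spillWriter != null) {
            spillWriter.flush();                   // make buffered writes visible to the reader
            try (BufferedReader reader = Files.newBufferedReader(spillFile, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    action.accept(line);
                }
            }
        }
    }

    @Override
    public void close() throws IOException {
        if (spillWriter != null) {
            spillWriter.close();
            Files.deleteIfExists(spillFile);       // remove the temporary spill file
        }
    }
}
```

The bag decides on each add whether a record stays in memory or goes to disk, so it never needs a callback from a memory manager, and memory use is bounded by the configured limit.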

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

