hadoop-pig-dev mailing list archives

From "Ying He (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
Date Thu, 24 Sep 2009 22:48:15 GMT

    [ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759299#action_12759299 ]

Ying He commented on PIG-975:
-----------------------------

Answers to Olga's questions:

1. The synchronization can be removed.
2. The memory fraction is configurable. The property name is pig.cachedbag.memusage; the
default value is 0.5.
3. The first 100 tuples are used to estimate the in-memory size of a tuple, which determines
how many tuples fit into the configured memusage. 100 is not the number of tuples kept in memory.
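Points 2 and 3 can be illustrated with a minimal sketch. Only the property name pig.cachedbag.memusage, its 0.5 default, and the 100-tuple sample come from the comment; the class CachedBagSizer, its methods, reading the setting through java.util.Properties, and representing tuple sizes as plain byte counts are all hypothetical simplifications, not the PIG-975 patch.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not the PIG-975 patch): estimates how many tuples a
// cached bag may keep in memory from a memory fraction and a size sample.
final class CachedBagSizer {
    // Number of leading tuples sampled to estimate the per-tuple size.
    static final int SAMPLE_SIZE = 100;
    // Property name and default taken from the comment above.
    static final String MEMUSAGE_KEY = "pig.cachedbag.memusage";
    static final double DEFAULT_MEMUSAGE = 0.5;

    private CachedBagSizer() {}

    // Reads the configured memory fraction, falling back to the default.
    static double memUsage(Properties props) {
        String v = props.getProperty(MEMUSAGE_KEY);
        return v == null ? DEFAULT_MEMUSAGE : Double.parseDouble(v.trim());
    }

    // Average size in bytes of the first SAMPLE_SIZE tuples; later entries are ignored.
    static long averageTupleSize(List<Long> tupleSizes) {
        int n = Math.min(SAMPLE_SIZE, tupleSizes.size());
        if (n == 0) {
            throw new IllegalArgumentException("need at least one sampled tuple");
        }
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += tupleSizes.get(i);
        }
        return Math.max(1, total / n); // never 0, so the division below is safe
    }

    // Number of tuples that fit into memUsage * maxHeapBytes.
    static long maxTuplesInMemory(long maxHeapBytes, double memUsage, long avgTupleSize) {
        long budget = (long) (maxHeapBytes * memUsage);
        return budget / avgTupleSize;
    }
}
```

In a JVM, maxHeapBytes could plausibly come from Runtime.getRuntime().maxMemory(). The sample only sets the in-memory limit; as point 3 says, it is not itself a cap of 100 tuples.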

> Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-975
>                 URL: https://issues.apache.org/jira/browse/PIG-975
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Ying He
>            Assignee: Ying He
>             Fix For: 0.2.0
>
>         Attachments: PIG-975.patch, PIG-975.patch2
>
>
> POPackage uses DefaultDataBag during the reduce phase to hold data. DefaultDataBag is registered
> with SpillableMemoryManager and is prone to OutOfMemoryError. It is better to manage memory
> usage proactively: the bag fills memory up to a specified amount and dumps the rest to
> disk. The amount of memory used to hold tuples is configurable. This can avoid out-of-memory
> errors.
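The "fill memory to a limit, write the rest to disk" behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Pig's DataBag API or the attached patch: ProactiveSpillBag is a hypothetical name, Strings stand in for tuples, and the in-memory limit is passed in directly rather than derived from pig.cachedbag.memusage.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch (not Pig's API): keeps at most memLimit items in memory
// and writes every further item to a temp file as it arrives, instead of
// registering with a memory manager and waiting to be told to spill.
// Strings stand in for Pig tuples.
final class ProactiveSpillBag implements Iterable<String> {
    private final int memLimit;
    private final List<String> inMemory = new ArrayList<>();
    private File spillFile;             // created lazily on first overflow
    private DataOutputStream spillOut;  // kept open while items are added
    private long spilled = 0;

    ProactiveSpillBag(int memLimit) { this.memLimit = memLimit; }

    void add(String item) {
        if (inMemory.size() < memLimit) {
            inMemory.add(item);
            return;
        }
        try {
            if (spillOut == null) {
                spillFile = File.createTempFile("spillbag", ".tmp");
                spillFile.deleteOnExit();
                spillOut = new DataOutputStream(
                        new BufferedOutputStream(new FileOutputStream(spillFile)));
            }
            spillOut.writeUTF(item);
            spilled++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long size() { return inMemory.size() + spilled; }

    long spilledCount() { return spilled; }

    // Yields in-memory items first, then spilled items in insertion order.
    // Items added after the iterator is created are not visited.
    @Override
    public Iterator<String> iterator() {
        final int memCount = inMemory.size();
        final long fileCount = spilled;
        try {
            if (spillOut != null) spillOut.flush(); // make buffered writes readable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<String>() {
            private int memIdx = 0;
            private long fileIdx = 0;
            private DataInputStream in;

            @Override
            public boolean hasNext() { return memIdx < memCount || fileIdx < fileCount; }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (memIdx < memCount) return inMemory.get(memIdx++);
                try {
                    if (in == null) {
                        in = new DataInputStream(
                                new BufferedInputStream(new FileInputStream(spillFile)));
                    }
                    String s = in.readUTF();
                    if (++fileIdx == fileCount) in.close(); // done with the file
                    return s;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }

    // Releases the spill file; the bag must not be used afterwards.
    void close() {
        try {
            if (spillOut != null) spillOut.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (spillFile != null) spillFile.delete();
    }
}
```

Because the bag decides when to spill as each item arrives, memory use stays bounded by memLimit without relying on a global manager to react in time. That is the trade-off the issue argues for.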

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

