hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
Date Thu, 24 Sep 2009 22:00:18 GMT

    [ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759284#action_12759284 ]

Olga Natkovich commented on PIG-975:
------------------------------------

A couple of questions and comments on the patch:

- Why do we need to synchronize in add? Who else is accessing the bag, since it is no longer
managed by the spillable memory manager?
- The memory fraction should be a Java property so that users can control it if they choose to.
- Why do we have a limit of only 100 tuples in memory when we already have a memory limit? Also,
if we do need it, shouldn't it be configurable?
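
To make the three points above concrete, here is a minimal sketch of what I have in mind. It is
not the code from the patch: the class name, the property names, the default fraction of 0.1, and
the crude size estimate are all assumptions made for illustration, and tuples are stood in for by
plain strings. The bag reads its memory fraction and its optional tuple cap from Java system
properties, uses an unsynchronized add on the assumption that only one thread owns the bag, and
writes everything past the limit to a temporary file.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; not the bag class from PIG-975.patch. */
class ProactiveSpillBag {
    // Assumed property names, invented for this example.
    static final String FRACTION_PROP = "example.bag.memory.fraction";
    static final String MAX_TUPLES_PROP = "example.bag.max.inmemory.tuples";

    private final List<String> inMemory = new ArrayList<>();
    private final long memoryLimitBytes;
    private final int maxInMemoryTuples;
    private long usedBytes = 0;
    private long spilledCount = 0;
    private File spillFile;
    private DataOutputStream spillOut;

    /** availableBytes could be Runtime.getRuntime().maxMemory() in real use. */
    ProactiveSpillBag(long availableBytes) {
        // Both limits come from system properties so users can override them.
        double fraction = Double.parseDouble(System.getProperty(FRACTION_PROP, "0.1"));
        this.memoryLimitBytes = (long) (availableBytes * fraction);
        // No tuple cap unless the user asks for one.
        this.maxInMemoryTuples = Integer.getInteger(MAX_TUPLES_PROP, Integer.MAX_VALUE);
    }

    /** Not synchronized: assumes a single thread owns the bag. */
    void add(String tuple) throws IOException {
        long size = 2L * tuple.length(); // crude estimate: two bytes per char
        boolean fits = spillOut == null
                && inMemory.size() < maxInMemoryTuples
                && usedBytes + size <= memoryLimitBytes;
        if (fits) {
            inMemory.add(tuple);
            usedBytes += size;
            return;
        }
        if (spillOut == null) {
            // First overflow: open the spill file. Later tuples all go to disk.
            spillFile = File.createTempFile("bag", ".spill");
            spillFile.deleteOnExit();
            spillOut = new DataOutputStream(new FileOutputStream(spillFile));
        }
        spillOut.writeUTF(tuple);
        spilledCount++;
    }

    /** Returns all tuples in insertion order: in-memory ones first, then spilled ones. */
    List<String> readAll() throws IOException {
        List<String> all = new ArrayList<>(inMemory);
        if (spillOut != null) {
            spillOut.flush();
            try (DataInputStream in = new DataInputStream(new FileInputStream(spillFile))) {
                for (long i = 0; i < spilledCount; i++) {
                    all.add(in.readUTF());
                }
            }
        }
        return all;
    }

    int inMemoryCount() { return inMemory.size(); }

    long spilledCount() { return spilledCount; }
}
```

With something along these lines, a user could tune the behaviour with -D flags on the JVM command
line (for example -Dexample.bag.memory.fraction=0.2) without a code change, and the tuple cap would
only apply when explicitly requested.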

> Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-975
>                 URL: https://issues.apache.org/jira/browse/PIG-975
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Ying He
>            Assignee: Ying He
>             Fix For: 0.2.0
>
>         Attachments: PIG-975.patch, PIG-975.patch2
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. It is registered with
> SpillableMemoryManager and is prone to OutOfMemoryException. It's better to pro-actively manage
> the usage of memory. The bag fills memory up to a specified amount and dumps the rest
> to disk. The amount of memory used to hold tuples is configurable. This can avoid out-of-memory
> errors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

