hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-164) In scripts that create large groups pig runs out of memory
Date Thu, 20 Mar 2008 16:35:24 GMT

     [ https://issues.apache.org/jira/browse/PIG-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-164:
---------------------------

    Attachment: PIG-164.patch

> In scripts that create large groups pig runs out of memory
> ----------------------------------------------------------
>
>                 Key: PIG-164
>                 URL: https://issues.apache.org/jira/browse/PIG-164
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.0.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: PIG-164.patch
>
>
> Scripts that need to group large amounts of data, such as a group all over 20 million
> records, often die with errors indicating that no more memory can be allocated. PIG-40
> addressed this somewhat, but not completely; in fact, it appears that in some situations
> it made things worse. If a script creates many data bags, it can now run out of memory
> tracking all the bags it may need to spill, even if none of those bags ever gets very large.
> The issue is that the fix for PIG-40 introduced a memory manager that keeps a LinkedList
> of WeakReferences to track these data bags. When the memory manager is told to dump
> memory, it walks this LinkedList, cleaning out any entries that have gone stale and
> spilling any that are still valid. The problem is that in a script that processes many
> rows, the LinkedList itself grows very large and becomes the very reason memory must be
> dumped.
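
For illustration, here is a minimal Java sketch of the pattern described above; the class
and method names are hypothetical and do not correspond to Pig's actual implementation:

    import java.lang.ref.WeakReference;
    import java.util.Iterator;
    import java.util.LinkedList;

    // Hypothetical sketch of the tracking pattern described above.
    public class BagTracker {

        public interface Spillable {
            void spill();   // write the bag's contents to disk
        }

        // One WeakReference is appended per bag ever created.  Stale
        // references are only removed inside spillAll(), so a job that
        // creates millions of short-lived bags makes the list itself
        // the dominant memory consumer.
        private final LinkedList<WeakReference<Spillable>> bags =
                new LinkedList<WeakReference<Spillable>>();

        public synchronized void register(Spillable bag) {
            bags.add(new WeakReference<Spillable>(bag));
        }

        // Invoked only when memory runs low.
        public synchronized void spillAll() {
            Iterator<WeakReference<Spillable>> it = bags.iterator();
            while (it.hasNext()) {
                Spillable bag = it.next().get();
                if (bag == null) {
                    it.remove();   // bag was garbage collected; drop entry
                } else {
                    bag.spill();   // bag still live; spill it
                }
            }
        }
    }

Because stale entries are removed only during a spill pass, the list's length tracks the
total number of bags ever created rather than the number currently live, which is exactly
the growth the report describes.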

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

