pig-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4043) JobClient.getMap/ReduceTaskReports() causes OOM for jobs with a large number of tasks
Date Sat, 28 Jun 2014 22:48:24 GMT

    [ https://issues.apache.org/jira/browse/PIG-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046993#comment-14046993
] 

Cheolsoo Park commented on PIG-4043:
------------------------------------

{quote}
I think the OOM is because there are two huge arrays at the same time, unlike Hadoop 1.x HadoopShims.
{quote}
This isn't true. In fact, I am seeing the OOM in 0.12, which doesn't include the code you're referring
to (introduced by PIG-3913). In 0.12, there are not two copies of the TaskReport array. If you
look at the heap dump, it is a single array object that is roughly 800MB.
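Back-of-the-envelope, that array size is consistent with the task count: ~800MB spread over ~100K reports works out to roughly 8KB per TaskReport. This is only a rough estimate from the heap-dump figures above, not a measurement:

```java
// Rough estimate of per-TaskReport heap cost from the numbers in this report.
// These are illustrative figures, not measured values.
public class ReportHeapEstimate {
    public static void main(String[] args) {
        long numTasks = 100_000L;               // ~100K mappers, as in the failing job
        long arrayBytes = 800L * 1024 * 1024;   // ~800MB TaskReport[] seen in the heap dump
        long bytesPerReport = arrayBytes / numTasks;
        System.out.println(bytesPerReport + " bytes per TaskReport (~8KB)");
    }
}
```

At ~8KB per report, any code path that materializes the full array for a 100K-task job will blow a 1GB client heap regardless of where the array is allocated.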

In addition, I see the same issue in Lipstick, for example, [here|https://github.com/Netflix/Lipstick/blob/master/lipstick-console/src/main/java/com/netflix/lipstick/pigtolipstick/BasicP2LClient.java#L414].
Pig dies as soon as {{JobClient.getMapTaskReports()}} is called. I've run several tests
so far, and it's clear that I cannot run my job (100K mappers) with any {{JobClient.getMapTaskReports()}}
call, in either Pig or Lipstick, on Hadoop 2.4.

Unless {{JobClient.getMapTaskReports()}} itself returns an iterator, we need a way to disable
the call.
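One possible shape for such an opt-out is a config-guarded fetch. This is only a sketch: the property name {{pig.no.task.report}}, the {{ReportFetcher}} interface, and the {{TaskReport}} stand-in below are hypothetical illustrations, not actual Pig or Hadoop APIs:

```java
import java.util.Properties;

// Sketch: gate the expensive task-report fetch behind a config flag, so jobs
// with ~100K tasks can skip materializing the huge TaskReport[] entirely.
public class TaskReportGuard {
    static class TaskReport {}              // stand-in for Hadoop's TaskReport

    interface ReportFetcher {               // stand-in for JobClient.getMapTaskReports()
        TaskReport[] fetch();
    }

    /** Returns an empty array instead of the full report array when disabled. */
    static TaskReport[] getMapTaskReports(Properties conf, ReportFetcher fetcher) {
        // "pig.no.task.report" is a hypothetical property name for this sketch.
        if (Boolean.parseBoolean(conf.getProperty("pig.no.task.report", "false"))) {
            return new TaskReport[0];       // skip the potentially huge allocation
        }
        return fetcher.fetch();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("pig.no.task.report", "true");
        TaskReport[] reports = getMapTaskReports(conf, () -> new TaskReport[100_000]);
        System.out.println(reports.length); // prints 0: the fetch was skipped
    }
}
```

Callers that only use the reports for optional diagnostics (counters, progress logging) could then degrade gracefully instead of dying with OOM.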

> JobClient.getMap/ReduceTaskReports() causes OOM for jobs with a large number of tasks
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-4043
>                 URL: https://issues.apache.org/jira/browse/PIG-4043
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4043-1.patch, heapdump.png
>
>
> With Hadoop 2.4, I often see the Pig client fail with OOM when there are many tasks (~100K) and a 1GB heap.
> The heap dump (attached) shows that TaskReport[] occupies about 80% of the heap at the time of OOM.
> The problem is that JobClient.getMap/ReduceTaskReports() returns an array of TaskReport objects, which can be huge if the number of tasks is large.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
