hadoop-mapreduce-user mailing list archives

From Rohith Sharma K S <rohithsharm...@huawei.com>
Subject RE: Leak in RM Capacity scheduler leading to OOM
Date Thu, 24 Mar 2016 02:13:05 GMT
I think you might be hitting YARN-2997. That fix addresses duplicated completed
containers being sent to the RM.

Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Sharad Agarwal [mailto:sharad@apache.org] 
Sent: 24 March 2016 08:58
To: Sharad Agarwal
Cc: yarn-dev@hadoop.apache.org; user@hadoop.apache.org
Subject: Re: Leak in RM Capacity scheduler leading to OOM

The ticket for this is here:
https://issues.apache.org/jira/browse/YARN-4852

On Wed, Mar 23, 2016 at 5:50 PM, Sharad Agarwal <sharad@apache.org> wrote:

> Taking a dump of the 8 GB heap shows about 18 million
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto objects.
>
> Similar counts are there for ApplicationAttempt and ContainerId. All
> seem to be linked via
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerStatusProto, whose
> count is also about 18 million.
>
> On further debugging, looking at the CapacityScheduler code:
>
> It seems to add duplicate entries of UpdatedContainerInfo objects for
> completed containers. The same dump shows about 0.5 million
> UpdatedContainerInfo objects.
>
> This issue only surfaces when the scheduler thread cannot drain the
> UpdatedContainerInfo objects fast enough, which happens only on a big cluster.
>
> Has anyone noticed the same? We are running Hadoop 2.6.0.
>
> Sharad
>
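The duplication described above can be sketched as follows. This is an illustrative model, not the actual YARN code: the class and method names (NodeUpdateSketch, onHeartbeatNaive, onHeartbeatDeduped) are hypothetical. It only shows how re-reporting a completed container on every node heartbeat grows the pending queue while a slow scheduler thread has not yet drained it, and how deduplicating the pending set (the approach taken in the spirit of YARN-2997) keeps the queue bounded.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of the node-update path. Real YARN uses
// UpdatedContainerInfo objects queued per node; plain strings stand in
// for container ids here.
public class NodeUpdateSketch {
    // Updates waiting for the scheduler thread to drain them.
    static final Queue<String> pendingUpdates = new ArrayDeque<>();

    // Completed containers already enqueued and not yet drained.
    static final Set<String> alreadyPending = new HashSet<>();

    // Naive enqueue: every heartbeat re-adds the completed container,
    // so a slow consumer sees one duplicate per heartbeat.
    static void onHeartbeatNaive(String completedContainerId) {
        pendingUpdates.add(completedContainerId);
    }

    // Deduplicated enqueue: skip containers that are already pending.
    static void onHeartbeatDeduped(String completedContainerId) {
        if (alreadyPending.add(completedContainerId)) {
            pendingUpdates.add(completedContainerId);
        }
    }

    public static void main(String[] args) {
        // Same completed container reported on 5 heartbeats before the
        // scheduler thread gets a chance to drain the queue.
        for (int i = 0; i < 5; i++) onHeartbeatNaive("container_001");
        System.out.println("naive queue size: " + pendingUpdates.size());   // 5

        pendingUpdates.clear();
        for (int i = 0; i < 5; i++) onHeartbeatDeduped("container_001");
        System.out.println("deduped queue size: " + pendingUpdates.size()); // 1
    }
}
```

On a large cluster the naive path multiplies every completed container by the number of heartbeats it takes the scheduler thread to catch up, which matches the ratio of ContainerStatusProto to UpdatedContainerInfo counts reported in the heap dump.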