hadoop-common-user mailing list archives

From Sharad Agarwal <sha...@apache.org>
Subject Re: Leak in RM Capacity scheduler leading to OOM
Date Thu, 24 Mar 2016 00:57:40 GMT
Ticket for this is here ->
https://issues.apache.org/jira/browse/YARN-4852

On Wed, Mar 23, 2016 at 5:50 PM, Sharad Agarwal <sharad@apache.org> wrote:

> A dump of the 8 GB heap shows about 18 million
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto objects.
>
> There are similar counts for ApplicationAttempt and ContainerId. All seem to
> be linked via
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerStatusProto, the count of
> which is also about 18 million.
>
> On further debugging, looking at the CapacityScheduler code:
>
> It seems to add duplicate entries of UpdatedContainerInfo objects for
> completed containers. The same dump shows about 0.5 million
> UpdatedContainerInfo objects.
>
> This issue only surfaces if the scheduler thread cannot drain the
> UpdatedContainerInfo objects fast enough, which happens only in a big cluster.
>
> Has anyone noticed the same? We are running Hadoop 2.6.0.
>
> Sharad
>
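
The growth pattern described in the quoted message can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the CapacityScheduler or RMNodeImpl code: the class PendingUpdatesSketch, the method simulate, the container names, and the "drain one entry every N heartbeats" rule are all hypothetical. It assumes a node re-reports the same completed containers on every heartbeat and that each report is queued as a new entry until the scheduler thread removes it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of a pending-update queue; not actual YARN code.
public class PendingUpdatesSketch {

    /**
     * Simulates a node that re-reports the same completed containers on each
     * heartbeat while a slower consumer drains the queue.
     *
     * @param heartbeats number of heartbeats to simulate
     * @param drainEvery the consumer removes one entry every this many heartbeats
     * @param dedupe     if true, a container id is queued at most once
     * @return number of entries still queued at the end
     */
    static int simulate(int heartbeats, int drainEvery, boolean dedupe) {
        // Each queue entry stands in for one UpdatedContainerInfo-like object.
        Queue<List<String>> pending = new ArrayDeque<>();
        Set<String> alreadyQueued = new HashSet<>();
        // Assumed: the same two completed containers are reported every time.
        List<String> reported = List.of("container_1", "container_2");

        for (int beat = 1; beat <= heartbeats; beat++) {
            List<String> fresh = new ArrayList<>();
            for (String id : reported) {
                // Without dedupe every report is queued again.
                if (!dedupe || alreadyQueued.add(id)) {
                    fresh.add(id);
                }
            }
            if (!fresh.isEmpty()) {
                pending.add(fresh);
            }
            // Slow consumer: removes at most one entry per drain interval.
            if (beat % drainEvery == 0) {
                pending.poll();
            }
        }
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println("no dedupe, slow drain:   " + simulate(100, 4, false));
        System.out.println("with dedupe, slow drain: " + simulate(100, 4, true));
        System.out.println("no dedupe, fast drain:   " + simulate(100, 1, false));
    }
}
```

In this model the backlog only builds up when duplicates are queued and the consumer is slower than the producer; with a fast consumer, or with duplicates filtered, the queue ends empty. That matches the observation that the problem shows up only on a big cluster, though the real cause should be confirmed against the ticket.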
