hadoop-common-user mailing list archives

From Sharad Agarwal <sha...@apache.org>
Subject Leak in RM Capacity scheduler leading to OOM
Date Wed, 23 Mar 2016 12:20:50 GMT
Taking a dump of 8 GB heap shows about 18 million

There are similar counts for ApplicationAttempt and ContainerId. All seem to
be linked via
org.apache.hadoop.yarn.proto.YarnProtos$ContainerStatusProto, the count of
which is also about 18 million.

On further debugging, looking at the CapacityScheduler code:

It seems to add duplicate entries of UpdatedContainerInfo objects for the
completed containers. The same dump shows about 0.5 million
UpdatedContainerInfo objects.

This issue only surfaces if the scheduler thread is not able to drain the
UpdatedContainerInfo objects fast enough, which happens only in a big cluster.
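
To illustrate the drain-rate condition, here is a minimal sketch. It is not
the CapacityScheduler or RM code: the class BacklogSketch, the Update type and
the simulate method are hypothetical names made up for this example, and the
per-round counts are arbitrary. It only models an unbounded queue whose
producer adds entries slightly faster than the consumer removes them, and does
not try to reproduce the duplicate entries themselves.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical model of an unbounded update queue; not Hadoop code.
class BacklogSketch {

    // Stand-in for a queued container update (assumed shape, illustration only).
    static final class Update {
        final int containerId;

        Update(int containerId) {
            this.containerId = containerId;
        }
    }

    // Runs `rounds` rounds. Each round enqueues `produced` updates, then the
    // consumer removes at most `drained` of them. Returns the remaining backlog.
    static int simulate(int rounds, int produced, int drained) {
        ConcurrentLinkedQueue<Update> queue = new ConcurrentLinkedQueue<>();
        int nextId = 0;
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < produced; i++) {
                queue.add(new Update(nextId++));
            }
            // poll() returns null once the queue is empty, which ends the drain early.
            for (int i = 0; i < drained && queue.poll() != null; i++) {
                // nothing to do; the update is simply discarded
            }
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println("consumer keeps up: backlog = " + simulate(1000, 5, 5));
        System.out.println("consumer lags:     backlog = " + simulate(1000, 5, 4));
    }
}
```

With these numbers the first run ends with an empty queue and the second with
1000 entries still queued, one per round. On an unbounded queue that backlog
keeps growing for as long as the consumer lags, so everything reachable from
the queued entries stays on the heap.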

Has anyone noticed the same? We are running Hadoop 2.6.0.

