hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7163) RM crashes with OOM in secured cluster when HA is enabled
Date Wed, 06 Sep 2017 14:42:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155457#comment-16155457 ]

Rohith Sharma K S commented on YARN-7163:

The primary reason is that RMAuthenticationFilter is added in a secure cluster. The filter holds
a reference to RMDelegationTokenSecretManager, which in turn holds a reference to RMContext. Once
the filter is initialized with an RMDelegationTokenSecretManager, that reference does not change
for the lifetime of the RM JVM. During an RM HA switch, i.e. ACTIVE -> STANDBY, RMContext is recreated and
initialized with a new secret manager, but this new secret manager is never set on RMAuthenticationFilter.

As a result, the old RMContext stays reachable and is never garbage collected; it remains in the
old-generation heap space forever. So when the RM switches to standby, its heap usage stays almost
the same as in the active state. Later, when it switches back to active, it recovers from the state
store, which roughly doubles the heap usage, i.e. existing_heap_size + loaded_rmappState_from_statestore.
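
The reference chain described above can be illustrated with a minimal, self-contained Java sketch.
This is not Hadoop code: Context, SecretManager, AuthFilter and ResourceManager are hypothetical
stand-ins for RMContext, RMDelegationTokenSecretManager, RMAuthenticationFilter and the RM itself,
and the wiring is an assumption made only to show how a filter that is initialized once keeps the
old context reachable after a transition.

```java
public class StaleReferenceSketch {

    /** Hypothetical stand-in for RMContext (owner of the large application state). */
    static final class Context {
        final int generation;
        Context(int generation) { this.generation = generation; }
    }

    /** Hypothetical stand-in for RMDelegationTokenSecretManager: references its context. */
    static final class SecretManager {
        final Context context;
        SecretManager(Context context) { this.context = context; }
    }

    /** Hypothetical stand-in for RMAuthenticationFilter: wired once, never updated. */
    static final class AuthFilter {
        private final SecretManager secretManager;
        AuthFilter(SecretManager secretManager) { this.secretManager = secretManager; }
        Context reachableContext() { return secretManager.context; }
    }

    /** Hypothetical stand-in for the RM: recreates its context on every transition. */
    static final class ResourceManager {
        private int generation = 0;
        private Context context;
        private SecretManager secretManager;
        private final AuthFilter filter;

        ResourceManager() {
            createContext();
            // The filter captures the first secret manager for the JVM lifetime.
            filter = new AuthFilter(secretManager);
        }

        private void createContext() {
            context = new Context(++generation);
            secretManager = new SecretManager(context);
        }

        /** A new context and secret manager are created; the filter is not rewired. */
        void transitionToStandby() { createContext(); }

        Context currentContext() { return context; }
        AuthFilter filter() { return filter; }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager();
        Context old = rm.currentContext();
        rm.transitionToStandby();
        // The old context is still strongly reachable through the filter,
        // so a garbage collector could not reclaim it.
        System.out.println("filter pins old context: "
                + (rm.filter().reachableContext() == old));
        System.out.println("RM uses a new context: " + (rm.currentContext() != old));
    }
}
```

In this sketch the old Context stays strongly reachable via AuthFilter -> SecretManager -> Context
even though the ResourceManager no longer uses it, which is the kind of retention the heap dump
points to.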

> RM crashes with OOM in secured cluster when HA is enabled
> ---------------------------------------------------------
>                 Key: YARN-7163
>                 URL: https://issues.apache.org/jira/browse/YARN-7163
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
> It is observed that the RM crashes with a heap space OOM in a secure cluster (HTTP authentication
is Kerberos) when RM HA is enabled.
> Scenario:
> 1. Start the RM in HA secure mode. Let's say RM1 is in active mode.
> 2. Run many applications so that they use more than 50% of the configured heap space. Let's
say the heap space is 2GB; then run applications that occupy 1.5GB of heap space.
> 3. Switch the RM to standby and bring it back to active. While recovering applications from
the state store, the RM crashes with OOM.
> *Note*: This issue happens only when the RM is started as ACTIVE directly (not switched
from standby to active during JVM startup).
> The heap dump shows that RMAuthenticationFilter holds 60% of the heap space, and the other 40% is held
by RMAppState, which is loaded during recovery from the state store. Together these exceed the heap
space and the RM crashes with OOM.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
