hadoop-yarn-issues mailing list archives

From "Omkar Vinit Joshi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.
Date Wed, 14 Aug 2013 18:57:47 GMT

    [ https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740048#comment-13740048 ]

Omkar Vinit Joshi commented on YARN-1061:
-----------------------------------------

How can the NM wait indefinitely? What is your connection timeout set to? Can you add the
parameters below to your log4j.properties and check whether it actually times out or waits
indefinitely for the RM? Also, can you attach those logs once you have simulated it?
{code}
log4j.logger.org.apache.hadoop.ipc.Server=DEBUG
log4j.logger.org.apache.hadoop.ipc.Client=DEBUG
{code}

These configuration keys from *CommonConfigurationKeysPublic* may also be helpful:
{code}
  public static final String  IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY =
    "ipc.client.connection.maxidletime";
  /** Default value for IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY */
  public static final int     IPC_CLIENT_CONNECTION_MAXIDLETIME_DEFAULT = 10000; // 10s
  /** See <a href="{@docRoot}/../core-default.html">core-default.xml</a> */
  public static final String  IPC_CLIENT_CONNECT_TIMEOUT_KEY =
    "ipc.client.connect.timeout";
  /** Default value for IPC_CLIENT_CONNECT_TIMEOUT_KEY */
  public static final int     IPC_CLIENT_CONNECT_TIMEOUT_DEFAULT = 20000; // 20s
{code}
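
To make the distinction between a connect timeout and waiting for a response concrete, here is a minimal sketch using plain java.net sockets. It does not use Hadoop's IPC classes and makes no claim about how the NM or RM behave internally; the class name HungServerDemo and the method readTimedOut are hypothetical, and the "hung" peer is simply a local listener that never replies. The 20000 ms connect timeout mirrors IPC_CLIENT_CONNECT_TIMEOUT_DEFAULT above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only: not Hadoop code.
public class HungServerDemo {

    /**
     * Connects to a local listener that never replies and reports whether
     * the blocking read was aborted by the read timeout.
     */
    public static boolean readTimedOut(int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 picks any free port. The listener never calls accept() and
        // never writes, standing in for a hung peer; the OS still completes
        // the TCP handshake, so the connect itself succeeds.
        try (ServerSocket hungServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            client.connect(
                new InetSocketAddress(loopback, hungServer.getLocalPort()),
                connectTimeoutMs);
            // A value of 0 here would mean "block forever" on read().
            client.setSoTimeout(readTimeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read();          // no data will ever arrive
                return false;       // only reached if the peer sent or closed
            } catch (SocketTimeoutException e) {
                return true;        // read gave up after readTimeoutMs
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimedOut(20000, 1000));
    }
}
```

In this sketch the connect timeout never fires because the connection is established normally; only the separate read timeout bounds the wait for a reply. The DEBUG logs requested above should help show which of these two phases the NM is actually stuck in.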

                
> NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1061
>                 URL: https://issues.apache.org/jira/browse/YARN-1061
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.5-alpha
>            Reporter: Rohith Sharma K S
>
> It is observed that, in one scenario, the NodeManager waits indefinitely for a nodeHeartbeat
> response from the ResourceManager while the ResourceManager is in a hung state.
> The NodeManager should get a timeout exception instead of waiting indefinitely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
