hadoop-yarn-issues mailing list archives

From "Chackaravarthy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-5445) Log aggregation configured to different namenode can fail fast
Date Thu, 04 Aug 2016 09:32:20 GMT

     [ https://issues.apache.org/jira/browse/YARN-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chackaravarthy updated YARN-5445:
    Attachment: YARN-5445-1.patch

Attached a patch in which LogAggregationService creates a new YARN configuration and overrides
{{dfs.client.failover.max.attempts}} before creating the FileSystem instance.
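A minimal sketch of the approach described above (the class and method names here are hypothetical, as is the illustrative retry value; the actual patch may differ):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper illustrating the pattern in the patch: copy the
// service configuration, cap the HDFS client failover attempts, and use
// the copy only for the remote-log FileSystem.
public class RemoteLogFs {
    public static FileSystem createRemoteLogFs(Configuration serviceConf,
                                               Path remoteRootLogDir) throws Exception {
        // Copy so the override does not leak into other NodeManager components.
        Configuration conf = new Configuration(serviceConf);
        // Key is hardcoded as a string because DFS_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY
        // is defined in hadoop-hdfs, which the nodemanager module does not depend on.
        conf.setInt("dfs.client.failover.max.attempts", 2); // illustrative value
        return remoteRootLogDir.getFileSystem(conf);
    }
}
```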

There are two open points to discuss:

* {{dfs.client.failover.max.attempts}} is hardcoded as a string in LogAggregationService because
the config key is defined in the hadoop-hdfs project (DFSConfigKeys.java), and hadoop-yarn-server-nodemanager
(where LogAggregationService lives) does not depend on hadoop-hdfs. One option is to move
"DFS_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY" from DFSConfigKeys.java (hadoop-hdfs) to
CommonConfigurationKeys.java (hadoop-common). Is there any other way?
* Only one new config, {{yarn.nodemanager.remote-app-log-dfs-client-failover-max-attempts}},
is introduced, on the assumption that the NameNode is set up in HA mode. For a non-HA setup we
may also need to cover other configs such as {{dfs.client.retry.max.attempts}}.
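For reference, the proposed property would be set in yarn-site.xml roughly as follows (the property name follows the patch's proposal and the value is illustrative; both may change during review):

```xml
<!-- yarn-site.xml: cap HDFS client failover attempts used only by
     NodeManager log aggregation (as proposed in YARN-5445-1.patch) -->
<property>
  <name>yarn.nodemanager.remote-app-log-dfs-client-failover-max-attempts</name>
  <value>2</value>
</property>
```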

Please check whether this is the correct way to handle it. If not, suggestions are welcome. Thanks.

> Log aggregation configured to different namenode can fail fast
> --------------------------------------------------------------
>                 Key: YARN-5445
>                 URL: https://issues.apache.org/jira/browse/YARN-5445
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Chackaravarthy
>         Attachments: YARN-5445-1.patch
> Log aggregation is enabled and configured to write app logs to a different cluster or a different
> namespace (NN federation). In these cases, we would like some configs for attempts or retries
> so that log aggregation fails fast when the other cluster is completely down.
> Currently the default {{dfs.client.failover.max.attempts}} of 15 is used, which adds a latency
> of 2 to 2.5 minutes to each container launch (per node manager).
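The quoted latency figure can be sanity-checked with a rough, deterministic model of the client's exponential backoff. This sketch assumes a base sleep of 500 ms and a cap of 15 s (the usual defaults for dfs.client.failover.sleep.base.millis and dfs.client.failover.sleep.max.millis); the real HDFS client also randomizes each sleep, so actual delays vary:

```java
// Rough estimate of total failover backoff when the remote NameNode is down.
// Assumes deterministic exponential backoff: sleep_i = min(cap, base * 2^i).
// The real client multiplies each sleep by a random factor in [0.5, 1.5].
public class FailoverBackoffEstimate {
    static long totalBackoffMillis(int attempts, long baseMillis, long capMillis) {
        long total = 0;
        for (int i = 0; i < attempts; i++) {
            total += Math.min(capMillis, baseMillis << i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Default dfs.client.failover.max.attempts = 15
        long ms = totalBackoffMillis(15, 500, 15000);
        // With these assumptions: 165500 ms, roughly 2.8 minutes, in the same
        // ballpark as the 2 to 2.5 minutes reported in the issue.
        System.out.println(ms + " ms ~= " + (ms / 60000.0) + " min");
    }
}
```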

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
