hadoop-hdfs-issues mailing list archives

From "Nicolas Fraison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11590) Nodemanagers have DDoS our namenode due to HDFS_DELEGATION_TOKEN expired or not in the cache
Date Tue, 04 Apr 2017 12:58:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955102#comment-15955102 ]

Nicolas Fraison commented on HDFS-11590:
----------------------------------------

Thanks [~daryn] for providing the patch for YARN-3760.
I've also looked at the linked patch HADOOP-12054, which should avoid retrying to open a
connection on an InvalidToken exception.
While testing the HADOOP-12054 patch (backported to our code base), the LeaseRenewer thread
was still retrying the lease renewal every second for an hour. In fact, the InvalidToken
exception does not occur during getConnection (in the Client), which relies on the RetryPolicies;
it is hit later, when the RPC request is sent, which throws the RemoteException that is handled
by renewLease in DFSClient.
The patch I provided should handle this case. Please let me know your thoughts.
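
To illustrate the behaviour described above, here is a minimal, self-contained sketch of a
renew loop that stops as soon as the server-side error is an InvalidToken, and keeps retrying
only on other I/O errors. It is not the attached patch and does not use the Hadoop classes:
RemoteError, Renewer, Outcome and renewWithAbort are hypothetical names made up for this
example, with RemoteError standing in loosely for a remote exception that carries the
server-side class name, as seen in the log lines of the issue. The sketch also leaves out the
wait between attempts.

```java
import java.io.IOException;

class LeaseRenewSketch {
    // Class name reported by the server in the logs of this issue.
    static final String INVALID_TOKEN =
        "org.apache.hadoop.security.token.SecretManager$InvalidToken";

    // Hypothetical stand-in for a remote exception carrying the server-side class name.
    static class RemoteError extends IOException {
        private final String className;

        RemoteError(String className, String message) {
            super(message);
            this.className = className;
        }

        String getClassName() {
            return className;
        }
    }

    // Hypothetical abstraction of the RPC call that renews the lease.
    interface Renewer {
        void renew() throws IOException;
    }

    enum Outcome { RENEWED, ABORTED_INVALID_TOKEN, GAVE_UP }

    // Tries to renew up to maxAttempts times, but aborts immediately when the
    // remote error is an InvalidToken, since retrying cannot fix that.
    static Outcome renewWithAbort(Renewer renewer, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                renewer.renew();
                return Outcome.RENEWED;
            } catch (RemoteError e) {
                if (INVALID_TOKEN.equals(e.getClassName())) {
                    return Outcome.ABORTED_INVALID_TOKEN;
                }
                // Any other remote error is treated as transient: try again.
            } catch (IOException e) {
                // Local I/O error, treated as transient: try again.
            }
        }
        return Outcome.GAVE_UP;
    }

    public static void main(String[] args) {
        // A renewer whose token has expired: the loop stops on the first attempt
        // instead of using all 3600 attempts.
        Outcome outcome = renewWithAbort(() -> {
            throw new RemoteError(INVALID_TOKEN, "token is expired");
        }, 3600);
        System.out.println(outcome);
    }
}
```

The point of the design is that the check happens where the remote error is handled, not
where the connection is opened, since that is where the InvalidToken shows up in our case.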


> Nodemanagers have DDoS our namenode due to HDFS_DELEGATION_TOKEN expired or not in the
cache
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11590
>                 URL: https://issues.apache.org/jira/browse/HDFS-11590
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.6.0
>         Environment: Releases:
> cloudera release cdh-5.5.0
> openjdk version "1.8.0_91"
> linux centos6 servers
> Cluster info:
> Namenode and resourcemanager in HA with kerberos authentication
> More than 1300 datanodes/nodemanagers
>            Reporter: Nicolas Fraison
>            Priority: Minor
>         Attachments: HDFS-11590.patch
>
>
> We have faced huge slowdowns on our namenode because all our nodemanagers kept retrying
> to renew a lease and reconnecting to the namenode every second for 1 hour, due to an
> HDFS_DELEGATION_TOKEN being expired or not in the cache.
> The number of TIME_WAIT connections on our namenode was stuck at the configured maximum
> of 250k during this period because of the reconnection on each retry.
> {code}
> 2017-03-02 11:51:42,817 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
Authorization successful for appattempt_1488396860014_156103_000001 (auth:TOKEN) for protocol=interface
org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
>   2017-03-02 11:51:43,414 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
Authorization successful for appattempt_1488396860014_156120_000001 (auth:TOKEN) for protocol=interface
org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
>   2017-03-02 11:51:51,994 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
>   2017-03-02 11:51:51,995 WARN org.apache.hadoop.ipc.Client: Exception encountered while
connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
>   2017-03-02 11:51:51,995 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
>   2017-03-02 11:51:51,995 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease
for [DFSClient_NONMAPREDUCE_1560141256_4187204] for 30 seconds.  Will retry shortly ...
>   token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
>      at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>      at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>      at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
>      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
>      at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>      at java.lang.reflect.Method.invoke(Method.java:498)
>      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>      at com.sun.proxy.$Proxy21.renewLease(Unknown Source)
>      at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:921)
>      at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
>      at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
>      at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
>      at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
>      at java.lang.Thread.run(Thread.java:745)
>   2017-03-02 12:51:22,032 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
>   2017-03-02 12:51:22,032 WARN org.apache.hadoop.ipc.Client: Exception encountered while
connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
>   2017-03-02 12:51:22,033 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
>   2017-03-02 12:51:22,033 WARN org.apache.hadoop.hdfs.DFSClient: Failed to renew lease
for DFSClient_NONMAPREDUCE_1560141256_4187204 for 3600 seconds (>= hard-limit =3600 seconds.)
Closing all files being written ...
>   token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
>      at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>      at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>      at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
>      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
>      at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>      at java.lang.reflect.Method.invoke(Method.java:498)
>      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>      at com.sun.proxy.$Proxy21.renewLease(Unknown Source)
>      at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:921)
>      at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
>      at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
>      at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
>      at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
>      at java.lang.Thread.run(Thread.java:745)
>   2017-03-02 12:51:27,364 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The
logs will be aggregated after this application is finished.
> {code}
> The root cause is that the yarn proxy configuration had been removed, which in turn left
> the resource manager unable to renew the HDFS_DELEGATION_TOKEN.
> Even though the root cause has been identified, I don't think retrying to renew a lease
> every second for an hour is normal when the token is expired or not found, because this
> is not an error that can be recovered from.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

