hadoop-common-issues mailing list archives

From "Alex Ivanov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13653) ZKDelegationTokenSecretManager curator client seems to rapidly connect & disconnect from ZK
Date Wed, 28 Sep 2016 05:24:21 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528459#comment-15528459 ]

Alex Ivanov commented on HADOOP-13653:
--------------------------------------

[~xiaochen], thank you for your comments. The ZooKeeper (ZK) cluster appeared healthy, and nothing
in any of the ZooKeeper logs indicated loss of quorum or random disconnects.
Instead, the ZK connections seem to have become unstable after a large number of KMS delegation
tokens (>160,000) had accumulated. I'm not sure how this caused the issue, but once
we manually deleted the tokens, the disconnects stopped. Once we apply the patch you provided
for [HADOOP-13487|https://issues.apache.org/jira/browse/HADOOP-13487] (thank you!), I expect
we'll be able to better manage the number of delegation tokens in ZK.

I do wish we could control some of the Curator parameters, so that we could adjust
the timeouts to our needs and curtail the repetitive error logging when a disconnect happens.
These logs have taken up to 70GB of space per day, which turns viewing a single log into
a big data problem.
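
As a stopgap for reading such logs, a small filter can collapse runs of near-identical lines
into one line plus a count. The following is only a sketch: it assumes each log entry sits on
a single line with the timestamp layout seen in the excerpt of this issue, and the helper names
({{message_key}}, {{collapse_repeats}}) are hypothetical, not part of Hadoop or Curator.

```python
import re
from itertools import groupby

# Assumed timestamp layout, copied from the excerpt in this issue:
# "2016-09-25 01:47:24,346 WARN  ConnectionState - ..."
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+")
NUMBER = re.compile(r"\d+")


def message_key(line):
    """Drop the timestamp and mask digit runs so repeated messages compare equal."""
    return NUMBER.sub("#", TIMESTAMP.sub("", line.rstrip("\n")))


def collapse_repeats(lines):
    """Yield (first_line, count) for each run of consecutive look-alike lines.

    Works on any iterable of lines (for example an open file), and only keeps
    the first line of each run in memory.
    """
    for _, run in groupby(lines, key=message_key):
        first = next(run)
        count = 1 + sum(1 for _ in run)
        yield first.rstrip("\n"), count
```

Iterating over {{collapse_repeats(open(path))}} and printing only the runs with a count above 1
gives a quick picture of which messages dominate the file. Only consecutive repeats are merged,
so interleaved messages stay separate.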

On a different note, for a deployment with multiple KMS instances, you pointed out that the {{LoadBalancingKMSClientProvider}}
will try to find a working KMS. The problem I've seen is that the KMS client timeout seems quite
long, so when one KMS instance has failed, it takes a long time for a client to get a response
from KMS. Do you know how we can configure this behavior and use a shorter timeout?
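
To make the cost of a long timeout concrete, here is a rough model of the delay a client sees
before reaching a healthy instance. It is a sketch under simplifying assumptions: the client
tries instances one after another, every unreachable instance costs one full timeout, and a
healthy instance answers immediately. The function name and the timeout values are hypothetical
and are not taken from the KMS client code.

```python
def failover_delay(timeout_s, instances_up):
    """Seconds spent waiting before the first healthy instance answers.

    instances_up lists instance health (True = up) in the order the client
    tries them. Returns None when no instance is healthy.
    """
    waited = 0
    for up in instances_up:
        if up:
            return waited
        waited += timeout_s  # a dead instance costs one full timeout
    return None
```

With two instances where the first one tried is down, {{failover_delay(60, [False, True])}} gives
60 seconds, while the same call with a 5 second timeout gives 5 seconds, which is why a shorter,
configurable timeout matters here.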

> ZKDelegationTokenSecretManager curator client seems to rapidly connect & disconnect from ZK
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13653
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13653
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: kms
>            Reporter: Alex Ivanov
>            Priority: Critical
>
> At times, KMS gets into a connect/disconnect loop with ZooKeeper. It is not
clear what causes the connection to be closed. I didn't see any issues on the ZK server side,
so the issue must reside on the client side.
> *Example errors*
> NOTE: I had to filter the logs heavily since they were many GB in size (thanks to curator
error logging). What is left is an illustration of the delegation token creations, and the
Zookeeper sessions getting lost and re-established over the course of 2 hours.
> {code}
> 2016-09-25 01:43:04,377 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [75027a21ab399aa7789d6907d70fadc4, 46]
> 2016-09-25 01:43:04,557 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [1106d0754d43dcf29324d7be737f51f0, 46]
> 2016-09-25 01:43:11,846 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [4426092c861f49c6ba0c60b49b9539e5, 46]
> 2016-09-25 01:43:48,974 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [a99efff2705d6489deb059098f18818f, 46]
> 2016-09-25 01:43:49,174 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [398b5962fd647880961ba5e86a77b414, 46]
> 2016-09-25 01:44:03,359 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [413187e62a21b5459422b5c524315d06, 46]
> 2016-09-25 01:44:03,625 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [7cc2c0d82edd40e7e6f6f40af20d04d3, 46]
> 2016-09-25 01:44:06,062 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [bd9394fce20607c12bc00104bea49284, 46]
> 2016-09-25 01:44:07,134 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [7dad3bd10526517e5e1cfccd2e96074a, 46]
> 2016-09-25 01:44:07,230 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [a712ed40687580647d070c9c7f525e15, 46]
> 2016-09-25 01:44:48,481 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [44bfefa31192c68e3cc053eec4e57e14, 46]
> 2016-09-25 01:44:48,522 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [67efc2aa65eeba701ad7d3d7bab51def, 46]
> 2016-09-25 01:44:50,259 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [b43e641f58dfbd2c72550ab6804f37d1, 46]
> 2016-09-25 01:44:54,271 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [ac2fbcf404c633759b75e6d6aae00e05, 46]
> 2016-09-25 01:44:56,141 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [cdbd224079a4a10400d00d0b8eece008, 46]
> 2016-09-25 01:45:01,328 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [e03218f4835524f3d05519d27bb04e35, 46]
> 2016-09-25 01:45:02,728 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [569ae6d666d584b6843fffc47a63d147, 46]
> 2016-09-25 01:45:02,832 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [c9048271483da234c12f75569b9513c6, 46]
> 2016-09-25 01:45:05,536 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [f519d621389e41b63e8d92b4cb15f832, 46]
> 2016-09-25 01:45:07,886 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [45cf6ba58b2bb348ac5e88fa18fe9dad, 46]
> 2016-09-25 01:47:24,346 WARN  ConnectionState - Connection attempt unsuccessful after
66294 (greater than max timeout of 60000). Resetting connection and trying again with a new
connection.
> 2016-09-25 01:47:25,120 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [f160a865db69ef33548f146c9b3b84c6, 46]
> 2016-09-25 01:47:25,276 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [9d60add471464e01ef691c43bd901d96, 46]
> 2016-09-25 01:47:28,739 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [659ecabf02ff809736202a8484ff2be8, 46]
> 2016-09-25 01:48:33,233 WARN  ConnectionState - Connection attempt unsuccessful after
64494 (greater than max timeout of 60000). Resetting connection and trying again with a new
connection.
> 2016-09-25 01:48:33,306 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [15b87dd6c2251177d8db9dda415d0e06, 46]
> 2016-09-25 01:48:33,459 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [306aa796017aab2b559bf503f81175e0, 46]
> 2016-09-25 01:48:34,665 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [6c47cc10edcf0931e7e26665f99dadc5, 46]
> 2016-09-25 01:49:40,669 WARN  ConnectionState - Connection attempt unsuccessful after
66006 (greater than max timeout of 60000). Resetting connection and trying again with a new
connection.
> 2016-09-25 01:49:40,847 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [049585962d9ac1cdb4df17f826891130, 46]
> 2016-09-25 01:49:41,523 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [a0fd8cffe2ba40d63d4cd009aabb77bb, 46]
> 2016-09-25 01:49:44,811 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [ea94d6cc4df6621a38be9f79121b5cc6, 46]
> 2016-09-25 01:50:48,781 WARN  ConnectionState - Connection attempt unsuccessful after
64005 (greater than max timeout of 60000). Resetting connection and trying again with a new
connection.
> ...
> ...
> 2016-09-25 03:39:51,312 WARN  ConnectionState - Connection attempt unsuccessful after
60001 (greater than max timeout of 60000). Resetting connection and trying again with a new
connection.
> 2016-09-25 03:39:51,752 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [2b605504535c741de3d18ed30fe90d7c, 46]
> 2016-09-25 03:39:55,345 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [9dda97d42df3bf8ae805f1da2857bc33, 46]
> 2016-09-25 03:39:55,668 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [f52dcf38a4565a904be93fe3de92825d, 46]
> 2016-09-25 03:41:02,660 WARN  ConnectionState - Connection attempt unsuccessful after
67005 (greater than max timeout of 60000). Resetting connection and trying again with a new
connection.
> 2016-09-25 03:41:02,921 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [770057f6b1412f096a92ce6599c6422a, 46]
> 2016-09-25 03:41:03,035 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [f2b366612a15e0a450eac8b4cf556515, 46]
> 2016-09-25 03:41:05,903 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [a3b2d0684e1373fc2f9e5c8bff5e9939, 46]
> 2016-09-25 03:41:05,994 INFO  AbstractDelegationTokenSecretManager - Creating password
for identifier: [f9e72fd48c8aab0a737d2cc8319aa389, 46]
> {code}
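
The reset warnings in the excerpt above can be pulled out and timed with a short script. This
is a sketch only: it assumes raw, unwrapped log lines (no leading "> " quote marker) in the
format shown above, and the names {{connection_resets}} and {{gaps_seconds}} are hypothetical.

```python
import re
from datetime import datetime

# Matches the Curator warning exactly as it appears in the excerpt.
RESET = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN\s+ConnectionState - "
    r"Connection attempt unsuccessful after (\d+) "
    r"\(greater than max timeout of (\d+)\)"
)


def connection_resets(lines):
    """Return (timestamp, elapsed_ms, max_timeout_ms) for each reset warning."""
    resets = []
    for line in lines:
        match = RESET.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            resets.append((when, int(match.group(2)), int(match.group(3))))
    return resets


def gaps_seconds(resets):
    """Seconds between consecutive reset warnings."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(resets, resets[1:])]
```

Applied to the first two warnings in the excerpt (01:47:24,346 and 01:48:33,233), the gap is
about 69 seconds, close to the 60000 ms maximum timeout reported in the message.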



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
