hadoop-hdfs-issues mailing list archives

From "Rushabh S Shah (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-11804) KMS client needs retry logic
Date Sun, 21 May 2017 18:10:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018923#comment-16018923 ]

Rushabh S Shah edited comment on HDFS-11804 at 5/21/17 6:09 PM:
----------------------------------------------------------------

This is by no means a committable patch.
It needs some rework.
Changes done in this patch:
# Only {{LoadBalancingKMSClientProvider}} is instantiated when {{createProvider}} is called.
# Added three new config keys, similar to the corresponding dfs client keys.
#* {{kms.client.failover.attempts.multiplier}}: The factor by which the number of backend kms servers is multiplied to get the maximum number of retries.
For example: if {{kms.client.failover.attempts.multiplier}} is 2 and there are 2 backend kms servers, then the client will retry at most 4 times.
Default is 1, to maintain the original behavior.
#* {{kms.client.failover.sleep.base.millis}}: The base value used to calculate the failover sleep time.
Default is 500 (same as the dfs client configuration).
#* {{kms.client.failover.sleep.max.millis}}: The upper bound on the calculated sleep time.
Default is 15000 (same as the dfs client configuration).
# The client is configured to use {{FailoverOnNetworkExceptionRetry}} with {{TryOnceThenFail}} as the fallback policy.
On encountering an AccessControlException or AuthorizationException, the client will stop retrying.
This assumes that all the servers are configured with identical permissions and key ACLs.
# Since {{RetryPolicy#shouldRetry}} needs to be told whether the operation is idempotent, we need to decide which operations are idempotent.
Below is my understanding.
#* Idempotent: renewDelegationToken, cancelDelegationToken, decryptEncryptedKey, getKeyVersion, getKeys, getKeysMetadata, getKeyVersions, getCurrentKey, getMetadata
#* AtMostOnce: createKey, deleteKey, rollNewVersion
#* Not sure: addDelegationTokens, generateEncryptedKey, reencryptEncryptedKey
Would like to know your views.
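
To make the arithmetic in items 2 to 4 concrete, here is a minimal standalone sketch. It is not code from the attached patch and does not use the Hadoop classes. The class name, the method names, and the {{OpClass}} enum are hypothetical. The capped exponential backoff is an assumption about how the base and max sleep values are combined (jitter is left out), and treating the "Not sure" operations as non-retryable is only a conservative placeholder.

```java
/** Illustrative sketch only; names and logic are assumptions, not the patch. */
class KmsRetrySketch {

    /** Hypothetical buckets mirroring the three groups in item 4. */
    enum OpClass { IDEMPOTENT, AT_MOST_ONCE, UNDECIDED }

    /** Upper bound on retries: multiplier times the number of KMS servers. */
    static int maxRetries(int multiplier, int numKmsServers) {
        return multiplier * numKmsServers;
    }

    /**
     * Assumed backoff: base * 2^failovers, capped at maxMillis.
     * Real policies usually add random jitter; it is omitted here so
     * that the result is deterministic.
     */
    static long sleepMillis(long baseMillis, long maxMillis, int failovers) {
        int shift = Math.min(failovers, 30); // keep the shift small to avoid overflow
        return Math.min(baseMillis * (1L << shift), maxMillis);
    }

    /**
     * Simplified decision: stop on authorization failures, stop once the
     * retry budget is used up, and otherwise retry only idempotent operations.
     */
    static boolean shouldRetry(OpClass op, int retriesSoFar, int maxRetries,
                               boolean authFailure) {
        if (authFailure) {
            return false; // AccessControlException / AuthorizationException
        }
        if (retriesSoFar >= maxRetries) {
            return false; // budget exhausted
        }
        return op == OpClass.IDEMPOTENT;
    }

    public static void main(String[] args) {
        int max = maxRetries(2, 2); // example from item 2: multiplier 2, 2 servers
        System.out.println("max retries = " + max);
        for (int i = 0; i <= 5; i++) {
            System.out.println("failover " + i + " -> sleep "
                + sleepMillis(500, 15000, i) + " ms");
        }
    }
}
```

With the defaults quoted above (base 500, max 15000), this sketch would sleep 500, 1000, 2000, 4000 and 8000 ms on successive failovers and 15000 ms from then on. Whichever bucket the "Not sure" operations end up in only changes the last line of {{shouldRetry}}.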

There are a few TODOs in the patch. I would like to know your opinions on those as well.
Thanks in advance for reviewing my patch.


was (Author: shahrs87):
This is by no means a committable patch.
It needs some rework.
Changes done in this patch:
1. Only LoadBalancingKMSClientProvider will be instantiated when createProvider is being called.
2. Added three new config keys similar to dfs client.
* {{kms.client.failover.attempts.multiplier}}: This is the multiplier factor.
For example: if {{kms.client.failover.attempts.multiplier}} is 2 and there are 2 backend kms
servers, then client will retry for maximum 4 times.
* {{kms.client.failover.sleep.base.millis}}: To calculate failover sleep time.
* {{kms.client.failover.sleep.max.millis}}: To calculate maximum amount of time to sleep.
3. The client is configured to use {{FailoverOnNetworkExceptionRetry}} with fallback policy
as {{TryOnceThenFail}}.
On encountering AccessControlException or AuthorizationException, the client will stop retrying.
This assumes all the servers are configured with identical permissions and key acls.
4. Since the {{RetryPolicy#shouldRetry}} function expects whether the option is idempotent
or not, We need to decide what all operations are idempotent.
Below is my understanding.
          ** IdempotentOperations: renewDelegationToken, cancelDelegationToken, decryptEncryptedKey,
getKeyVersion, getKeys, getKeysMetadata, getKeyVersions, getCurrentKey, getMetadata
          ** AtMostOnce: createKey, deleteKey, rollNewVersion
          ** Not sure: addDelegationTokens, generateEncryptedKey, reencryptEncryptedKey
Would like to know your views.
There are few TODOs in the patch. Would like to know your opinions on that also.
Thanks in advance for reviewing my patch.

> KMS client needs retry logic
> ----------------------------
>
>                 Key: HDFS-11804
>                 URL: https://issues.apache.org/jira/browse/HDFS-11804
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>         Attachments: HDFS-11804-trunk.patch
>
>
> The kms client appears to have no retry logic – at all.  It's completely decoupled from the ipc retry logic.  This has major impacts if the KMS is unreachable for any reason, including but not limited to network connection issues, timeouts, and the +restart during an upgrade+.
> This has some major ramifications:
> # Jobs may fail to submit, although oozie resubmit logic should mask it
> # Non-oozie launchers may experience higher failure rates if they do not already have retry logic.
> # Tasks reading EZ files will fail, though this will probably be masked by framework reattempts
> # EZ file creation fails after creating a 0-length file – client receives EDEK in the create response, then fails when decrypting the EDEK
> # Bulk hadoop fs copies, and maybe distcp, will prematurely fail



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


