hadoop-common-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-14521) KMS client needs retry logic
Date Thu, 21 Sep 2017 23:41:01 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HADOOP-14521:
-------------------------------
    Attachment: HADOOP-14521.11.patch

Patch 11 to implement the idea in the comments above.

Existing tests already verify the behavior, so v11 only needs to revert the changes v10 made
to the existing tests and update the expected invocation counts in the new tests.

[~shahrs87], would you mind taking a look? IMO this still benefits from your retry improvement
while keeping the existing behavior.

> KMS client needs retry logic
> ----------------------------
>
>                 Key: HADOOP-14521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14521
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>         Attachments: HADOOP-14521.09.patch, HADOOP-14521.11.patch, HADOOP-14521-branch-2.8.002.patch,
> HADOOP-14521-branch-2.8.2.patch, HADOOP-14521-trunk-10.patch, HDFS-11804-branch-2.8.patch,
> HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch,
> HDFS-11804-trunk-5.patch, HDFS-11804-trunk-6.patch, HDFS-11804-trunk-7.patch, HDFS-11804-trunk-8.patch,
> HDFS-11804-trunk.patch
>
>
> The KMS client appears to have no retry logic at all.  It's completely decoupled
> from the IPC retry logic.  This has a major impact if the KMS is unreachable for any reason,
> including but not limited to network connection issues, timeouts, or the +restart during an upgrade+.
> This has some major ramifications:
> # Jobs may fail to submit, although Oozie resubmit logic should mask it
> # Non-Oozie launchers may experience higher failure rates if they do not already have retry logic
> # Tasks reading EZ files will fail, though this will probably be masked by framework reattempts
> # EZ file creation fails after creating a 0-length file: the client receives the EDEK in
> the create response, then fails when decrypting the EDEK
> # Bulk hadoop fs copies, and maybe distcp, will fail prematurely
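
For illustration only, the kind of client-side retry the description asks for could be
sketched roughly as below. This is not the code in any HADOOP-14521 patch: RetrySketch,
IoCall, callWithRetry, maxAttempts and initialBackoffMs are all hypothetical names, and
the policy shown (a bounded number of attempts with doubling backoff, retrying on any
IOException) is an assumption rather than the behavior the patch implements.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical helper for illustration; not part of Hadoop or of any HADOOP-14521 patch. */
final class RetrySketch {

    /** A remote call that may fail with an IOException (connection refused, timeout, ...). */
    @FunctionalInterface
    interface IoCall<T> {
        T call() throws IOException;
    }

    /**
     * Invokes op up to maxAttempts times, sleeping between failed attempts and
     * doubling the sleep each time. Rethrows the last IOException if every attempt fails.
     */
    static <T> T callWithRetry(IoCall<T> op, int maxAttempts, long initialBackoffMs)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMs);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new InterruptedIOException("interrupted while waiting to retry");
                    }
                    backoffMs *= 2;
                }
            }
        }
        // The loop either returned or recorded a failure, so last is non-null here.
        throw last;
    }
}
```

A caller would wrap each KMS request in callWithRetry so that a short outage, such as a
restart during an upgrade, is absorbed by the client instead of failing the job or task.
Retrying on every IOException is a simplification; a real client would likely need to
distinguish retriable failures from permanent ones such as authorization errors.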



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

