hadoop-common-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HADOOP-14303) Review retry logic on all S3 SDK calls, implement where needed
Date Thu, 22 Mar 2018 04:45:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved HADOOP-14303.
       Resolution: Duplicate
    Fix Version/s:     (was: 3.1.0)

> Review retry logic on all S3 SDK calls, implement where needed
> --------------------------------------------------------------
>                 Key: HADOOP-14303
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14303
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
> AWS S3, IAM, KMS, DDB, etc. all throttle callers: the S3A code needs to handle this without
> failing, as if it slows down its requests it can recover.
> 1. Look at all the places where we are calling S3A via the AWS SDK and make sure we are
> retrying with some backoff & jitter policy, ideally something unified. This must be more
> systematic than the case-by-case, problem-by-problem strategy we are implicitly using.
> 2. Many of the AWS S3 SDK calls do implement retry (e.g. PUT/multipart PUT), but we need
> to check the other parts of the process: login, initiate/complete MPU, ...
> Related
> HADOOP-13811 Failed to sanitize XML document destined for handler class
> HADOOP-13664 S3AInputStream to use a retry policy on read failures
> This stuff is all hard to test. A key need is to be able to differentiate recoverable
> throttle & network failures from unrecoverable problems like: auth, network config (e.g.
> bad endpoint), etc.
> This may be the opportunity to add a faulting subclass of the Amazon S3 client which can be
> configured in IT tests to fail at specific points. Ryan Blue's mock S3 client does this in
> HADOOP-13786, but it is purely a mock. I'm thinking of something with similar fault raising,
> but in front of the real S3A client.
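
The "backoff & jitter" policy from point 1 above can be sketched as follows. This is a minimal illustration, not the S3A implementation (the actual work landed elsewhere, per the Duplicate resolution); the class and method names (`BackoffRetry`, `delayMs`, `callWithRetries`) are hypothetical, and real code would retry only throttle/transient exceptions rather than every `Exception`:

```java
import java.util.Random;
import java.util.concurrent.Callable;

/** Sketch of retry with exponential backoff and full jitter (hypothetical helper). */
public class BackoffRetry {
    private static final Random RANDOM = new Random();

    /** Delay before retry attempt n (0-based): uniform in [0, base * 2^n], capped at maxMs. */
    static long delayMs(int attempt, long baseMs, long maxMs) {
        long ceiling = Math.min(maxMs, baseMs * (1L << attempt));
        return (long) (RANDOM.nextDouble() * ceiling);
    }

    /** Invoke op, sleeping with jittered backoff between failures; rethrow after maxAttempts. */
    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long baseMs, long maxMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // A real policy would distinguish recoverable throttle/network errors
                // (e.g. HTTP 503 "SlowDown") from unrecoverable ones like auth failures.
                last = e;
                Thread.sleep(delayMs(attempt, baseMs, maxMs));
            }
        }
        throw last;
    }
}
```

Full jitter (a random delay up to the exponential ceiling, rather than the ceiling itself) spreads retries from many throttled clients over time, which is what makes throttling recoverable instead of self-reinforcing.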

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
