hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14303) Review retry logic on all S3 SDK calls, implement where needed
Date Thu, 13 Apr 2017 10:55:42 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967405#comment-15967405 ]

Steve Loughran commented on HADOOP-14303:
-----------------------------------------

If we could set the socket factory in the test JVMs, we could even simulate network failures
underneath the AWS SDK. That would be the best fault injection of all, short of having an SDN
and a custom DNS server as part of the test infrastructure.
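
As a rough illustration of the socket-factory idea, here is a minimal JDK-only sketch. The class name FaultingSocketFactory and its "fail the first N creations" policy are hypothetical, and the sketch deliberately does not show how to get the AWS SDK to use such a factory, since whether that is possible is the open question.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;
import javax.net.SocketFactory;

/** Hypothetical test helper: fails the first N socket creations, then delegates. */
final class FaultingSocketFactory extends SocketFactory {
  private final SocketFactory delegate;
  private final AtomicInteger failuresLeft;

  FaultingSocketFactory(SocketFactory delegate, int failures) {
    this.delegate = delegate;
    this.failuresLeft = new AtomicInteger(failures);
  }

  /** Throws a simulated connection failure while the failure budget lasts. */
  private void maybeFail() throws IOException {
    if (failuresLeft.getAndUpdate(n -> n > 0 ? n - 1 : 0) > 0) {
      throw new ConnectException("injected fault: connection refused");
    }
  }

  @Override
  public Socket createSocket() throws IOException {
    maybeFail();
    return delegate.createSocket();
  }

  @Override
  public Socket createSocket(String host, int port) throws IOException {
    maybeFail();
    return delegate.createSocket(host, port);
  }

  @Override
  public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
      throws IOException {
    maybeFail();
    return delegate.createSocket(host, port, localHost, localPort);
  }

  @Override
  public Socket createSocket(InetAddress host, int port) throws IOException {
    maybeFail();
    return delegate.createSocket(host, port);
  }

  @Override
  public Socket createSocket(InetAddress address, int port, InetAddress localAddress,
      int localPort) throws IOException {
    maybeFail();
    return delegate.createSocket(address, port, localAddress, localPort);
  }
}
```

A test could wrap SocketFactory.getDefault() with a small failure budget and check that the code under test survives the injected ConnectException failures. A fuller version might inject timeouts or mid-stream resets as well.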

> Review retry logic on all S3 SDK calls, implement where needed
> --------------------------------------------------------------
>
>                 Key: HADOOP-14303
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14303
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>
> AWS S3, IAM, KMS, DDB, etc. all throttle callers: the S3A code needs to handle this without
> failing, because if it slows down its requests it can recover.
> 1. Look at all the places where we are calling S3A via the AWS SDK and make sure we are
> retrying with some backoff & jitter policy, ideally something unified. This must be more
> systematic than the case-by-case, problem-by-problem strategy we are implicitly using.
> 2. Many of the AWS S3 SDK calls do implement retry (e.g. PUT/multipart PUT), but we need
> to check the other parts of the process: login, initiate/complete MPU, ...
> Related
> HADOOP-13811 Failed to sanitize XML document destined for handler class
> HADOOP-13664 S3AInputStream to use a retry policy on read failures
> This stuff is all hard to test. A key need is to be able to differentiate recoverable
> throttle & network failures from unrecoverable problems such as auth and network config
> (e.g. a bad endpoint).
> This may be an opportunity to add a faulting subclass of the Amazon S3 client which can be
> configured in IT tests to fail at specific points. Ryan Blue's mock S3 client does this in
> HADOOP-13786, but it is for 100% mock. I'm thinking of something with similar fault raising,
> but in front of the real S3A client.
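
The retry-with-backoff-and-jitter policy and the "fault raiser in front of the real client" idea from the description can be sketched together using only the JDK. Everything named here (RetrySketch, RecoverableException, retry, failFirst) is hypothetical, and a plain Callable stands in for an S3 client call. Mapping real SDK exceptions onto the recoverable/unrecoverable split is the hard part and is not shown.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

final class RetrySketch {
  private RetrySketch() {}

  /** Hypothetical marker for failures worth retrying (throttling, transient network). */
  static class RecoverableException extends IOException {
    RecoverableException(String message) {
      super(message);
    }
  }

  /**
   * Retries op on RecoverableException, sleeping a random time in
   * [0, min(maxDelayMs, baseDelayMs * 2^(attempt-1))] between attempts.
   * Any other exception is treated as unrecoverable and propagates at once.
   */
  static <T> T retry(Callable<T> op, int maxAttempts, long baseDelayMs, long maxDelayMs)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (RecoverableException e) {
        if (attempt >= maxAttempts) {
          throw e; // retry budget exhausted
        }
        // Exponential cap; the shift is bounded to keep small base delays from overflowing.
        long cap = Math.min(maxDelayMs, baseDelayMs << Math.min(attempt - 1, 20));
        // Jitter: pick a uniformly random delay up to the cap.
        Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
      }
    }
  }

  /** Fault raiser placed in front of a real call: fails the first n invocations. */
  static <T> Callable<T> failFirst(int n, Callable<T> real) {
    AtomicInteger left = new AtomicInteger(n);
    return () -> {
      if (left.getAndDecrement() > 0) {
        throw new RecoverableException("injected throttle");
      }
      return real.call();
    };
  }
}
```

For example, RetrySketch.retry(RetrySketch.failFirst(2, realCall), 5, 100, 5000) would see two injected throttle failures before realCall runs. An auth-style failure thrown as any other exception type is never retried.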



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


