hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-14381) S3AUtils.translateException to map 503 response to throttling failure
Date Thu, 04 May 2017 18:38:04 GMT
Steve Loughran created HADOOP-14381:

             Summary: S3AUtils.translateException to map 503 response to throttling failure
                 Key: HADOOP-14381
                 URL: https://issues.apache.org/jira/browse/HADOOP-14381
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 2.8.0
            Reporter: Steve Loughran

When AWS S3 returns a "503", it means that the overall request rate against a part of an S3 bucket
exceeds the permitted limit; the client(s) need to throttle back, or wait for some rebalancing
to complete.

The AWS SDK retries 3 times on a 503, but then throws it up. Our code doesn't do anything
with that other than create a generic {{AWSS3IOException}}.

* add a new exception, {{AWSOverloadedException}}
* raise it on a 503 from S3 (& for s3guard, on DDB complaints)
* have it include a link to a wiki page on the topic, as well as the path
* and any other diagnostics
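A minimal sketch of what the proposed mapping could look like. The exception name {{AWSOverloadedException}} is the one suggested above; the fields, the simplified {{translate()}} signature (the real {{S3AUtils.translateException}} takes an SDK exception, not a bare status code), and the wiki-link constant are all illustrative assumptions:

```java
import java.io.IOException;

// Hypothetical exception proposed in this issue; fields are illustrative.
class AWSOverloadedException extends IOException {
    // Placeholder: the issue proposes linking a wiki page, URL not yet decided.
    static final String WIKI_LINK = "<wiki page on S3 throttling>";
    private final String path;

    AWSOverloadedException(String path, String message) {
        super(message + " on " + path + "; see " + WIKI_LINK);
        this.path = path;
    }

    String getPath() {
        return path;
    }
}

// Simplified stand-in for S3AUtils.translateException: only the HTTP status
// code is inspected; everything else about the real method is elided.
final class TranslateSketch {
    static IOException translate(String operation, String path,
            int statusCode, String message) {
        if (statusCode == 503) {
            // 503 => overload/throttling, not a generic service failure
            return new AWSOverloadedException(path, operation + ": " + message);
        }
        // fallback: generic failure, as the current code does
        return new IOException(operation + " on " + path + ": " + message);
    }
}
```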

Code talking to S3 may then be able to catch this and choose to react. Retrying with exponential
backoff is the obvious option. Failing that, it could trigger task reattempts at that part
of the query, then a job retry, which will again fail *unless the number of tasks run in
parallel is reduced*.
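The exponential-backoff reaction could be sketched like this; the helper name, attempt limit, and delays are assumptions, and plain {{IOException}} stands in for the proposed exception so the sketch is self-contained:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Illustrative caller-side handling: retry the operation with exponential
// backoff when a throttling failure surfaces. Not part of S3A.
final class BackoffSketch {
    static <T> T withBackoff(Callable<T> op, int maxAttempts, long baseDelayMs)
            throws Exception {
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {  // would be AWSOverloadedException in S3A
                if (attempt >= maxAttempts) {
                    throw e;           // give up; let the task attempt fail
                }
                Thread.sleep(delay);   // back off before the next attempt
                delay *= 2;            // exponential growth between attempts
            }
        }
    }
}
```

Note that because the throttling is shared across clients, backing off in one client only helps if the others do too, which is why the issue points at reducing overall parallelism.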

As this throttling applies across all clients talking to the same part of a bucket, fixing it
properly is a higher-level problem. We can at least start by reporting it better.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
