hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HADOOP-14381) S3AUtils.translateException to map 503 response to => throttling failure
Date Wed, 22 Nov 2017 17:35:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-14381.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 3.1.0

Fixed in HADOOP-13786; the inconsistent S3 client generates throttle events and so can be used
to test this. There's also a metric/statistic on the number of throttle events fielded at the S3A level.

The AWS SDK handles a lot of throttling internally; that throttling isn't picked up in these values.

> S3AUtils.translateException to map 503 response to => throttling failure
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-14381
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14381
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>             Fix For: 3.1.0
>
>
> When AWS S3 returns "503", it means that the overall set of requests on a part of an
S3 bucket exceeds the permitted limit; the client(s) need to throttle back or wait for some
rebalancing to complete.
> The AWS SDK retries 3 times on a 503, but then rethrows it. Our code doesn't do anything
with that other than wrap it in a generic {{AWSS3IOException}}.
> Proposed (a rough sketch of the mapping follows this list):
> * add a new exception, {{AWSOverloadedException}}
> * raise it on a 503 from S3 (and, for S3Guard, on DynamoDB throttling complaints)
> * have it include a link to a wiki page on the topic, as well as the path
> * and any other diagnostics
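> A minimal sketch of what such a mapping might look like, assuming the AWS SDK v1 {{AmazonServiceException}}; the {{AWSOverloadedException}} class and the helper names below are illustrative stand-ins, not existing S3A code:
{code:java}
// Illustrative sketch only: AWSOverloadedException and maybeTranslateOverload
// are hypothetical names, not existing Hadoop classes.
import java.io.IOException;
import com.amazonaws.AmazonServiceException;

class AWSOverloadedException extends IOException {
  AWSOverloadedException(String message, Throwable cause) {
    super(message, cause);
  }
}

class ThrottleTranslation {
  private static final int SC_SERVICE_UNAVAILABLE = 503;

  /**
   * Map a 503 to the dedicated overload exception; return null otherwise,
   * so the caller can fall back to its generic wrapping of the failure.
   */
  static IOException maybeTranslateOverload(String operation, String path,
      AmazonServiceException e) {
    if (e.getStatusCode() == SC_SERVICE_UNAVAILABLE) {
      return new AWSOverloadedException(operation + " on " + path
          + ": S3 reports it is overloaded (503);"
          + " see the S3A troubleshooting documentation", e);
    }
    return null;
  }
}
{code}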
> Code talking to S3 may then be able to catch this and choose to react. A retry with
exponential backoff is the obvious option (a sketch follows below). Failing that, it could
trigger task reattempts at that part of the query, then a job retry, which will again fail
*unless the number of tasks run in parallel is reduced*.
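> A rough sketch of that exponential-backoff reaction, reusing the hypothetical {{AWSOverloadedException}} from the previous sketch; the attempt limit and sleep times are arbitrary illustration values:
{code:java}
// Illustrative sketch: retry a callable with exponential backoff when the
// (hypothetical) AWSOverloadedException is raised.
import java.util.concurrent.Callable;

class OverloadRetry {
  static <T> T withBackoff(Callable<T> operation, int maxAttempts,
      long initialSleepMillis) throws Exception {
    long sleep = initialSleepMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return operation.call();
      } catch (AWSOverloadedException e) {
        if (attempt >= maxAttempts) {
          throw e;             // give up; let task/job-level retry decide
        }
        Thread.sleep(sleep);   // back off before the next attempt
        sleep *= 2;            // double the pause each time
      }
    }
  }
}
{code}
> For example, {{withBackoff(() -> s3.getObjectMetadata(bucket, key), 5, 500)}} would pause 500ms, 1s, 2s, then 4s between attempts before giving up and rethrowing.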
> As this throttling applies across all clients talking to the same part of a bucket, fixing
it is potentially something to address at a higher level than any single client. We can at
least start by reporting things better.





