hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14303) Review retry logic on all S3 SDK calls, implement where needed
Date Wed, 17 May 2017 18:46:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014601#comment-16014601 ]

Steve Loughran commented on HADOOP-14303:
-----------------------------------------

S3AInputStream retries on some non-recoverable events, as it makes one extra attempt on any
exception. This can lead to a 404 triggering a retry rather than failing fast.
{code}
testSequentialRead(org.apache.hadoop.fs.contract.s3a.ITestS3AContractOpen)  Time elapsed:
1.221 sec  <<< ERROR!
java.io.FileNotFoundException: Reopen at position 0 on s3a://hwdev-steve-ireland-new/fork-0007/test/testsequentialread.txt:
com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service:
Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 8D81F218D02DE21E), S3 Extended
Request ID: aXUWP6yYGSsP9ofVawyIteGZWBmkNTFjmRCvwAR1KyJmtR0A6H6UOggE4OlYB2ZOJ99F3MV74fU=
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:166)
	at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:165)
	at org.apache.hadoop.fs.s3a.S3AInputStream.onReadFailure(S3AInputStream.java:348)
	at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:321)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at org.apache.hadoop.fs.contract.AbstractContractOpenTest.testSequentialRead(AbstractContractOpenTest.java:156)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist.
(Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 8D81F218D02DE21E)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1586)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1254)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4185)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4132)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1373)
	at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:158)
	at org.apache.hadoop.fs.s3a.S3AInputStream.onReadFailure(S3AInputStream.java:348)
	at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:321)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at org.apache.hadoop.fs.contract.AbstractContractOpenTest.testSequentialRead(AbstractContractOpenTest.java:156)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
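The underlying problem is that the retry in onReadFailure() is unconditional: it reopens and
re-reads on any exception. A minimal sketch of the classification it would need, assuming the
failure has been translated via S3AUtils.translateException() before being examined (hypothetical
code, not the current S3AInputStream logic):
{code}
// Hypothetical sketch: classify a read failure before the single retry.
// The real onReadFailure() reopens unconditionally; with a check like this,
// a translated 404 (FileNotFoundException) would be rethrown, not retried.
import java.io.FileNotFoundException;
import java.io.IOException;

final class ReadFailureClassifier {
  private ReadFailureClassifier() {
  }

  /** @return true if reopening the stream and retrying could plausibly succeed. */
  static boolean isReadRetriable(IOException ioe) {
    // NoSuchKey is translated to FileNotFoundException by
    // S3AUtils.translateException(); a second GET will fail the same way.
    return !(ioe instanceof FileNotFoundException);
  }
}
{code}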

Given this is a failure path, it's not too expensive, but there are probably a few more cases
like this (auth failures too).

> Review retry logic on all S3 SDK calls, implement where needed
> --------------------------------------------------------------
>
>                 Key: HADOOP-14303
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14303
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>
> AWS S3, IAM, KMS, DDB etc. all throttle callers: the S3A code needs to handle this without
> failing, since it can recover if it slows down its requests.
> 1. Look at all the places where we are calling S3 via the AWS SDK and make sure we are
> retrying with some backoff & jitter policy, ideally something unified (see the sketch after
> this list). This must be more systematic than the case-by-case, problem-by-problem strategy
> we are implicitly using.
> 2. Many of the AWS S3 SDK calls do implement retry (e.g. PUT/multipart PUT), but we need
> to check the other parts of the process: login, initiate/complete MPU, ...
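> As a sketch of what a unified policy could look like (illustrative only: the class name,
> constants, and classification below are assumptions, not existing S3A code; reusing Hadoop's
> own org.apache.hadoop.io.retry.RetryPolicies is another option):
{code}
// Hypothetical shared retry helper: exponential backoff with full jitter.
// Names and thresholds are illustrative, not existing S3A code.
import com.amazonaws.AmazonServiceException;

import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public final class S3ARetrier {
  private static final int MAX_ATTEMPTS = 5;
  private static final long BASE_DELAY_MS = 100;

  public static <T> T retry(Callable<T> operation) throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
      try {
        return operation.call();
      } catch (Exception e) {
        if (!isRecoverable(e)) {
          throw e;    // fail fast on 404, auth, bad endpoint, ...
        }
        last = e;
        // Full jitter: sleep a random duration in [0, BASE * 2^attempt].
        long cap = BASE_DELAY_MS << attempt;
        Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
      }
    }
    throw last;
  }

  private static boolean isRecoverable(Exception e) {
    if (e instanceof AmazonServiceException) {
      // 503 Slow Down is S3's throttling signal; other 5xx are server-side
      // and worth retrying. 4xx (404, 403, bad request) are fatal.
      return ((AmazonServiceException) e).getStatusCode() >= 500;
    }
    // Plain I/O failures (connection reset, timeout) may be transient.
    return e instanceof IOException;
  }
}
{code}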
> Related
> HADOOP-13811 Failed to sanitize XML document destined for handler class
> HADOOP-13664 S3AInputStream to use a retry policy on read failures
> This stuff is all hard to test. A key need is to be able to differentiate recoverable
> throttle & network failures from unrecoverable problems like auth failures and network
> misconfiguration (e.g. a bad endpoint).
> This may be the opportunity to add a faulting subclass of the Amazon S3 client which can be
> configured in IT tests to fail at specific points. Ryan Blue's mock S3 client does this in
> HADOOP-13786, but it is a 100% mock. I'm thinking of something with similar fault raising,
> but in front of the real S3 client.
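> A rough sketch of such a faulting client (entirely hypothetical: the trigger mechanism and
> all names here are assumptions):
{code}
// Hypothetical fault-injecting subclass: behaves as the real client but can
// be told to fail the next N getObject() calls with a throttling error.
// Credential/endpoint wiring is omitted; an IT test would configure it
// exactly as S3AFileSystem configures its own client.
import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

import java.util.concurrent.atomic.AtomicInteger;

public class FaultInjectingS3Client extends AmazonS3Client {
  private final AtomicInteger getFaultsRemaining = new AtomicInteger(0);

  /** Arrange for the next {@code count} getObject() calls to throw a 503. */
  public void failNextGetObjectCalls(int count) {
    getFaultsRemaining.set(count);
  }

  @Override
  public S3Object getObject(GetObjectRequest request) {
    if (getFaultsRemaining.getAndUpdate(n -> Math.max(n - 1, 0)) > 0) {
      AmazonServiceException ase = new AmazonServiceException("injected fault");
      ase.setStatusCode(503);
      ase.setErrorCode("SlowDown");
      throw ase;
    }
    return super.getObject(request);
  }
}
{code}
> An IT test could then assert that a read succeeds despite a configured number of injected
> 503s, and that it fails fast when the injected status is a 404.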



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

