hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13761) S3Guard: implement retries
Date Fri, 01 Sep 2017 13:40:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150539#comment-16150539 ]

Steve Loughran commented on HADOOP-13761:
-----------------------------------------

Managed to break the tests when working with a bucket whose DDB table was pre-created with {{hadoop s3guard init -write 20 -read 20}} and five parallel test cases:

{code}

Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 230.809 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB
testPruneCommandCLI(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)  Time elapsed: 174.084 sec  <<< ERROR!
com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 9NTSEF5S8M3EI7MUN0EV2ERKE3VV4KQNSO5AEMVJF66Q9ASUAAJG)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1258)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2089)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2065)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeBatchWriteItem(AmazonDynamoDBClient.java:575)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:551)
	at com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.doBatchWriteItem(BatchWriteItemImpl.java:111)
	at com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.batchWriteItemUnprocessed(BatchWriteItemImpl.java:64)
	at com.amazonaws.services.dynamodbv2.document.DynamoDB.batchWriteItemUnprocessed(DynamoDB.java:189)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:580)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.prune(DynamoDBMetadataStore.java:761)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$Prune.run(S3GuardTool.java:938)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.exec(AbstractS3GuardToolTestBase.java:277)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.exec(AbstractS3GuardToolTestBase.java:255)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommand(AbstractS3GuardToolTestBase.java:194)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommandCLI(AbstractS3GuardToolTestBase.java:206)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

Tests run: 62, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 388.583 sec - in org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextMainOperations

Results :

{code}
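
For reference, here is a minimal sketch of the sort of bounded exponential backoff that could wrap the batch write inside {{DynamoDBMetadataStore.processBatchWriteRequest()}}; the class, attempt limit, and base delay below are illustrative assumptions, not the committed implementation:

{code}
import java.io.IOException;

import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.TableWriteItems;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

// Hypothetical sketch only: retry a DynamoDB batch write with exponential
// backoff when the table's provisioned throughput is exceeded.
class RetryingBatchWriter {
  private static final int MAX_ATTEMPTS = 5;      // assumed retry limit
  private static final long BASE_DELAY_MS = 100;  // assumed initial backoff

  static void batchWriteWithBackoff(DynamoDB dynamoDB, TableWriteItems items)
      throws IOException {
    for (int attempt = 1; ; attempt++) {
      try {
        dynamoDB.batchWriteItem(items);
        return;
      } catch (ProvisionedThroughputExceededException e) {
        if (attempt >= MAX_ATTEMPTS) {
          throw new IOException(
              "Batch write failed after " + attempt + " attempts", e);
        }
        try {
          // Exponential backoff: 100ms, 200ms, 400ms, ...
          Thread.sleep(BASE_DELAY_MS << (attempt - 1));
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted during backoff", ie);
        }
      }
    }
  }
}
{code}

A fuller version would also re-drive {{BatchWriteItemOutcome.getUnprocessedItems()}} under the same backoff, since the trace above actually fails while retrying unprocessed items.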

> S3Guard: implement retries 
> ---------------------------
>
>                 Key: HADOOP-13761
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13761
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Aaron Fabbri
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add retry logic.
> In HADOOP-13651, I added TODO comments in most of the places where retry loops are needed, including:
> - open(path). If the MetadataStore reflects a recent create/move of the file path, but we fail to read it from S3, retry.
> - delete(path). If deleteObject() on S3 fails, but the MetadataStore shows the file exists, retry.
> - rename(src, dest). If the source path is not yet visible in S3, retry.
> - listFiles(). Skip for now; it is not currently implemented in S3Guard. I will create a separate JIRA for this, as it will likely require interface changes (i.e. a prefix or subtree scan).
> We may miss some cases initially, so we should do failure injection testing to make sure we're covered. Failure injection tests can be a separate JIRA to make this easier to review.
> We also need basic configuration parameters around the retry policy. There should be a way to specify a maximum retry duration, as some applications would prefer to receive an error eventually rather than wait indefinitely. We should also keep statistics when inconsistency is detected and we enter a retry loop.
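
A duration-bounded retry helper along the lines the last paragraph describes might look like the sketch below; the configuration keys ({{fs.s3a.s3guard.retry.*}}), class name, and defaults are hypothetical placeholders, not agreed names:

{code}
import java.io.IOException;
import java.io.InterruptedIOException;

import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch only; config keys and names are placeholders.
final class BoundedRetry {
  @FunctionalInterface
  interface Op<T> {
    T call() throws IOException;
  }

  static <T> T run(Configuration conf, Op<T> op) throws IOException {
    // Assumed keys: a hard ceiling on total retry time, plus a fixed interval.
    long maxMillis = conf.getLong("fs.s3a.s3guard.retry.max.duration.ms", 60000L);
    long intervalMillis = conf.getLong("fs.s3a.s3guard.retry.interval.ms", 500L);
    long deadline = System.currentTimeMillis() + maxMillis;
    IOException last;
    do {
      try {
        return op.call();
      } catch (IOException e) {
        last = e;  // a real version would also bump an inconsistency counter here
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new InterruptedIOException("interrupted while retrying");
      }
    } while (System.currentTimeMillis() < deadline);
    throw last;  // surface the error once the maximum retry duration is hit
  }
}
{code}

Each of the open()/delete()/rename() cases above could then share one policy, e.g. {{BoundedRetry.run(conf, () -> readFromS3(path))}}, where readFromS3 stands in for whichever operation needs the loop.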



