hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
Date Fri, 03 Feb 2017 20:14:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852047#comment-15852047 ]

Aaron Fabbri commented on HADOOP-13904:

Thank you for the review.

bq. Yetus is unhappy with it... is it in sync with the branch?

See the end of my previous comment.  This patch is based on top of HADOOP-13876, which is
higher priority to get in than this one (this one is an efficiency issue, AFAICT).

bq. the retry policy should really detect and reject the auth failures as non-retryable

I think that already happens here. My retry implementation doesn't catch any exceptions.
I'd expect the DDB batch write API to throw an exception if we hit an auth failure, which
naturally bypasses the retry logic.

I've been digging through docs (including [SDK|http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/AmazonDynamoDB.html#batchWriteItem-com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest]
and [REST|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#BatchOperations]
doc for batch write), and I think this patch is correct, except that I just learned batch
write *may* throw {{ProvisionedThroughputExceededException}} if *none* of the items in the
batch could be executed.  I could not reproduce this despite abusive testing against a table
provisioned with 10 I/O units.

bq. (a) handle interruptions by interrupting thread again,

I wondered about this.  Currently I do not catch InterruptedException; I let it propagate.
 Are you saying I should catch it, set the interrupt flag, break out of the retry loop, and
continue execution?
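If I understand option (a), it would look roughly like the self-contained sketch below: catch {{InterruptedException}}, restore the thread's interrupt flag, break out of the retry loop, and let the caller continue.  The class, the simulated batch, and names like {{resubmitUnprocessedItems}} are illustrative only, not from the patch:

```java
/**
 * Sketch of option (a): restore the interrupt flag and stop retrying.
 * The batch write is simulated; names here are hypothetical, not from the patch.
 */
public class BatchRetrySketch {
    static final int MAX_RETRIES = 9;

    int unprocessed;   // items the last batch call left unprocessed

    BatchRetrySketch(int initialUnprocessed) {
        this.unprocessed = initialUnprocessed;
    }

    /** Simulated batch write: pretend half of the remaining items go through. */
    void resubmitUnprocessedItems() {
        unprocessed = unprocessed / 2;
    }

    /** Capped exponential backoff: 10ms, 20ms, 40ms, ... up to 1s. */
    static long backoffMillis(int attempt) {
        return Math.min(1000, (1L << attempt) * 10);
    }

    /** Returns true if everything was eventually written. */
    boolean writeWithRetries() {
        int attempt = 0;
        while (unprocessed > 0 && attempt < MAX_RETRIES) {
            try {
                Thread.sleep(backoffMillis(attempt++));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // restore the flag for callers
                break;                               // stop retrying, continue execution
            }
            resubmitUnprocessedItems();
        }
        return unprocessed == 0;
    }

    public static void main(String[] args) {
        BatchRetrySketch sketch = new BatchRetrySketch(20);
        // 20 -> 10 -> 5 -> 2 -> 1 -> 0 within MAX_RETRIES, so this prints true
        System.out.println(sketch.writeWithRetries());
    }
}
```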

bq. (b) handling any other exception by just returning false to the shouldRetry probe.

This Batch API is a bit of a special case: instead of just throwing exceptions for failures,
it appears to propagate non-retryable exceptions, but translates retryable ones into BatchWriteItem
"unprocessed items".  So this patch just slows down resubmission of the retryable items.

This patch essentially keeps the existing exception behavior and only slows down resubmission
of the batch work.  So I think it is an improvement, but we may have to add a higher-level
retry loop for the {{ProvisionedThroughputExceededException}} case.  Why they don't just
return all items as unprocessed is beyond me.
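To make the control flow concrete, here is a minimal stand-alone sketch of the semantics described above: retryable per-item failures come back as "unprocessed items" (no exception), while a batch in which *none* of the items could be executed throws.  {{ThrottledBatchException}} and the capacity model are invented stand-ins for the SDK's {{ProvisionedThroughputExceededException}} and real throttling, not actual API behavior:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWriteFlowSketch {
    /** Hypothetical stand-in for ProvisionedThroughputExceededException. */
    static class ThrottledBatchException extends RuntimeException {}

    /**
     * Simulated batchWriteItem: writes up to 'capacity' items and returns the
     * retryable leftovers instead of throwing; throws only when nothing could run.
     */
    static List<String> batchWrite(List<String> items, int capacity) {
        if (capacity == 0) {
            throw new ThrottledBatchException();  // none of the items could be executed
        }
        // items beyond capacity are "unprocessed", to be resubmitted after a pause
        int done = Math.min(capacity, items.size());
        return new ArrayList<>(items.subList(done, items.size()));
    }

    /** Resubmit leftovers until done; non-retryable exceptions simply propagate. */
    static int writeAll(List<String> items, int capacity) {
        int rounds = 0;
        List<String> pending = items;
        while (!pending.isEmpty()) {
            pending = batchWrite(pending, capacity);
            rounds++;
            // a real implementation would sleep with exponential backoff here
        }
        return rounds;
    }

    public static void main(String[] args) {
        // four items, capacity two per round: finishes in two rounds
        System.out.println(writeAll(List.of("a", "b", "c", "d"), 2));
    }
}
```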

> DynamoDBMetadataStore to handle DDB throttling failures through retry policy
> ----------------------------------------------------------------------------
>                 Key: HADOOP-13904
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13904
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13904-HADOOP-13345.001.patch, HADOOP-13904-HADOOP-13345.002.patch
> When you overload DDB, you get error messages warning of throttling, [as documented by
> Reduce load on DDB by doing a table lookup before the create, then, in table create/delete
> operations and in get/put actions, recognise the error codes and retry using an appropriate
> retry policy (exponential backoff + ultimate failure)
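The shape of the retry policy the description asks for (exponential backoff with ultimate failure) can be sketched as below; the class name and constants are assumptions for illustration, not the Hadoop implementation:

```java
/**
 * Illustrative retry policy: exponential backoff, then ultimate failure
 * once maxRetries is exhausted. Names and defaults are hypothetical.
 */
public class ExponentialBackoffPolicy {
    final int maxRetries;
    final long baseSleepMillis;

    ExponentialBackoffPolicy(int maxRetries, long baseSleepMillis) {
        this.maxRetries = maxRetries;
        this.baseSleepMillis = baseSleepMillis;
    }

    /** true = sleep and retry; false = ultimate failure, surface the error. */
    boolean shouldRetry(int attempt) {
        return attempt < maxRetries;
    }

    /** Sleep doubles each attempt: base, 2*base, 4*base, ... */
    long sleepMillis(int attempt) {
        return baseSleepMillis << attempt;
    }

    public static void main(String[] args) {
        ExponentialBackoffPolicy policy = new ExponentialBackoffPolicy(3, 100);
        // third retry is allowed to sleep 400ms; the fourth attempt fails outright
        System.out.println(policy.sleepMillis(2) + " " + policy.shouldRetry(3));
    }
}
```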

This message was sent by Atlassian JIRA
