hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
Date Thu, 19 Jan 2017 02:15:26 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Fabbri updated HADOOP-13904:
----------------------------------
    Attachment: HADOOP-13904-HADOOP-13345.001.patch

Attaching v1 patch.  It adds new scale tests for DynamoDBMetadataStore and LocalMetadataStore.
 I think we should get HADOOP-13589 in first, and I am happy to rebase this when that is committed.

The included change to the docs (s3guard.md) describes a configuration I used to reliably
trigger DynamoDB throttling.  I was able to observe both a significant slowdown in the test
execution, as well as WriteThrottle events in my AWS CloudWatch UI.

I also added some instrumentation around our use of DynamoDB's batched write API, as the docs
imply that we need to add our own backoff timers there. The output looks like this:

{quote}
2017-01-18 15:21:15,930 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 10 retries to complete
2017-01-18 15:21:18,447 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 7 retries to complete
2017-01-18 15:21:20,987 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
2017-01-18 15:21:23,530 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
2017-01-18 15:21:25,975 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 9 retries to complete
2017-01-18 15:21:28,561 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
2017-01-18 15:21:31,037 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 8 retries to complete
2017-01-18 15:21:33,407 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 5 retries to complete
2017-01-18 15:21:35,685 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
{quote}

Next I will dig into the AWS SDK source and/or put timing around the retry calls to `batchWriteItemUnprocessed()`
to see if (A) the SDK is doing exponential backoff for us, or (B) we need to add a sleep timer
in that retry loop.
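
For reference, option (B) could look roughly like the sketch below. It is not taken from the patch and does not call the AWS SDK: the class name `BatchWriteBackoff`, the `submit` callback (standing in for a call that returns the still-unprocessed items), and the delay parameters are all hypothetical, chosen only to illustrate a capped exponential sleep inside the retry loop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; none of these names come from the patch or the AWS SDK. */
final class BatchWriteBackoff {

  /** Delay before retry number {@code retry} (1-based): baseMs * 2^(retry-1), capped at maxMs. */
  static long backoffMillis(int retry, long baseMs, long maxMs) {
    int shift = Math.min(Math.max(retry - 1, 0), 30); // bound the shift to avoid overflow
    return Math.min(maxMs, baseMs << shift);
  }

  /**
   * Submits the items, then resubmits whatever the callback reports as
   * unprocessed, sleeping between attempts. Returns the number of retries used
   * and fails with an IOException once maxRetries is exhausted.
   */
  static <T> int writeWithBackoff(List<T> items,
      Function<List<T>, List<T>> submit,
      int maxRetries, long baseMs, long maxMs)
      throws IOException, InterruptedException {
    List<T> pending = submit.apply(items);
    int retries = 0;
    while (!pending.isEmpty()) {
      if (retries >= maxRetries) {
        throw new IOException("Batched write still has " + pending.size()
            + " unprocessed items after " + retries + " retries");
      }
      retries++;
      Thread.sleep(backoffMillis(retries, baseMs, maxMs));
      pending = submit.apply(pending);
    }
    return retries;
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for a throttled store: accepts at most two items per call
    // and hands the remainder back as "unprocessed".
    Function<List<String>, List<String>> throttled =
        batch -> new ArrayList<>(batch.subList(Math.min(2, batch.size()), batch.size()));
    int retries = writeWithBackoff(
        Arrays.asList("a", "b", "c", "d", "e"), throttled, 10, 10, 200);
    System.out.println("Batched write took " + retries + " retries to complete");
  }
}
```

If it turns out the SDK already backs off internally, a loop like this would only need the retry cap, not the sleep.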



> DynamoDBMetadataStore to handle DDB throttling failures through retry policy
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13904
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13904
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13904-HADOOP-13345.001.patch
>
>
> When you overload DDB, you get error messages warning of throttling, [as documented by AWS|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.MessagesAndCodes].
> Reduce load on DDB by doing a table lookup before the create. Then, in table create/delete operations and in get/put actions, recognise the error codes and retry using an appropriate retry policy (exponential backoff + ultimate failure).
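
As an illustration of the "recognise the error codes and retry" step in the description above, a policy decision could be sketched as follows. The class and method names are hypothetical. The set of codes treated as throttling is an assumption based on the AWS error documentation linked in the description and should be checked against it. The limits are placeholders.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a throttling-aware retry decision; not from the patch. */
final class ThrottleRetryDecision {

  // Assumption: error codes treated as retryable throttling signals.
  private static final Set<String> THROTTLE_CODES = new HashSet<>(Arrays.asList(
      "ProvisionedThroughputExceededException",
      "ThrottlingException",
      "LimitExceededException"));

  static boolean isThrottle(String errorCode) {
    return errorCode != null && THROTTLE_CODES.contains(errorCode);
  }

  /**
   * Returns the delay in milliseconds before the next attempt, or -1 when the
   * caller should give up (non-throttling error, or attempts exhausted).
   * attemptsSoFar counts attempts already made, starting at 1.
   */
  static long nextDelayMillis(String errorCode, int attemptsSoFar,
      int maxAttempts, long baseMs) {
    if (!isThrottle(errorCode) || attemptsSoFar >= maxAttempts) {
      return -1; // ultimate failure: surface the error to the caller
    }
    int shift = Math.min(Math.max(attemptsSoFar - 1, 0), 20);
    return baseMs << shift; // exponential backoff: base, 2*base, 4*base, ...
  }
}
```

Returning -1 keeps the "ultimate failure" case explicit, so the caller decides how to report the error once retries are exhausted.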



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


