hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions
Date Wed, 07 Feb 2018 20:47:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356048#comment-16356048 ]

Steve Loughran commented on HADOOP-13761:

I've not seen that exception since; it was on ASF trunk/, so unless we've updated the SDK
since then (we have, haven't we?), I don't know what's up

h3. DynamoDBMetadataStore

L693: leave the log statement at debug level

{{DynamoDBMetadataStore.updateParameters()}} should switch to {{provisionTableBlocking()}},
so the CLI tool will not complete until the provisioning has happened. This will improve its
ability to be used in scripts & tests.
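The blocking behaviour a {{provisionTableBlocking()}} call would need can be sketched as a poll-until-active loop. This is a generic sketch, not the S3Guard code: the status supplier and the {{awaitActive}} name are illustrative stand-ins for querying the DynamoDB table status through the AWS SDK.

```java
import java.util.function.Supplier;

public class ProvisionWait {

    /**
     * Poll a status supplier until it reports "ACTIVE" or the deadline passes.
     * In the real store this would query the DynamoDB table description;
     * here the supplier is injected so the loop is self-contained.
     */
    static boolean awaitActive(Supplier<String> status, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if ("ACTIVE".equals(status.get())) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        return false;   // caller decides whether a timeout is an error
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated table that becomes ACTIVE on the third poll.
        int[] calls = {0};
        Supplier<String> status = () -> ++calls[0] < 3 ? "UPDATING" : "ACTIVE";
        System.out.println(awaitActive(status, 5000, 10)); // prints "true"
    }
}
```

Returning to the CLI only once the loop exits is what makes the provisioning step safe to chain in scripts & tests.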

h3. ITestDynamoDBMetadataStoreScale

* I like the fact that the DB gets shrunk back afterwards. Currently the CLI tests slowly leak
capacity, even though, on my reading, they should clean up.
* {{pathOfDepth}} has the base path "/scaleTestBWEP". I think it should use {{getClass().getSimpleName()}}.

This is going to be fun on a shared test run

# need to make sure that it is not run in parallel
# need to document that this test must be run on a private DDB instance, not one shared across
other buckets. We don't want other teams getting upset because their tests on a different bucket
are failing.

For HADOOP-14918 I've been wondering if we should have tests explicitly declare a "test DDB
table"; {{ITestS3GuardToolDynamoDB}} hard-codes "testDynamoDBInitDestroy", which relies on
at most one user in a shared AWS account running the test at the same time. This would be a
table which can be created on demand and destroyed afterwards, so the test suite can do what
it wants. That would line up with future tests of things like upgrades, TTL checking, etc.;
this scale test could share the same config option.
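A declared test table could be no more than a config lookup with a fallback to today's hard-coded name. A minimal sketch, with the caveat that the option key {{fs.s3a.s3guard.test.dynamo.table}} is hypothetical (HADOOP-14918 has not defined one), and {{java.util.Properties}} stands in for Hadoop's {{Configuration}} to keep the example self-contained:

```java
import java.util.Properties;

public class TestTableConfig {

    // Hypothetical option name; nothing in HADOOP-14918 fixes this key yet.
    static final String KEY = "fs.s3a.s3guard.test.dynamo.table";

    // Current hard-coded name in ITestS3GuardToolDynamoDB, kept as the fallback.
    static final String DEFAULT = "testDynamoDBInitDestroy";

    /** Resolve the DDB table a suite should create on demand and destroy afterwards. */
    static String testTable(Properties conf) {
        return conf.getProperty(KEY, DEFAULT);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(testTable(conf));            // prints "testDynamoDBInitDestroy"
        conf.setProperty(KEY, "alice-private-table");
        System.out.println(testTable(conf));            // prints "alice-private-table"
    }
}
```

With a per-user key like this, two people in the same AWS account can run the suite concurrently against distinct tables, which is the collision the hard-coded name invites.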

> S3Guard: implement retries for DDB failures and throttling; translate exceptions
> --------------------------------------------------------------------------------
>                 Key: HADOOP-13761
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13761
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>            Priority: Blocker
>         Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are needed,
> - open(path).  If MetadataStore reflects recent create/move of file path, but we fail
to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will create a separate
JIRA for this as it will likely require interface changes (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing to make sure
we're covered.  Failure injection tests can be a separate JIRA to make this easier to review.
> We also need basic configuration parameters around retry policy.  There should be a way
to specify maximum retry duration, as some applications would prefer to receive an error eventually,
than waiting indefinitely.  We should also be keeping statistics when inconsistency is detected
and we enter a retry loop.
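The bounded-duration retry the description asks for can be sketched as below. All names here are illustrative, not the S3AFileSystem API; the sketch only shows the shape: retry the operation with backoff until it succeeds or a maximum duration elapses, then surface the last failure so the application gets an error eventually rather than waiting indefinitely.

```java
import java.util.concurrent.Callable;

public class BoundedRetry {

    /**
     * Retry an operation until it succeeds or maxDurationMs elapses.
     * On timeout, rethrow the last failure. A real implementation would
     * also update inconsistency statistics each time it enters the loop.
     */
    static <T> T retryFor(Callable<T> op, long maxDurationMs, long backoffMs)
            throws Exception {
        long deadline = System.currentTimeMillis() + maxDurationMs;
        Exception last = null;
        while (System.currentTimeMillis() < deadline) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;                 // inconsistency detected; retry after backoff
                Thread.sleep(backoffMs);
            }
        }
        throw last != null ? last : new IllegalStateException("no attempt made");
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulate open(path): S3 does not show the file for the first two reads.
        String result = retryFor(() -> {
            if (++attempts[0] < 3) {
                throw new java.io.FileNotFoundException("not visible in S3 yet");
            }
            return "opened";
        }, 5000, 10);
        System.out.println(result + " after " + attempts[0] + " attempts");
        // prints "opened after 3 attempts"
    }
}
```

The same loop shape covers the open/delete/rename cases listed above; only the operation and the "is this an inconsistency worth retrying?" test differ.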

This message was sent by Atlassian JIRA

