hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13936) S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation
Date Fri, 03 Feb 2017 23:57:52 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852359#comment-15852359 ]

Aaron Fabbri commented on HADOOP-13936:

I chatted with [~mackrorysd] a little bit about an additional modification time column that
records the age of each DynamoDB entry. It was in the context of the Prune CLI command he
is working on. For pruning old metadata (files that are presumably consistent in S3 now),
the existing modification time (that of the S3 object) seems to be sufficient, as it reflects
the likelihood of each file still being inconsistent in S3.
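
A minimal Java sketch of pruning by S3 modification time, assuming a hypothetical PrunableEntry record and PruneSketch helper (neither is the actual S3Guard MetadataStore API nor the Prune CLI implementation). Entries whose underlying S3 object is older than a cutoff are dropped, on the assumption that such files are consistent in S3 by now.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified metadata record: only the field pruning needs. */
final class PrunableEntry {
    final String path;
    /** Modification time of the underlying S3 object, in epoch millis. */
    final long s3ModTime;

    PrunableEntry(String path, long s3ModTime) {
        this.path = path;
        this.s3ModTime = s3ModTime;
    }
}

/** Hypothetical helper illustrating prune-by-S3-modification-time. */
final class PruneSketch {
    private PruneSketch() {}

    /**
     * Returns the entries that survive pruning: those whose S3 modification
     * time is at or after the cutoff. Older entries are dropped because the
     * files they describe are presumed to be consistent in S3 already.
     */
    static List<PrunableEntry> prune(List<PrunableEntry> entries, long cutoffMillis) {
        List<PrunableEntry> kept = new ArrayList<>();
        for (PrunableEntry e : entries) {
            if (e.s3ModTime >= cutoffMillis) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

Here the S3 timestamp is the right signal: the older the object, the less likely it is still inconsistent, regardless of when its metadata was written.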

For expiring cache entries, however, a separate "DDB entry modification time" field seems
useful. In the context of this JIRA, where we want the MetadataStore to "self heal" by timing
out stale entries, we actually want a timestamp for when the entry was last modified in DDB.
The example that comes to mind is when fs.s3a.metadatastore.authoritative is true (i.e., we allow
caching directory listings): we may load a directory listing with some old files (created
a year ago). We want to time out those entries based on when they were last written to DDB,
not based on when the underlying S3 objects were created.
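
To illustrate why expiry needs the separate field, here is a minimal Java sketch with a hypothetical CachedEntry class carrying both timestamps. The storeLastUpdated field stands in for the proposed DDB entry modification time column; the class, field names, and TTL check are assumptions for illustration, not S3Guard code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cached metadata record with two timestamps, both in epoch millis. */
final class CachedEntry {
    final String path;
    /** When the underlying S3 object was last modified (suits pruning). */
    final long s3ModTime;
    /** When this record was last written to the metadata store (suits expiry). */
    final long storeLastUpdated;

    CachedEntry(String path, long s3ModTime, long storeLastUpdated) {
        this.path = path;
        this.s3ModTime = s3ModTime;
        this.storeLastUpdated = storeLastUpdated;
    }

    /** Expired once the record has gone unwritten for longer than ttlMillis. */
    boolean isExpired(long nowMillis, long ttlMillis) {
        return nowMillis - storeLastUpdated > ttlMillis;
    }

    /** Keeps only the entries of a cached directory listing that have not timed out. */
    static List<CachedEntry> liveEntries(List<CachedEntry> listing, long nowMillis, long ttlMillis) {
        List<CachedEntry> live = new ArrayList<>();
        for (CachedEntry e : listing) {
            if (!e.isExpired(nowMillis, ttlMillis)) {
                live.add(e);
            }
        }
        return live;
    }
}
```

With this check, a file created a year ago but listed into DDB a second ago stays live; timing out on s3ModTime instead would expire it the moment it was cached.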

> S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation
> -------------------------------------------------------------------------
>                 Key: HADOOP-13936
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13936
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Rajesh Balamohan
>            Priority: Minor
> As a part of the {{S3AFileSystem.delete}} operation, {{innerDelete}} is invoked, which deletes
> keys from S3 in batches (default is 1000). But DynamoDB is updated only at the end of this
> operation. This can cause issues when deleting a large number of keys.
> For example, it is possible to get an exception after deleting 1000 keys, in which case DynamoDB
> would not be updated. This can cause DynamoDB to go out of sync.
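
One way to narrow the out-of-sync window described above is to update the metadata store after every S3 batch instead of once at the end. The Java sketch below assumes hypothetical BatchDeleter and DeletionRecorder interfaces standing in for the S3 client and the MetadataStore; it is not the actual S3AFileSystem.innerDelete code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the S3 side: deletes one batch of keys. */
interface BatchDeleter {
    void deleteBatch(List<String> keys) throws Exception;
}

/** Hypothetical stand-in for the metadata store: records deleted keys. */
interface DeletionRecorder {
    void recordDeleted(List<String> keys) throws Exception;
}

final class IncrementalDelete {
    private IncrementalDelete() {}

    /**
     * Deletes keys in batches of at most batchSize and records each batch in
     * the metadata store right after S3 accepts it. If a later batch fails,
     * the earlier batches are already reflected in the metadata.
     */
    static void deleteAll(List<String> keys, int batchSize,
                          BatchDeleter s3, DeletionRecorder metadata) throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // Copy the sublist so the batch does not depend on the backing list.
            List<String> batch = new ArrayList<>(keys.subList(start, end));
            s3.deleteBatch(batch);          // may throw partway through the delete
            metadata.recordDeleted(batch);  // reached only for batches S3 deleted
        }
    }
}
```

If the third batch of a 2500-key delete fails, the first 2000 keys are already recorded as deleted, so the metadata store matches what S3 actually removed.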

This message was sent by Atlassian JIRA
