hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14749) review s3guard docs & code prior to merge
Date Fri, 11 Aug 2017 18:23:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123792#comment-16123792 ]

Aaron Fabbri commented on HADOOP-14749:

If we added a field for each entry as to when the record itself was created, then we could
have AWS TTL do the pruning automatically.
I think we will want an "entry last written" mod time field in DDB, but I don't think we can
use DynamoDB's TTL feature without breaking the "all ancestors of any path P in DDB must be
present" invariant.  I chatted with a friend who works on the DynamoDB team, and he did not
believe that their TTL deletion feature was ordered strongly enough to guarantee it, even if
we could ensure we always wrote ancestors before children.  Maybe there is another algorithm
I'm not thinking of, though.
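
To make the concern concrete, here is a minimal sketch of the ancestor invariant, using a plain dict in place of the DDB table. The names `ancestors` and `missing_ancestors`, the table layout, and the choice to treat the root as implicit are all assumptions for illustration; none of this is S3Guard code.

```python
import posixpath


def ancestors(path):
    """Yield every proper ancestor of an absolute path, nearest first.

    Assumption: the root "/" is implicit and never stored as an entry.
    """
    parent = posixpath.dirname(path)
    while parent not in ("/", ""):
        yield parent
        parent = posixpath.dirname(parent)


def missing_ancestors(entries):
    """Return the ancestor paths the invariant requires but the table lacks."""
    present = set(entries)
    return {a for p in present for a in ancestors(p) if a not in present}


# Hypothetical table: path -> "entry last written" time (seconds).
table = {"/a": 100, "/a/b": 101, "/a/b/file": 102}
assert missing_ancestors(table) == set()

# An expiry process that may delete in any order could remove "/a/b"
# while "/a/b/file" is still present, which violates the invariant.
table_after_expiry = {p: t for p, t in table.items() if p != "/a/b"}
assert missing_ancestors(table_after_expiry) == {"/a/b"}
```

Writing ancestors before children only controls the order of the timestamps; it does not help if the deletions themselves can land in any order.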

I do think we want a v2 prune implementation for DynamoDB that works better (i.e. actually
expires directories properly).  I think that authoritative mode support for DynamoDB will
be a big motivator for this: if you rely on DDB as the source of truth for listings,
reliable expiry of stale data becomes more important.  I've also been thinking about
an online variant of prune (doing it on demand in the client, perhaps probabilistically /
randomized, or on access).
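
One possible shape for such a prune, sketched against an in-memory dict rather than DynamoDB: expire the deepest entries first, and remove a directory only once it is old enough and has no remaining children, so surviving paths keep their ancestors. The functions `prune` and `maybe_prune`, the `(is_dir, last_written)` layout, and the `probability` parameter are hypothetical; this is an illustration of the idea, not the proposed v2 algorithm.

```python
import posixpath
import random


def prune(table, cutoff):
    """Remove entries last written before `cutoff`, deepest paths first.

    `table` maps path -> (is_dir, last_written). A directory is removed only
    if it is older than the cutoff and no entry still lists it as its parent.
    """
    removed = []
    # Deepest first, so children are considered before their parents.
    for path in sorted(table, key=lambda p: p.count("/"), reverse=True):
        is_dir, last_written = table[path]
        if last_written >= cutoff:
            continue
        if is_dir and any(posixpath.dirname(other) == path for other in table):
            continue  # a child remains, so the directory must stay
        del table[path]
        removed.append(path)
    return removed


def maybe_prune(table, cutoff, probability, rng=random):
    """Randomized on-demand variant: prune on roughly `probability` of calls."""
    if rng.random() < probability:
        return prune(table, cutoff)
    return []
```

A client could call `maybe_prune` from an access path with a small `probability`, which spreads the pruning cost across many operations. The child scan here is quadratic and only suits a sketch; a real table would need an indexed query for children.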

> review s3guard docs & code prior to merge
> -----------------------------------------
>                 Key: HADOOP-14749
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14749
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: documentation, fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-14749-HADOOP-13345-001.patch, HADOOP-14749-HADOOP-13345-002.patch,
HADOOP-14749-HADOOP-13345-003.patch, HADOOP-14749-HADOOP-13345-004.patch, HADOOP-14749-HADOOP-13345-005.patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> Pre-merge cleanup while it's still easy to do
> * Read through all the docs, tune
> * Diff the trunk/branch files to see if we can reduce the delta (and hence the changes)
> * Review the new tests

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
