hadoop-common-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13449) S3Guard: Implement DynamoDBMetadataStore.
Date Fri, 18 Nov 2016 02:44:58 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HADOOP-13449:
    Attachment: HADOOP-13449-HADOOP-13345.005.patch

Thanks for the discussion, [~fabbri]. That's very helpful.

> for v1, you could always return authoritative = false.
Yes, that is what the current patch does. Let's address this in a follow-up JIRA after both
[HADOOP-13651] and this patch are committed.
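
To illustrate why always returning authoritative = false is a safe starting point, here is a minimal sketch of how a client might treat the flag. The class and method names (SketchListing, SketchListingClient, list) are hypothetical and are not taken from either patch; the S3 listing is modeled as a plain Supplier. The assumption is that a non-authoritative listing is merged with a fresh S3 listing, while an authoritative one is served without contacting S3.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

/** Hypothetical cached listing: child names plus the authoritative flag. */
class SketchListing {
  final Set<String> children;
  final boolean authoritative;

  SketchListing(Set<String> children, boolean authoritative) {
    this.children = children;
    this.authoritative = authoritative;
  }
}

/** Hypothetical client-side logic; an assumption, not code from the patch. */
class SketchListingClient {
  static Set<String> list(SketchListing cached, Supplier<Set<String>> s3Listing) {
    // Only an authoritative listing may be returned without asking S3.
    if (cached != null && cached.authoritative) {
      return new TreeSet<>(cached.children);
    }
    // Otherwise list S3 and add whatever the metadata store already knows.
    Set<String> merged = new TreeSet<>(s3Listing.get());
    if (cached != null) {
      merged.addAll(cached.children);
    }
    return merged;
  }
}
```

With the flag fixed to false, every listing takes the merge path, so correctness never depends on the store being complete; the cost is one S3 listing per call.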

> The interface allows any of these behaviors.... The filesystem is responsible for ensuring
> that the delete to /a must be recursive since it is not empty. MetadataStore explicitly does
> not do that.
Agreed. For example, {{delete(path)}} does not check whether the directory at that path is empty.
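
The division of responsibility described above can be sketched as follows. This is an in-memory model under stated assumptions, not the patch: SketchMetadataStore and SketchFileSystem are hypothetical names, paths are plain strings, and the root path is not handled. The store deletes whatever it is told to; the file system layer is the one that refuses a non-recursive delete of a non-empty directory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for a MetadataStore: it performs no contract checks. */
class SketchMetadataStore {
  private final TreeSet<String> paths = new TreeSet<>();

  void put(String path) {
    paths.add(path);
  }

  boolean contains(String path) {
    return paths.contains(path);
  }

  /** Removes one entry; deliberately does not check for children. */
  void delete(String path) {
    paths.remove(path);
  }

  /** Removes the entry and everything beneath it. */
  void deleteSubtree(String path) {
    paths.removeIf(p -> p.equals(path) || p.startsWith(path + "/"));
  }

  /** Returns the direct children of dir (root is not handled in this sketch). */
  List<String> listChildren(String dir) {
    String prefix = dir + "/";
    List<String> children = new ArrayList<>();
    for (String p : paths) {
      if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
        children.add(p);
      }
    }
    return children;
  }
}

/** Hypothetical file system layer that owns the delete contract. */
class SketchFileSystem {
  private final SketchMetadataStore store;

  SketchFileSystem(SketchMetadataStore store) {
    this.store = store;
  }

  boolean delete(String path, boolean recursive) throws IOException {
    if (!store.contains(path)) {
      return false;
    }
    // The emptiness check lives here, not in the metadata store.
    if (!recursive && !store.listChildren(path).isEmpty()) {
      throw new IOException("Directory " + path + " is not empty");
    }
    store.deleteSubtree(path);
    return true;
  }
}
```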

> You either have to (A) pay money to store an extra copy of your metadata forever, or (B) spend
> money and time hydrating the MetadataStore each time you start a cluster.
The metadata size is considered small, and DDB storage is cheap compared with its read/write
operation pricing. If I have to choose, (A) makes more sense.

> and we don't assume everything is always in DynamoDB, it makes recovery much easier
That's a very valid point. Updating S3 and the MetadataStore together is not atomic.

> The other concern is that I just don't understand why you would want to do the preloading.
You mean import? I suppose not. For reading/writing existing S3 buckets, importing the structure
first seems a prerequisite unless we assume it discovers/converges fast or we reach little
I guess you mean the constraints on pre-creating parent directories. I re-read the design
doc and the [HADOOP-13651] patch, and I think you made a good point about this. Let S3AFileSystem
ensure the contract.

Moreover, I now think storing the is_empty bit in DynamoDB is not ideal. Maintaining it takes
non-trivial effort and it is easy to get wrong. Perhaps we can query with the parent directory
as the HASH key when we need this information. That is not trivial either; I'll think about it
as my next piece of work. We can either fix this in the next patch, or I'll work on a follow-up JIRA.
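
As a rough illustration of the query-based alternative, the sketch below models a table whose HASH key is the parent directory and whose RANGE key is the child name. The schema, the class ParentKeyedTable, and its methods are assumptions made for illustration; the table is an in-memory map rather than DynamoDB. Emptiness is derived from a query on the parent key, so put and delete never have to maintain a stored flag.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory model of a table keyed by (parent path, child name). Hypothetical. */
class ParentKeyedTable {
  // hash key (parent path) -> range key (child name) -> isDirectory
  private final Map<String, TreeMap<String, Boolean>> partitions = new HashMap<>();

  void put(String parent, String child, boolean isDirectory) {
    partitions.computeIfAbsent(parent, k -> new TreeMap<>()).put(child, isDirectory);
  }

  void delete(String parent, String child) {
    TreeMap<String, Boolean> partition = partitions.get(parent);
    if (partition != null) {
      partition.remove(child);
      if (partition.isEmpty()) {
        partitions.remove(parent);
      }
    }
  }

  /** Models a query on the hash key: all items whose parent is the given path. */
  SortedMap<String, Boolean> query(String parent) {
    TreeMap<String, Boolean> partition = partitions.get(parent);
    if (partition == null) {
      return Collections.<String, Boolean>emptySortedMap();
    }
    return Collections.unmodifiableSortedMap(partition);
  }

  /** A directory is empty when no item has it as parent; no stored bit is needed. */
  boolean isEmptyDirectory(String dir) {
    return query(dir).isEmpty();
  }
}
```

The trade-off in this sketch is an extra query whenever emptiness is needed, in exchange for not having to update a parent item on every child put or delete.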

If this patch is still in question, a conference call would be very helpful. Let's schedule one
for next week; [~stevel@apache.org] is traveling this week.

[~eddyxu], do you have any more comments since I revised the patch?

Thank you,

> S3Guard: Implement DynamoDBMetadataStore.
> -----------------------------------------
>                 Key: HADOOP-13449
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13449
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13449-HADOOP-13345.000.patch, HADOOP-13449-HADOOP-13345.001.patch,
> HADOOP-13449-HADOOP-13345.002.patch, HADOOP-13449-HADOOP-13345.003.patch, HADOOP-13449-HADOOP-13345.004.patch,
>
> Provide an implementation of the metadata store backed by DynamoDB.
