hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13876) S3Guard: better support for multi-bucket access including read-only
Date Fri, 20 Jan 2017 01:01:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830928#comment-15830928 ]

Aaron Fabbri commented on HADOOP-13876:

Thanks [~steve_l].

I agree that most of this is addressed by per-bucket config.  On the "one DynamoDB table per
cluster" part, however, there are still assumptions in the DynamoDB (DDB) code that a DynamoDBMetadataStore
is 1:1 with an S3AFileSystem:

- Paths stored in DDB do not include the bucket name.
- DDB code uses the {{S3AFileSystem#getUri()}} value for the call to {{Path#makeQualified()}}.  See
callers of {{itemToPathMetadata()}}. (This part actually breaks when the new {{DynamoDBMetadataStore#initialize(Configuration)}}
method added for the CLI work is used.)
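
The collision risk from those two assumptions can be sketched as follows. This is a hypothetical illustration, not the actual Hadoop code: {{qualify()}} stands in for what callers of {{itemToPathMetadata()}} effectively do when they qualify a stored key against the current filesystem's URI.

```java
import java.net.URI;

// Hypothetical sketch: the DDB item key stores only the path ("/path1"),
// and the bucket is re-derived from whichever S3AFileSystem instance is
// reading. Two buckets sharing one table then resolve the same row to
// two different logical paths.
public class KeyCollisionSketch {
    // Mimics qualifying a stored DDB key against the filesystem's URI.
    static String qualify(URI fsUri, String ddbKey) {
        return fsUri.toString() + ddbKey;
    }

    public static void main(String[] args) {
        String ddbKey = "/path1";                      // bucket name not stored
        URI bucketA = URI.create("s3a://bucket-a");
        URI bucketB = URI.create("s3a://bucket-b");
        // The same DDB row answers queries for both buckets:
        System.out.println(qualify(bucketA, ddbKey));  // s3a://bucket-a/path1
        System.out.println(qualify(bucketB, ddbKey));  // s3a://bucket-b/path1
    }
}
```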

I want to fix this part, as the single DDB table per cluster is the main use case my users
want.  I already went through this exercise in LocalMetadataStore (which stores bucket name
with path), so it should be straightforward.

I could see us merging to trunk without this fixed, if we could enforce that users can't access
the same fs.s3a.s3guard.ddb.table with multiple buckets.  If they did that, it appears they'd
risk collisions (e.g. s3a://bucket-a/path1 == s3a://bucket-b/path1).
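
The fix I have in mind follows the same idea as LocalMetadataStore: store the bucket name as part of the key. A minimal sketch, assuming a {{toKey()}} helper that is illustrative only and not the actual implementation:

```java
// Hypothetical sketch: prefix the stored key with the bucket name so one
// shared table can serve multiple buckets without collisions.
public class BucketQualifiedKey {
    static String toKey(String bucket, String path) {
        // e.g. ("bucket-a", "/path1") -> "/bucket-a/path1"
        return "/" + bucket + path;
    }

    public static void main(String[] args) {
        // The two paths that previously collided now get distinct keys:
        System.out.println(toKey("bucket-a", "/path1")); // /bucket-a/path1
        System.out.println(toKey("bucket-b", "/path1")); // /bucket-b/path1
    }
}
```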

> S3Guard: better support for multi-bucket access including read-only
> -------------------------------------------------------------------
>                 Key: HADOOP-13876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13876
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Aaron Fabbri
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13876-HADOOP-13345.000.patch
> HADOOP-13449 adds support for DynamoDBMetadataStore.
> The code currently supports two options for choosing DynamoDB table names:
> 1. Use name of each s3 bucket and auto-create a DynamoDB table for each.
> 2. Configure a table name in the {{fs.s3a.s3guard.ddb.table}} parameter.
> One of the issues is with accessing read-only buckets.  If a user accesses a read-only
bucket with credentials that do not have DynamoDB write permissions, they will get errors.
This manifests as test failures for {{ITestS3AAWSCredentialsProvider}}.
> Goals for this JIRA:
> - Fix {{ITestS3AAWSCredentialsProvider}} in a way that makes sense for the real use-case.
> - Allow for a "one DynamoDB table per cluster" configuration with a way to choose which
credentials are used for DynamoDB.
> - Document limitations etc. in the s3guard.md site doc.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
