hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13336) S3A to support per-bucket configuration
Date Mon, 12 Dec 2016 23:56:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743567#comment-15743567 ]

Aaron Fabbri commented on HADOOP-13336:

Great summary [~steve_l]. I think being backward-compatible with existing configs and URIs
in production is important. These all seem reasonable, but URI compatibility points me
towards option A (if we want to keep it simple). The annoying thing is that these are hard
to change if we later decide we want a different option. Which option are you leaning towards?

*Option A* per-bucket config.
Lets you define everything for a bucket.
s3a://olap2/data/2017 : s3a URL s3a://olap2/data/2017, with config set fs.s3a.bucket.olap2
in configuration
s3a://landsat : s3a URL s3a://landsat, with config set fs.s3a.landsat for anonymous credentials
and no dynamo

To avoid key-space conflicts I'd suggest a prefix of fs.s3a.bucket.<bucket-name> instead
of fs.s3a.<bucket-name>. That way, if someone has an S3 bucket named "endpoint", they'd
use {{fs.s3a.bucket.endpoint.*}} instead of conflicting with {{fs.s3a.endpoint}}, and so on.

This option seems pretty straightforward. It should be backward compatible: it requires no
changes to URIs, and existing default or "all bucket" config keys continue to work the same.
For grabbing config values in S3A, we'd call some per-bucket Configuration wrapper that looks
for the fs.s3a.bucket.<bucket-name>.* key first and, if it isn't set, returns whatever is in
the non-bucket-specific config.
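A minimal sketch of that lookup order (a hypothetical standalone helper, not the real Hadoop Configuration API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-bucket lookup helper illustrating the proposed fallback:
// try fs.s3a.bucket.<bucket>.<option> first, then the shared fs.s3a.<option> key.
class PerBucketConfig {
    private final Map<String, String> conf = new HashMap<>();

    void set(String key, String value) {
        conf.put(key, value);
    }

    String get(String bucket, String option, String defaultValue) {
        // The bucket-specific key wins when present...
        String value = conf.get("fs.s3a.bucket." + bucket + "." + option);
        if (value == null) {
            // ...otherwise fall back to the non-bucket-specific key.
            value = conf.get("fs.s3a." + option);
        }
        return value != null ? value : defaultValue;
    }
}
```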

*Option B* config via domain name in URL
This is what swift does: you define a domain, with the domain defining everything.
s3a://olap2.dynamo/data/2017 with config set fs.s3a.binding.dynamo
s3a://landsat.anon with config set fs.s3a.binding.anon for anonymous credentials and no dynamo

As you mention, my desire for URI backward-compatibility implies we'd need an additional way
to map a bucket to a domain, e.g. {{fs.s3a.domain.bucket.my-bucket=my-domain}}. That buys us
the ability to share a config over some set of buckets, but it seems a bit too complex.

Also, does this break folks who use FQDN bucket names?
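To illustrate the FQDN concern: once bucket names may contain dots, any scheme that treats the last host label as a config "domain" can't tell the two apart. A sketch (the naive split below is purely illustrative, not proposed code):

```java
import java.net.URI;

// Demonstrates why FQDN bucket names are a problem for Option B: with dots
// allowed in bucket names, the URI host is ambiguous between
// <bucket>.<domain> and a plain FQDN bucket name.
class FqdnAmbiguity {
    // Naive split: treat everything after the last dot as the domain suffix.
    static String[] splitBucketAndDomain(String url) {
        String host = URI.create(url).getHost();
        int lastDot = host.lastIndexOf('.');
        if (lastDot < 0) {
            return new String[] { host, null };  // no suffix present
        }
        return new String[] { host.substring(0, lastDot),
                              host.substring(lastDot + 1) };
    }
}
```

For {{s3a://olap2.dynamo}} this yields bucket "olap2" and domain "dynamo" as intended, but an FQDN bucket like {{s3a://data.example.com}} gets mangled into bucket "data.example" with domain "com".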

*Option C* Config via user:pass property in URL
This is a bit like Azure, where the FQDN defines the binding, and the username defines the
bucket. Here I'm proposing the ability to define a new user which declares the binding info.
s3a://dynamo@olap2/data/2017 : s3a URL s3a://olap2/data/2017, with config set fs.s3a.binding.dynamo
s3a://anon@landsat : s3a URL s3a://landsat, with config set fs.s3a.binding.anon for anonymous credentials

Seems reasonable but the need to change URIs is unfortunate.
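Mechanically, Option C would amount to reading the userinfo part of the URI; standard URI parsing already gives us that for free. A sketch (the binding-name-in-userinfo convention is the proposal here, not existing S3A behaviour):

```java
import java.net.URI;

// Sketch of Option C parsing:
//   s3a://dynamo@olap2/data/2017 -> binding "dynamo", bucket "olap2"
//   s3a://landsat                -> no binding,       bucket "landsat"
class BindingFromUri {
    static String binding(String url) {
        return URI.create(url).getUserInfo();  // null when there is no user@ part
    }

    static String bucket(String url) {
        return URI.create(url).getHost();
    }
}
```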

> S3A to support per-bucket configuration
> ---------------------------------------
>                 Key: HADOOP-13336
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13336
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
> S3a now supports different regions, by way of declaring the endpoint —but you can't
do things like read in one region, write back in another (e.g. a distcp backup), because only
one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt s3a://b2.seol
, then this would be possible. 
> Swift does this with a full filesystem binding/config: endpoints, username, etc, in the
XML file. Would we need to do that much? It'd be simpler initially to use a domain suffix
of a URL to set the region of a bucket from the domain and have the aws library sort the details
out itself, maybe with some config options for working with non-AWS infra

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
