hadoop-common-issues mailing list archives

From "Doug Laurence (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13336) S3A to support per-bucket configuration
Date Wed, 14 Dec 2016 17:52:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748969#comment-15748969 ]

Doug Laurence commented on HADOOP-13336:
----------------------------------------

It's good to support configuration that is simple for the simple cases, and more complex (but
possible) for the more sophisticated use cases. A single bucket in one region is the simplest
case, and multiple buckets in multiple regions is on the more sophisticated end of the spectrum.
Providing attributes in the URI is really simple, but it will probably break down as new
capabilities/features are introduced, so the fs.s3a config set approach seems more future-proof.

For example, a user with two buckets (e.g. landsat1 and landsat2) in the same region (e.g.
us-west-2) could set the endpoint with a single config set property:
{code}fs.s3a.default.endpoint=s3-us-west-2.amazonaws.com{code}
and then simply use the URI prefixes s3a://landsat1/ and s3a://landsat2/

If a third bucket (e.g. landsat-frankfurt) needed a different endpoint, then a per-bucket
override would be required:
{code}fs.s3a.bucket.endpoint.landsat-frankfurt=s3.eu-central-1.amazonaws.com{code}

This allows default properties to be specified for all buckets, and properties such as
authentication details to be overridden for individual buckets. It also leaves room to add
new features over time if desired.
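
To make the lookup order concrete, here is a minimal sketch of the fallback described above. It is illustrative only: a plain dict stands in for the Hadoop configuration, and the helper name resolve_option is hypothetical, not part of S3A.

```python
# Sketch only: a plain dict stands in for the Hadoop configuration object.
# resolve_option is a hypothetical helper, not an existing S3A API.
PREFIX = "fs.s3a"


def resolve_option(conf, bucket, option):
    """Return the per-bucket value if set, else the default, else None."""
    per_bucket_key = f"{PREFIX}.bucket.{option}.{bucket}"
    default_key = f"{PREFIX}.default.{option}"
    if per_bucket_key in conf:
        return conf[per_bucket_key]
    return conf.get(default_key)


# The two properties from the examples above.
conf = {
    "fs.s3a.default.endpoint": "s3-us-west-2.amazonaws.com",
    "fs.s3a.bucket.endpoint.landsat-frankfurt": "s3.eu-central-1.amazonaws.com",
}

for bucket in ("landsat1", "landsat2", "landsat-frankfurt"):
    print(bucket, "->", resolve_option(conf, bucket, "endpoint"))
```

With this order, landsat1 and landsat2 pick up the default endpoint and only landsat-frankfurt needs its own entry.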

For example, if we wanted to add support for specifying the default S3 storage class (e.g.
S3-Infrequent Access) when each new file is created, we could add a new storage-class sub-property
to the config sets, i.e. fs.s3a.default.storage-class and fs.s3a.bucket.storage-class:
{code}fs.s3a.bucket.storage-class.landsat-frankfurt=STANDARD_IA{code}
or we could enable users to specify object tag(s) to be applied to each new object created:
{code}fs.s3a.bucket.object-tagging.landsat-frankfurt=Project=x,Classification=internal{code}
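
As a rough illustration of how such a tagging value might be interpreted, the sketch below parses it into key/value pairs. The comma-separated Key=Value syntax is inferred from the example above, and parse_object_tags is a hypothetical helper; neither is an existing S3A feature.

```python
# Sketch only: assumes the proposed value syntax is "Key=Value,Key=Value".
# parse_object_tags is a hypothetical helper name.
def parse_object_tags(value):
    """Parse 'Project=x,Classification=internal' into a dict of tags."""
    tags = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing or doubled comma
        key, sep, val = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed tag entry: {pair!r}")
        tags[key.strip()] = val.strip()
    return tags


tags = parse_object_tags("Project=x,Classification=internal")
print(tags)
```

A real implementation would also need to decide how to escape ',' and '=' inside tag values, which this sketch does not attempt.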

In general, the per-bucket config set sub-properties would also be supported in the fs.s3a.default.*
config set namespace. I suggest making the bucket name the suffix in the property namespace
since bucket names can contain the '.' character; with the name last, everything after the
sub-property can be read as the bucket name without ambiguity.
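
The sketch below shows why the suffix position helps when parsing keys. It assumes sub-property names such as endpoint or storage-class never contain '.', and both split_bucket_key and the bucket name logs.example.com are made up for illustration.

```python
# Sketch only: assumes sub-property names (e.g. "endpoint", "storage-class")
# contain no '.', so the first '.' after the prefix ends the sub-property.
BUCKET_PREFIX = "fs.s3a.bucket."


def split_bucket_key(key):
    """Split 'fs.s3a.bucket.<option>.<bucket>' into (option, bucket)."""
    if not key.startswith(BUCKET_PREFIX):
        raise ValueError(f"not a per-bucket key: {key!r}")
    rest = key[len(BUCKET_PREFIX):]
    option, sep, bucket = rest.partition(".")
    if not sep or not option or not bucket:
        raise ValueError(f"missing option or bucket in: {key!r}")
    return option, bucket


# A dotted bucket name stays intact because it is the final component.
print(split_bucket_key("fs.s3a.bucket.endpoint.logs.example.com"))
print(split_bucket_key("fs.s3a.bucket.storage-class.landsat-frankfurt"))
```

If the bucket name came before the sub-property instead, a key such as fs.s3a.bucket.logs.example.com.endpoint could only be parsed by knowing every valid sub-property name in advance.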



> S3A to support per-bucket configuration
> ---------------------------------------
>
>                 Key: HADOOP-13336
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13336
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>
> S3a now supports different regions, by way of declaring the endpoint, but you can't
> do things like read in one region, write back in another (e.g. a distcp backup), because only
> one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt s3a://b2.seol,
> then this would be possible.
> Swift does this with a full filesystem binding/config: endpoints, username, etc, in the
> XML file. Would we need to do that much? It'd be simpler initially to use a domain suffix
> of a URL to set the region of a bucket from the domain and have the aws library sort the details
> out itself, maybe with some config options for working with non-AWS infra




