hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13336) S3A to support per-bucket configuration
Date Thu, 05 Jan 2017 16:11:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801750#comment-15801750 ]

Steve Loughran commented on HADOOP-13336:
-----------------------------------------

I'm about to do this; I'll pull it into the s3guard branch first, so that we can tune it there,
where the issues related to different DynamoDB policies surface.

Proposed:
# Ignore the issue of bucket names with "." in them. There's already an assumption in the
code that hostname == bucket name.
# Then use the bucket name in an fs.s3a.bucket.* prefix.
# Fall back to fs.s3a.* for properties that aren't set for a bucket.

What I like about this is that listing all properties and grepping for fs.s3a.bucket.mybucket
will find every custom setting for a given bucket. Put the bucket name at the end of the key instead
and your management tooling/text editor has a harder time identifying what has been changed. I think
it gives us more room for option extensibility too.
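
As a concrete illustration of the layout (the bucket name and endpoint values here are made up,
and the exact key form is still to be settled in the patch), a core-site.xml fragment might look like:

{code:xml}
<!-- generic S3A default, applies to every bucket -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.amazonaws.com</value>
</property>

<!-- per-bucket override: picked up only for s3a://mybucket/ -->
<property>
  <name>fs.s3a.bucket.mybucket.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>
{code}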

I'll add an extended configuration class which does the lookup and fallback, *for all the
methods currently in use in S3A*: getTrimmed, getPassword, getBytes, ... It will be self-contained,
with its own tests. Then move S3A and the things it creates (output streams, ...) over to it. The S3A FS
would create the config from its own Configuration & bucket name, and pass that instance down to the
other components instead of a plain Configuration.
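
A minimal sketch of the fallback semantics. The class and method names here are hypothetical, and it
pre-resolves the per-bucket keys into a copied Configuration rather than overriding every getter the
way the extended class described above would:

{code:java}
// Rough sketch only, not the final S3A API: copy every fs.s3a.bucket.BUCKET.KEY
// entry over the generic fs.s3a.KEY, then hand the resulting Configuration to
// everything S3A creates.
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public final class PerBucketConfig {      // hypothetical helper name

  public static Configuration forBucket(Configuration conf, String bucket) {
    String bucketPrefix = "fs.s3a.bucket." + bucket + ".";
    Configuration patched = new Configuration(conf);   // leave the original untouched
    for (Map.Entry<String, String> entry : conf) {
      String key = entry.getKey();
      if (key.startsWith(bucketPrefix)) {
        // e.g. fs.s3a.bucket.mybucket.endpoint -> fs.s3a.endpoint
        String generic = "fs.s3a." + key.substring(bucketPrefix.length());
        patched.set(generic, entry.getValue());
      }
    }
    return patched;
  }
}
{code}

The filesystem init would then do something like {{Configuration conf = PerBucketConfig.forBucket(getConf(), bucket);}}
and pass that instance down to the output streams and other helpers.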

> S3A to support per-bucket configuration
> ---------------------------------------
>
>                 Key: HADOOP-13336
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13336
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> S3a now supports different regions, by way of declaring the endpoint, but you can't
> do things like read in one region and write back in another (e.g. a distcp backup), because only
> one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt s3a://b2.seoul,
> then this would be possible.
> Swift does this with a full filesystem binding/config: endpoints, username, etc., in the
> XML file. Would we need to do that much? It'd be simpler initially to use a domain suffix
> of the URL to set the region of a bucket from the domain, and have the AWS library sort the details
> out itself, maybe with some config options for working with non-AWS infrastructure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


