hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13336) S3A to support per-bucket configuration
Date Fri, 06 Jan 2017 20:04:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805590#comment-15805590
] 

Aaron Fabbri commented on HADOOP-13336:
---------------------------------------

+1 (non-binding) once you remove the change of debug log level at end of patch.

I like this approach.  I had thought about this a bit before you started work and was thinking
of having a "flatten" (a.k.a. propagate) function as you do here.  Seems more efficient at
runtime and easier to debug.  I like the use of the Configuration source feature too.

It looks like your omission of <prefix>.impl keys still works with fs.s3a.metadatastore.impl,
since {{"impl".equals("metadatastore.impl") == false}}.  (You compare the whole key, not just
the suffix.)  This is good because we'd like to be able to enable/disable s3guard on a per-bucket
basis.

Other patch comments:

{code}
+  public static Configuration propagateBucketOptions(Configuration source,
+      String bucket) {
+
<snip>
+    for (Map.Entry<String, String> entry : source) {
+      final String key = entry.getKey();
+      // get the (unexpanded) value.
+      final String value = entry.getValue();
+      if (!key.startsWith(bucketPrefix)
+          || bucketPrefix.equals(key)
+          || value == null) {
+        continue;
+      }
{code}

Was curious about the {{value == null}} part.. Does that ever happen?  Anyhow, seems safe
to include the check.

Minor nit in the docs:

{code}
+Different S3 buckets can be accessed with S3A client configurations.
{code}

Should this read "can be accessed with different S3A client configurations"?

{code}
--- a/hadoop-tools/hadoop-aws/src/test/resources/log4j.properties
+++ b/hadoop-tools/hadoop-aws/src/test/resources/log4j.properties
@@ -21,7 +21,7 @@ log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
  
 # for debugging low level S3a operations, uncomment these lines
 # Log all S3A classes
-#log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
+log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
  
 # Log S3Guard classes
 #log4j.logger.org.apache.hadoop.fs.s3a.s3guard=DEBUG
{code}
Was this intentional?

> S3A to support per-bucket configuration
> ---------------------------------------
>
>                 Key: HADOOP-13336
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13336
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13336-HADOOP-13345-001.patch, HADOOP-13336-HADOOP-13345-002.patch
>
>
> S3a now supports different regions, by way of declaring the endpoint —but you can't
do things like read in one region, write back in another (e.g. a distcp backup), because only
one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt s3a://b2.seol
, then this would be possible. 
> Swift does this with a full filesystem binding/config: endpoints, username, etc, in the
XML file. Would we need to do that much? It'd be simpler initially to use a domain suffix
of a URL to set the region of a bucket from the domain and have the aws library sort the details
out itself, maybe with some config options for working with non-AWS infra



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message