hadoop-common-issues mailing list archives

From "Stephen Montgomery (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12963) Allow using path style addressing for accessing the s3 endpoint
Date Sat, 26 Mar 2016 23:21:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213244#comment-15213244
] 

Stephen Montgomery commented on HADOOP-12963:
---------------------------------------------

Hi Steve,
Thanks for the quick reply. This patch simply sets a flag on the Amazon S3 client to use
path style access by default instead of virtual-hosted addressing - see com.amazonaws.services.s3.S3ClientOptions.
This is done when the S3AFileSystem initialises the AmazonS3Client. JetS3t has a similar property
to do this as well - see s3service.disable-dns-buckets at http://www.jets3t.org/toolkit/configuration.html.
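For reference, the switch is surfaced to users as a Hadoop configuration property. A sketch of the core-site.xml entry, assuming the property name used in the patch (check the committed patch for the final name):

```xml
<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
  <description>Enable S3 path style access by disabling the default
    virtual-hosted addressing on the AmazonS3Client.</description>
</property>
```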


I submitted a test (org.apache.hadoop.fs.s3a.TestS3AConfiguration.shouldBeAbleToSwitchOnS3PathStyleAccessViaConfigProperty)
that simply sets the new Hadoop flag, initialises a new S3AFileSystem and checks that the
newly instantiated AmazonS3Client has its S3ClientOptions.isPathStyleAccess() set to true.
The S3ClientOptions interrogation is done via ugly reflection, as the property is not
retrievable through the Amazon S3 SDK.
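That reflection step can be sketched like this. ClientOptions below is a hypothetical stand-in, not the SDK class; it just illustrates reading a private boolean field by name, which is what the test has to do against the AmazonS3Client internals:

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for the SDK's S3ClientOptions, whose
// path-style flag is held in a private field with no public getter
// reachable from the client instance.
class ClientOptions {
    private boolean pathStyleAccess = true;
}

public class ReflectionProbe {
    // Read a private boolean field by name, the same way the unit
    // test interrogates the client options via reflection.
    static boolean readPrivateBoolean(Object target, String fieldName)
            throws Exception {
        Field f = target.getClass().getDeclaredField(fieldName);
        f.setAccessible(true);  // bypass the private modifier
        return f.getBoolean(target);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            readPrivateBoolean(new ClientOptions(), "pathStyleAccess"));
    }
}
```

The obvious fragility is that the field name is an SDK implementation detail, which is why the reflection is "ugly": an SDK upgrade that renames the field breaks the test rather than the feature.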

When the test runs against "live" S3A buckets with path style access switched on, the
buckets have to be created in the same region as the AmazonS3Client (with the default s3.amazonaws.com
endpoint specified), otherwise a 301 error is thrown (see http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html);
that case is covered in the test as well.
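For context, the two addressing styles differ only in where the bucket name appears in the request URL. A minimal sketch (the helper names are mine, not SDK calls):

```java
public class AddressingStyles {
    // Virtual-hosted style: bucket name becomes part of the hostname,
    // which is why it needs DNS (or /etc/hosts entries) to resolve.
    static String virtualHosted(String bucket, String key) {
        return "https://" + bucket + ".s3.amazonaws.com/" + key;
    }

    // Path style: bucket name is the first path segment, so only the
    // endpoint hostname itself needs to resolve.
    static String pathStyle(String bucket, String key) {
        return "https://s3.amazonaws.com/" + bucket + "/" + key;
    }

    public static void main(String[] args) {
        System.out.println(virtualHosted("mybucket", "data.csv"));
        System.out.println(pathStyle("mybucket", "data.csv"));
    }
}
```

With path style access against the default endpoint, a request for a bucket in another region lands on the wrong regional endpoint, hence the 301 redirect.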

I have patched a running cluster and submitted client jobs using the new flag, and it works
as expected - it removed the need to list all the virtual-hosted buckets in the
/etc/hosts file. I have also done manual tests specifying region.amazonaws.com as a custom
S3A endpoint to bypass the 301 error when I have buckets in different regions, and I used
an IPv4 address as the custom S3A endpoint, which is a known workaround that switches on path
style access in the AmazonS3Client code itself.
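For the manual tests above, the custom endpoint is set through the existing S3A endpoint property; a sketch, with a placeholder endpoint value:

```xml
<property>
  <name>fs.s3a.endpoint</name>
  <!-- placeholder: substitute the regional or non-AWS endpoint -->
  <value>s3.example-region.amazonaws.com</value>
</property>
```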

I could have written a few more tests, maybe creating new buckets on the fly in different
regions to test for the 301 error, but I don't know whether this error code is specific to AWS S3
(and guaranteed never to change). The actual S3A operations behave the same whether
virtual hosting or path style access is used. The upshot is that I'm just setting a flag
at AmazonS3Client instance creation, and that small three-liner probably (!?!) doesn't warrant
100 lines or so of JUnit code. If you think it does though, I'll go ahead and do it...

Thanks,
Stephen

> Allow using path style addressing for accessing the s3 endpoint
> ---------------------------------------------------------------
>
>                 Key: HADOOP-12963
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12963
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.7.1
>            Reporter: Andrew Baptist
>            Priority: Minor
>              Labels: features
>         Attachments: HADOOP-12963-001.patch, HADOOP-12963-1.patch, hdfs-8728.patch.2
>
>
> There is no ability to specify using path style access for the s3 endpoint. There are
> numerous non-Amazon implementations of storage that support the Amazon APIs but only support
> path style access, such as Cleversafe and Ceph. Additionally, in many environments it is
> difficult to configure DNS correctly to get virtual host style addressing to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
