From: Apache Wiki
To: Apache Wiki
Date: Tue, 21 Apr 2015 12:09:32 -0000
Subject: [Hadoop Wiki] Update of "AmazonS3" by SteveLoughran

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "AmazonS3" page has been changed by SteveLoughran:
https://wiki.apache.org/hadoop/AmazonS3?action=diff&rev1=18&rev2=19

Comment:
remove all content on configuring the S3 filesystems - point to the markdown docs on github instead

= History =
 * The S3 block filesystem was introduced in Hadoop 0.10.0 ([[http://issues.apache.org/jira/browse/HADOOP-574|HADOOP-574]]).
 * The S3 native filesystem was introduced in Hadoop 0.18.0 ([[http://issues.apache.org/jira/browse/HADOOP-930|HADOOP-930]]) and rename support was added in Hadoop 0.19.0 ([[https://issues.apache.org/jira/browse/HADOOP-3361|HADOOP-3361]]).
- * The S3A filesystem was introduced in Hadoop 2.6.0. Some issues were found and fixed for later Hadoop versions ([[https://issues.apache.org/jira/browse/HADOOP-11571|HADOOP-11571]]), so Hadoop 2.6.0's support of s3a must be considered an incomplete replacement for the s3n FS.
+ * The S3A filesystem was introduced in Hadoop 2.6.0. Some issues were found and fixed for later Hadoop versions ([[https://issues.apache.org/jira/browse/HADOOP-11571|HADOOP-11571]]).

- = Why you cannot use S3 as a replacement for HDFS =
+ = Configuring and using the S3 filesystem support =
+
+ Consult the [[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md|latest Hadoop documentation]] for the specifics of using any of the S3 clients.
+
+ = Important: you cannot use S3 as a replacement for HDFS =
+
- You cannot use either of the S3 filesystem clients as a drop-in replacement for HDFS. Amazon S3 is an "object store" with
+ You cannot use any of the S3 filesystem clients as a drop-in replacement for HDFS. Amazon S3 is an "object store" with
 * eventual consistency: changes made by one application (creation, updates and deletions) will not be visible until some undefined time.
 * s3n and s3a: non-atomic rename and delete operations. Renaming or deleting large directories takes time proportional to the number of entries, and the partially-complete operation is visible to other processes during this time - and indeed until the eventual consistency has been resolved.

S3 is not a filesystem. The Hadoop S3 filesystem bindings make it pretend to be a filesystem, but it is not. It can act as a source of data and as a destination - though in the latter case, you must remember that the output may not be immediately visible.

- == Configuring to use s3/s3n filesystems ==
-
- Edit your `core-site.xml` file to include your S3 keys:
-
- {{{
- <property>
-   <name>fs.s3.awsAccessKeyId</name>
-   <value>ID</value>
- </property>
-
- <property>
-   <name>fs.s3.awsSecretAccessKey</name>
-   <value>SECRET</value>
- </property>
- }}}
-
- You can then use URLs to your bucket: ``s3n://MYBUCKET/``, or directories and files inside it.
-
- {{{
- s3n://BUCKET/
- s3n://BUCKET/dir
- s3n://BUCKET/dir/files.csv.tar.gz
- s3n://BUCKET/dir/*.gz
- }}}
-
- Alternatively, you can put the access key ID and the secret access key into an ''s3n'' (or ''s3'') URI as the user info:
-
- {{{
- s3n://ID:SECRET@BUCKET
- }}}
-
- Note that since the secret access key can contain slashes, you must remember to escape them by replacing each slash `/` with the string `%2F`.
- Keys specified in the URI take precedence over any specified using the properties `fs.s3.awsAccessKeyId` and `fs.s3.awsSecretAccessKey`.
-
- This option is less secure, as the URLs are likely to appear in output logs and error messages and so be exposed to remote users.

= Security =
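The non-atomic, O(entries) rename behaviour described above can be illustrated with a toy model (a sketch for explanation only, not Hadoop or S3 code): an object store is just a flat key-to-bytes map, so "renaming a directory" has to copy every key under the old prefix and then delete the originals, and a concurrent reader can observe a half-renamed state in between.

```python
# Toy object store: a flat key -> bytes map, no real directories.
store = {
    "data/part-0": b"a",
    "data/part-1": b"b",
    "data/part-2": b"c",
}

def rename_prefix(store, src, dst):
    """Simulate a 'directory rename': copy each key, then delete the originals.

    Cost is proportional to the number of entries, and another client
    listing the store between the copy and delete phases would see
    objects under BOTH prefixes at once - the rename is not atomic.
    """
    keys = [k for k in store if k.startswith(src)]
    for k in keys:
        store[dst + k[len(src):]] = store[k]  # copy phase
    for k in keys:
        del store[k]                          # delete phase

rename_prefix(store, "data/", "archive/")
print(sorted(store))  # ['archive/part-0', 'archive/part-1', 'archive/part-2']
```

A real filesystem such as HDFS performs the same rename as a single atomic metadata operation, which is why the S3 clients cannot be a drop-in replacement.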
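The slash-escaping rule quoted in the removed section (replace each `/` in the secret key with `%2F` before embedding it in a URI) is ordinary percent-encoding. A minimal sketch, using a made-up secret rather than a real credential:

```python
from urllib.parse import quote

secret = "abc/def+gh"  # hypothetical secret access key, not a real credential
# safe="" percent-encodes every reserved character, including "/" -> "%2F"
escaped = quote(secret, safe="")
print(escaped)  # abc%2Fdef%2Bgh
```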