From: kasha@apache.org
To: common-commits@hadoop.apache.org
Reply-To: common-dev@hadoop.apache.org
Date: Mon, 15 Dec 2014 18:36:36 -0000
Subject: [45/50] [abbrv] hadoop git commit: HADOOP-11394. hadoop-aws documentation missing. Contributed by Chris Nauroth.

HADOOP-11394. hadoop-aws documentation missing. Contributed by Chris Nauroth.

Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit:  http://git-wip-us.apache.org/repos/asf/hadoop/commit/9458cd5b
Tree:    http://git-wip-us.apache.org/repos/asf/hadoop/tree/9458cd5b
Diff:    http://git-wip-us.apache.org/repos/asf/hadoop/diff/9458cd5b

Branch: refs/heads/YARN-2139
Commit: 9458cd5bce20358e31c0cfb594bc545c7824b10d
Parents: 0e37bbc
Author: cnauroth
Authored: Fri Dec 12 23:29:11 2014 -0800
Committer: cnauroth
Committed: Fri Dec 12 23:29:11 2014 -0800

----------------------------------------------------------------------
 hadoop-common-project/hadoop-common/CHANGES.txt |   2 +
 .../site/markdown/tools/hadoop-aws/index.md     | 417 -------------------
 .../src/site/markdown/tools/hadoop-aws/index.md | 417 +++++++++++++++++++
 3 files changed, 419 insertions(+), 417 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/9458cd5b/hadoop-common-project/hadoop-common/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/CHANGES.txt b/hadoop-common-project/hadoop-common/CHANGES.txt
index 1e59395..729a456 100644
--- a/hadoop-common-project/hadoop-common/CHANGES.txt
+++ b/hadoop-common-project/hadoop-common/CHANGES.txt
@@ -580,6 +580,8 @@ Release 2.7.0 - UNRELEASED
     HADOOP-11389. Clean up byte to string encoding issues in hadoop-common.
     (wheat9)
 
+    HADOOP-11394. hadoop-aws documentation missing.
+    (cnauroth)
 
 Release 2.6.0 - 2014-11-18


http://git-wip-us.apache.org/repos/asf/hadoop/blob/9458cd5b/hadoop-tools/hadoop-aws/src/main/site/markdown/tools/hadoop-aws/index.md
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-aws/src/main/site/markdown/tools/hadoop-aws/index.md b/hadoop-tools/hadoop-aws/src/main/site/markdown/tools/hadoop-aws/index.md
deleted file mode 100644
index 4a1956a..0000000
--- a/hadoop-tools/hadoop-aws/src/main/site/markdown/tools/hadoop-aws/index.md
+++ /dev/null
@@ -1,417 +0,0 @@
http://git-wip-us.apache.org/repos/asf/hadoop/blob/9458cd5b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
new file mode 100644
index 0000000..d443389
--- /dev/null
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
@@ -0,0 +1,417 @@

# Hadoop-AWS module: Integration with Amazon Web Services

The `hadoop-aws` module provides support for AWS integration. The generated
JAR file, `hadoop-aws.jar`, also declares a transitive dependency on all
external artifacts needed for this support, so downstream applications can
easily pick them up.

Features

1. The "classic" `s3:` filesystem for storing objects in Amazon S3 Storage.
1. The second-generation `s3n:` filesystem, which makes it easy to share
data between Hadoop and other applications via the S3 object store.
1. The third-generation `s3a:` filesystem. Designed to be a drop-in
replacement for `s3n:`, this filesystem binding supports larger files and
promises higher performance.

The specifics of using these filesystems are documented below.
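As a quick, illustrative sketch (not part of the original page; the bucket name
and key values are placeholders), `s3a:` access is typically set up by adding
the credential properties documented below to `core-site.xml`:

    <configuration>
      <!-- Placeholder credentials: substitute real values, and never commit them. -->
      <property>
        <name>fs.s3a.awsAccessKeyId</name>
        <value>YOUR_ACCESS_KEY_ID</value>
      </property>
      <property>
        <name>fs.s3a.awsSecretAccessKey</name>
        <value>YOUR_SECRET_ACCESS_KEY</value>
      </property>
    </configuration>

With `hadoop-aws.jar` and its dependencies on the classpath, a URI such as
`s3a://example-bucket/data/` can then be used wherever a Hadoop filesystem path
is accepted, for example `hadoop fs -ls s3a://example-bucket/data/`.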
## Warning: Object Stores are not filesystems.

Amazon S3 is an example of "an object store". In order to achieve scalability
and especially high availability, S3, like many other cloud object stores,
has relaxed some of the constraints which classic "POSIX" filesystems promise.

Specifically

1. Files that are newly created from the Hadoop Filesystem APIs may not be
immediately visible.
2. File delete and update operations may not immediately propagate. Old
copies of the file may exist for an indeterminate time period.
3. Directory operations: `delete()` and `rename()` are implemented by
recursive file-by-file operations. They take time at least proportional to
the number of files, during which time partial updates may be visible. If
the operations are interrupted, the filesystem is left in an intermediate state.

For further discussion on these topics, please consult
[The Hadoop FileSystem API Definition](../../../hadoop-project-dist/hadoop-common/filesystem/index.html).

## Warning #2: your AWS credentials are valuable

Your AWS credentials not only pay for services, they offer read and write
access to the data. Anyone with the credentials can not only read your
datasets, they can delete them.

Do not inadvertently share these credentials through means such as

1. Checking in Hadoop configuration files containing the credentials.
1. Logging them to a console, as they invariably end up being seen.

If you do any of these: change your credentials immediately!


## S3

### Authentication properties

    <property>
      <name>fs.s3.awsAccessKeyId</name>
      <description>AWS access key ID</description>
    </property>

    <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <description>AWS secret key</description>
    </property>


## S3N

### Authentication properties

    <property>
      <name>fs.s3n.awsAccessKeyId</name>
      <description>AWS access key ID</description>
    </property>

    <property>
      <name>fs.s3n.awsSecretAccessKey</name>
      <description>AWS secret key</description>
    </property>

### Other properties

    <property>
      <name>fs.s3n.block.size</name>
      <value>67108864</value>
      <description>Block size to use when reading files using the native S3
      filesystem (s3n: URIs).</description>
    </property>

    <property>
      <name>fs.s3n.multipart.uploads.enabled</name>
      <value>false</value>
      <description>Setting this property to true enables multiple uploads to
      native S3 filesystem. When uploading a file, it is split into blocks
      if the size is larger than fs.s3n.multipart.uploads.block.size.
      </description>
    </property>

    <property>
      <name>fs.s3n.multipart.uploads.block.size</name>
      <value>67108864</value>
      <description>The block size for multipart uploads to native S3 filesystem.
      Default size is 64MB.
      </description>
    </property>

    <property>
      <name>fs.s3n.multipart.copy.block.size</name>
      <value>5368709120</value>
      <description>The block size for multipart copy in native S3 filesystem.
      Default size is 5GB.
      </description>
    </property>

    <property>
      <name>fs.s3n.server-side-encryption-algorithm</name>
      <value></value>
      <description>Specify a server-side encryption algorithm for S3.
      The default is NULL, and the only other currently allowable value is AES256.
      </description>
    </property>
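As an illustration (a sketch, not text from the original document), multipart
uploads for the native `s3n:` client could be enabled with a larger part size;
134217728 bytes is 128MB:

    <!-- Illustrative values only; tune to the workload. -->
    <property>
      <name>fs.s3n.multipart.uploads.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>fs.s3n.multipart.uploads.block.size</name>
      <value>134217728</value>
    </property>

With these settings, any file larger than the configured block size would be
uploaded in multiple parts rather than in a single operation.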
## S3A

### Authentication properties

    <property>
      <name>fs.s3a.awsAccessKeyId</name>
      <description>AWS access key ID. Omit for Role-based authentication.</description>
    </property>

    <property>
      <name>fs.s3a.awsSecretAccessKey</name>
      <description>AWS secret key. Omit for Role-based authentication.</description>
    </property>

### Other properties

    <property>
      <name>fs.s3a.connection.maximum</name>
      <value>15</value>
      <description>Controls the maximum number of simultaneous connections to S3.</description>
    </property>

    <property>
      <name>fs.s3a.connection.ssl.enabled</name>
      <value>true</value>
      <description>Enables or disables SSL connections to S3.</description>
    </property>

    <property>
      <name>fs.s3a.attempts.maximum</name>
      <value>10</value>
      <description>How many times we should retry commands on transient errors.</description>
    </property>

    <property>
      <name>fs.s3a.connection.timeout</name>
      <value>5000</value>
      <description>Socket connection timeout in seconds.</description>
    </property>

    <property>
      <name>fs.s3a.paging.maximum</name>
      <value>5000</value>
      <description>How many keys to request from S3 when doing
      directory listings at a time.</description>
    </property>

    <property>
      <name>fs.s3a.multipart.size</name>
      <value>104857600</value>
      <description>How big (in bytes) to split upload or copy operations up into.</description>
    </property>

    <property>
      <name>fs.s3a.multipart.threshold</name>
      <value>2147483647</value>
      <description>Threshold before uploads or copies use parallel multipart operations.</description>
    </property>

    <property>
      <name>fs.s3a.acl.default</name>
      <description>Set a canned ACL for newly created and copied objects. Value may be private,
      public-read, public-read-write, authenticated-read, log-delivery-write,
      bucket-owner-read, or bucket-owner-full-control.</description>
    </property>

    <property>
      <name>fs.s3a.multipart.purge</name>
      <value>false</value>
      <description>True if you want to purge existing multipart uploads that may not have been
      completed/aborted correctly</description>
    </property>

    <property>
      <name>fs.s3a.multipart.purge.age</name>
      <value>86400</value>
      <description>Minimum age in seconds of multipart uploads to purge</description>
    </property>

    <property>
      <name>fs.s3a.buffer.dir</name>
      <value>${hadoop.tmp.dir}/s3a</value>
      <description>Comma separated list of directories that will be used to buffer file
      uploads to.</description>
    </property>

    <property>
      <name>fs.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
      <description>The implementation class of the S3A Filesystem</description>
    </property>
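For illustration only (these values are arbitrary examples, not recommendations
from the original text), a workload writing many large files might widen the
connection pool and use larger multipart parts; 209715200 bytes is 200MB:

    <!-- Example tuning; the defaults listed above are 15 connections and 100MB parts. -->
    <property>
      <name>fs.s3a.connection.maximum</name>
      <value>50</value>
    </property>
    <property>
      <name>fs.s3a.multipart.size</name>
      <value>209715200</value>
    </property>

Uploads and copies larger than `fs.s3a.multipart.threshold` would then be
split into 200MB parts.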
## Testing the S3 filesystem clients

To test the S3* filesystem clients, you need to provide two files
which pass in authentication details to the test runner:

1. `auth-keys.xml`
1. `core-site.xml`

These are both Hadoop XML configuration files, which must be placed into
`hadoop-tools/hadoop-aws/src/test/resources`.


### `auth-keys.xml`

The presence of this file triggers the testing of the S3 classes.

Without this file, *none of the tests in this module will be executed*.

The XML file must contain all the ID/key information needed to connect
each of the filesystem clients to the object stores, and a URL for
each filesystem for its testing.

1. `test.fs.s3n.name` : the URL of the bucket for S3n tests
1. `test.fs.s3a.name` : the URL of the bucket for S3a tests
1. `test.fs.s3.name` : the URL of the bucket for "S3" tests

The contents of each bucket will be destroyed during the test process:
do not use the bucket for any purpose other than testing.

Example:

    <configuration>

      <property>
        <name>test.fs.s3n.name</name>
        <value>s3n://test-aws-s3n/</value>
      </property>

      <property>
        <name>test.fs.s3a.name</name>
        <value>s3a://test-aws-s3a/</value>
      </property>

      <property>
        <name>test.fs.s3.name</name>
        <value>s3://test-aws-s3/</value>
      </property>

      <property>
        <name>fs.s3.awsAccessKeyId</name>
        <value>DONOTPCOMMITTHISKEYTOSCM</value>
      </property>

      <property>
        <name>fs.s3.awsSecretAccessKey</name>
        <value>DONOTEVERSHARETHISSECRETKEY!</value>
      </property>

      <property>
        <name>fs.s3n.awsAccessKeyId</name>
        <value>DONOTPCOMMITTHISKEYTOSCM</value>
      </property>

      <property>
        <name>fs.s3n.awsSecretAccessKey</name>
        <value>DONOTEVERSHARETHISSECRETKEY!</value>
      </property>

      <property>
        <name>fs.s3a.awsAccessKeyId</name>
        <description>AWS access key ID. Omit for Role-based authentication.</description>
        <value>DONOTPCOMMITTHISKEYTOSCM</value>
      </property>

      <property>
        <name>fs.s3a.awsSecretAccessKey</name>
        <description>AWS secret key. Omit for Role-based authentication.</description>
        <value>DONOTEVERSHARETHISSECRETKEY!</value>
      </property>

    </configuration>

## File `contract-test-options.xml`

The file `hadoop-tools/hadoop-aws/src/test/resources/contract-test-options.xml`
must be created and configured for the test filesystems.

If a specific file `fs.contract.test.fs.*` test path is not defined for
any of the filesystems, those tests will be skipped.

The standard S3 authentication details must also be provided. This can be
through copy-and-paste of the `auth-keys.xml` credentials, or it can be
through direct XInclude inclusion.

### s3://

The filesystem name must be defined in the property `fs.contract.test.fs.s3`.

Example:

    <property>
      <name>fs.contract.test.fs.s3</name>
      <value>s3://test-aws-s3/</value>
    </property>

### s3n://

In the file `src/test/resources/contract-test-options.xml`, the filesystem
name must be defined in the property `fs.contract.test.fs.s3n`.
The standard configuration options to define the S3N authentication details
must also be provided.

Example:

    <property>
      <name>fs.contract.test.fs.s3n</name>
      <value>s3n://test-aws-s3n/</value>
    </property>

### s3a://

In the file `src/test/resources/contract-test-options.xml`, the filesystem
name must be defined in the property `fs.contract.test.fs.s3a`.
The standard configuration options to define the S3A authentication details
must also be provided.

Example:

    <property>
      <name>fs.contract.test.fs.s3a</name>
      <value>s3a://test-aws-s3a/</value>
    </property>

### Complete example of `contract-test-options.xml`

    <configuration>

      <include xmlns="http://www.w3.org/2001/XInclude"
        href="auth-keys.xml"/>

      <property>
        <name>fs.contract.test.fs.s3</name>
        <value>s3://test-aws-s3/</value>
      </property>

      <property>
        <name>fs.contract.test.fs.s3a</name>
        <value>s3a://test-aws-s3a/</value>
      </property>

      <property>
        <name>fs.contract.test.fs.s3n</name>
        <value>s3n://test-aws-s3n/</value>
      </property>

    </configuration>

This example pulls in the `auth-keys.xml` file for the credentials.
This provides one single place to keep the keys up to date, and means
that the file `contract-test-options.xml` does not contain any
secret credentials itself.
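For completeness, here is a sketch (not from the original document; the values
are placeholders) of the copy-and-paste alternative mentioned above, inlining
the S3N credentials directly instead of using XInclude:

    <configuration>

      <property>
        <name>fs.contract.test.fs.s3n</name>
        <value>s3n://test-aws-s3n/</value>
      </property>

      <!-- Placeholder credentials copied from auth-keys.xml; never commit real keys. -->
      <property>
        <name>fs.s3n.awsAccessKeyId</name>
        <value>DONOTPCOMMITTHISKEYTOSCM</value>
      </property>

      <property>
        <name>fs.s3n.awsSecretAccessKey</name>
        <value>DONOTEVERSHARETHISSECRETKEY!</value>
      </property>

    </configuration>

The drawback is that the keys then live in two files, which is why the
XInclude approach shown above is usually the easier one to keep current.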