hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "AmazonS3" by SteveLoughran
Date Tue, 29 Jan 2013 00:07:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "AmazonS3" page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/AmazonS3?action=diff&rev1=13&rev2=14

Comment:
remind people to keep their access key safe and out of SCM & logs

  
   S3 Native FileSystem (URI scheme: s3n)::
   A native filesystem for reading and writing regular files on S3. The advantage of this
filesystem is that you can access files on S3 that were written with other tools. Conversely,
other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file
size imposed by S3. For this reason it is not suitable as a replacement for HDFS (which has
support for very large files).
-  
+ 
   S3 Block FileSystem (URI scheme: s3)::
   A block-based filesystem backed by S3. Files are stored as blocks, just like they are in
HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate
a bucket for the filesystem - you should not use an existing bucket containing files, or write
other files to the same bucket. The files stored by this filesystem can be larger than 5GB,
but they are not interoperable with other S3 tools.
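 
  A minimal sketch of addressing each filesystem from the command line; mybucket is a
  placeholder bucket name, and the AWS credentials are assumed to be configured already:
  {{{
  # native filesystem: objects stay readable and writable by other S3 tools
  bin/hadoop fs -ls s3n://mybucket/data/
  # block filesystem: the bucket holds Hadoop-format blocks only
  bin/hadoop fs -ls s3://mybucket/data/
  }}}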
  
  There are two ways that S3 can be used with Hadoop's Map/Reduce: either as a replacement
for HDFS using the S3 block filesystem
  (i.e. using it as a reliable distributed filesystem with support for very large files),
  or as a convenient repository for data input to and output from MapReduce, using either
S3 filesystem. In the second case
- HDFS is still used for the Map/Reduce phase. Note also, that by using S3 as an input to
MapReduce you lose the data locality optimization, which may be significant. 
+ HDFS is still used for the Map/Reduce phase. Note also that by using S3 as an input to
MapReduce you lose the data locality optimization, which may be significant.
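 
  For example, to run the S3 block filesystem in the HDFS-replacement role, the default
  filesystem can be pointed at a dedicated bucket; a minimal sketch of the hadoop-site.xml
  entry, where BUCKET is a placeholder:
  {{{
  <property>
    <name>fs.default.name</name>
    <!-- BUCKET is a placeholder for a bucket dedicated to the block filesystem -->
    <value>s3://BUCKET</value>
  </property>
  }}}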
  
  = History =
   * The S3 block filesystem was introduced in Hadoop 0.10.0 ([[http://issues.apache.org/jira/browse/HADOOP-574|HADOOP-574]]),
but this had a few bugs, so you should use Hadoop 0.10.1 or later.
@@ -79, +79 @@

  bin/start-mapred.sh
  }}}
  
+ = Security =
+ 
+ Your Amazon Secret Access Key is just that: a secret. If it becomes known, go to the [[https://portal.aws.amazon.com/gp/aws/securityCredentials|Security
Credentials]] page and revoke it. Avoid printing it in logs, and avoid checking XML configuration
files that contain it into revision control.
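+ 
+ One way to keep the key out of configuration files that get checked in is to supply it
+ per-job on the command line instead; a sketch, where ID, SECRET and mybucket are placeholders
+ and the property names shown assume the s3n scheme:
+ {{{
+ # credentials passed as job properties; nothing stored in hadoop-site.xml
+ bin/hadoop distcp -Dfs.s3n.awsAccessKeyId=ID -Dfs.s3n.awsSecretAccessKey=SECRET \
+     hdfs:///user/me/data s3n://mybucket/data
+ }}}
+ 
+ Note that the key may still end up in the local shell history, so clear that too on a
+ shared machine.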
  
  = Running bulk copies in and out of S3 =
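 
  A typical bulk copy in either direction uses distcp; a hedged sketch, with the namenode
  host, paths, and bucket all placeholders:
  {{{
  # copy from HDFS into S3
  bin/hadoop distcp hdfs://namenode:9000/user/me/input s3://mybucket/input
  # copy back out of S3 into HDFS
  bin/hadoop distcp s3://mybucket/output hdfs://namenode:9000/user/me/output
  }}}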
  
@@ -105, +108 @@

  put: Input stream is not repeatable as 1048576 bytes have been written, exceeding the available
buffer size of 131072
  }}}
  
- See [[https://issues.apache.org/jira/browse/HADOOP-882|HADOOP-882]] for discussion of the
above issues and workarounds/fixes.   
+ See [[https://issues.apache.org/jira/browse/HADOOP-882|HADOOP-882]] for discussion of the
above issues and workarounds/fixes.
  
  = S3 Block FileSystem Version Numbers =
  From release 0.13.0 the S3 block filesystem stores a version number in the file metadata.
This table lists the first Hadoop release for each version number.
