hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "AmazonS3" by MichaelStack
Date Thu, 08 Feb 2007 22:28:26 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by MichaelStack:

- = Running bulk copies in and out S3 =
+ = Running bulk copies in and out of S3 =
  Support for the S3 filesystem was added to the `${HADOOP_HOME}/bin/hadoop distcp` tool in
Hadoop 0.11.0 (See [https://issues.apache.org/jira/browse/HADOOP-862 HADOOP-862]).  The `distcp`
tool sets up a MapReduce job to run the copy.  Using `distcp`, a cluster of many members can
copy lots of data quickly.  The number of map tasks is calculated by counting the number of
files in the source, i.e. each map task is responsible for copying one file.  Source and
target may refer to disparate filesystem types.  For example, source might refer to the local
filesystem or `hdfs` with `S3` as the target.
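
As a rough sketch of the HDFS-to-S3 case described above, the invocation takes a source URI followed by a target URI. The namenode host, port, path, bucket name, and the credential-in-URI form of the `s3://` target below are illustrative assumptions, not values taken from this page; the script only prints the command so it can be inspected before anything is run.

```shell
# Hypothetical placeholders: substitute real values before running anything.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"          # assumed install location
SRC="hdfs://namenode.example.com:9000/user/data"   # assumed HDFS source
DST="s3://ACCESS_KEY:SECRET_KEY@example-bucket/"   # assumed S3 target URI form

# distcp takes the source first, then the target.
CMD="${HADOOP_HOME}/bin/hadoop distcp ${SRC} ${DST}"

# Print instead of executing, so the sketch is safe to run on any machine.
echo "${CMD}"
```

Swapping `SRC` and `DST` would give the copy out of S3. Since each map task copies one file, a source with many files is what lets the cluster spread the work.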
