hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput
Date Thu, 28 Mar 2013 07:45:12 GMT
The EMR distributions have special versions of the s3 file system.  They
might be helpful here.

Of course, you likely aren't running those if you are seeing 5MB/s.

An extreme alternative would be to light up an EMR cluster, copy to it,
then to S3.

On Thu, Mar 28, 2013 at 4:54 AM, Himanish Kushary <himanish@gmail.com>wrote:

> I am thinking either transferring individual folders instead of the entire
> 70 GB folders as a workaround or as another option increasing the "
> mapred.task.timeout" parameter to something like 6-7 hour ( as the avg
> rate of transfer to S3 seems to be 5 MB/s).Is there any other better
> option to increase the throughput for transferring bulk data from HDFS to
> S3 ?  Looking forward for suggestions.

View raw message