hadoop-common-user mailing list archives

From Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net>
Subject Re: copy files from ftp to hdfs in parallel, distcp failed
Date Thu, 11 Jul 2013 18:47:30 GMT
On 11 July 2013 06:27, Hao Ren <h.ren@claravista.fr> wrote:

> Hi,
>
> I am running HDFS on Amazon EC2.
>
> Say I have an FTP server that stores some data.
>
> I just want to copy this data directly to HDFS in parallel (which
> may be more efficient).
>
> I think hadoop distcp is what I need.
>

http://hadoop.apache.org/docs/stable/distcp.html

DistCp (distributed copy) is a tool used for large inter/intra-cluster
copying. It uses MapReduce to effect its distribution, error handling and
recovery, and reporting.


I doubt this is going to help. Are there a lot of files? If so, how about
running multiple copy jobs to HDFS?
-balaji
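
A minimal sketch of what "multiple copy jobs" could look like, not something
taken from this thread: it starts one `hadoop fs -cp` per file and runs a few
of them at a time from a single client machine. The host name, paths, file
names and worker count are hypothetical, and it assumes the `hadoop` command
is on the PATH and that the installation can read `ftp://` URIs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_copy_commands(ftp_base, hdfs_dir, filenames):
    """Return one `hadoop fs -cp <src> <dst>` argument list per file."""
    src_base = ftp_base.rstrip("/")
    dst_base = hdfs_dir.rstrip("/")
    return [
        ["hadoop", "fs", "-cp", f"{src_base}/{name}", f"{dst_base}/{name}"]
        for name in filenames
    ]


def run_copies(commands, max_workers=4, runner=subprocess.call):
    """Run the commands concurrently; return their exit codes in input order.

    `runner` is injectable so the function can be exercised without Hadoop.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


# Example use (hypothetical server, directory and file names):
#   cmds = build_copy_commands(
#       "ftp://ftp.example.com/data",
#       "hdfs:///user/hadoop/incoming",
#       ["part-001.csv", "part-002.csv"],
#   )
#   exit_codes = run_copies(cmds, max_workers=2)
```

All traffic here still passes through the one machine that runs the script, so
the gain comes from overlapping transfers rather than from spreading the work
across the cluster.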
