hadoop-mapreduce-user mailing list archives

From Hao Ren <h....@claravista.fr>
Subject copy files from ftp to hdfs in parallel, distcp failed
Date Thu, 11 Jul 2013 13:27:09 GMT
Hi,

I am running HDFS on Amazon EC2.

Say I have an FTP server that stores some data.

I just want to copy this data directly to HDFS in parallel (which may be
more efficient).

I think hadoop distcp is what I need.

But

     $ bin/hadoop distcp ftp://username:passwd@hostname/some/path/ hdfs://namenode/some/path

doesn't work.

     13/07/05 16:13:46 INFO tools.DistCp: srcPaths=[ftp://username:passwd@hostname/some/path/]
     13/07/05 16:13:46 INFO tools.DistCp: destPath=hdfs://namenode/some/path
     Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source ftp://username:passwd@hostname/some/path/ does not exist.
     at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
     at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
     at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
     at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

I checked the path by pasting the FTP URL into Chrome, and the file
really exists; I can even download it.
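
The same check can also be scripted outside the browser. Below is a minimal sketch, assuming curl is installed on the machine; `ftp_dir_url` and `ftp_list` are hypothetical helper names used only for illustration, and the URL is the same placeholder as in the commands above. It only shows whether the FTP server itself lists the directory, independently of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: make sure the URL ends with "/" so that curl
# treats it as a directory rather than as a file to download.
ftp_dir_url() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Hypothetical helper: print the names found in an FTP directory.
# --list-only asks the server for a names-only listing.
ftp_list() {
    curl --silent --show-error --list-only "$(ftp_dir_url "$1")"
}

# Example call (placeholder credentials and host, left commented out):
# ftp_list 'ftp://username:passwd@hostname/some/path'
```

If this lists the files while the Hadoop commands still report that the path does not exist, the problem is more likely on the Hadoop side than on the FTP server.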

Then I tried to list the files under the path with:

     $ bin/hadoop dfs -ls ftp://username:passwd@hostname/some/path/

It ends with:

     ls: Cannot access ftp://username:passwd@hostname/some/path/: No such file or directory.

That seems to be the same problem.

Is there any workaround?

Thank you in advance.

Hao.

-- 
Hao Ren
ClaraVista
www.claravista.fr
