From: Hao Ren
Date: Thu, 11 Jul 2013 15:27:09 +0200
To: user@hadoop.apache.org
Subject: copy files from ftp to hdfs in parallel, distcp failed
Hi,

I am running HDFS on Amazon EC2.

Say I have an FTP server that stores some data. I want to copy this data directly to HDFS in parallel (which may be more efficient). I think hadoop distcp is what I need. But

    $ bin/hadoop distcp ftp://username:passwd@hostname/some/path/ hdfs://namenode/some/path

doesn't work:

    13/07/05 16:13:46 INFO tools.DistCp: srcPaths=[ftp://username:passwd@hostname/some/path/]
    13/07/05 16:13:46 INFO tools.DistCp: destPath=hdfs://namenode/some/path
    Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source ftp://username:passwd@hostname/some/path/ does not exist.
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

I checked the path by pasting the FTP URL into Chrome, and the file really exists; I can even download it.

Then I tried to list the files under the path with:

    $ bin/hadoop dfs -ls ftp://username:passwd@hostname/some/path/

It ends with:

    ls: Cannot access ftp://username:passwd@hostname/some/path/: No such file or directory.

That seems to be the same problem. Is there any workaround?

Thank you in advance.

Hao.

--
Hao Ren
ClaraVista
www.claravista.fr
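Since the path was only verified from a desktop browser, it may help to check what a plain FTP client sees from a cluster node itself. Below is a minimal sketch, assuming Python 3 is available on that node; the helper names `parse_ftp_url` and `list_ftp_dir` are hypothetical, and the script uses only the standard-library `urllib.parse` and `ftplib` modules, not any Hadoop API. It splits the same `ftp://username:passwd@hostname/some/path/` URL passed to distcp into its parts and lists that directory, with passive mode selectable.

```python
from ftplib import FTP
from urllib.parse import urlparse, unquote


def parse_ftp_url(url):
    """Split an ftp://user:passwd@host/path URL into its components (hypothetical helper)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError("expected an ftp:// URL, got: " + url)
    return {
        "host": parts.hostname,
        "port": parts.port or 21,  # default FTP control port
        "user": unquote(parts.username or "anonymous"),
        "password": unquote(parts.password or ""),
        "path": unquote(parts.path) or "/",
    }


def list_ftp_dir(url, passive=True, timeout=30):
    """Log in and return the names under the URL's path, as seen from this machine (hypothetical helper)."""
    p = parse_ftp_url(url)
    with FTP(timeout=timeout) as ftp:
        ftp.connect(p["host"], p["port"])
        ftp.login(p["user"], p["password"])
        ftp.set_pasv(passive)  # True = passive mode, False = active mode
        return ftp.nlst(p["path"])
```

For example, `print(list_ftp_dir("ftp://username:passwd@hostname/some/path/"))` run on the namenode shows whether the listing works from inside EC2 at all, and comparing `passive=True` with `passive=False` may help tell whether the data connection, rather than the path, is what fails there.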