hadoop-hdfs-dev mailing list archives

From Daryn Sharp <da...@yahoo-inc.com>
Subject Re: Why failed to use Distcp over FTP protocol?
Date Tue, 23 Apr 2013 17:38:00 GMT
The FTP filesystem lists the contents of the given path's parent directory and then tries to
match the basename of each child path returned against the basename of the given path, which
is quite inefficient. The FileNotFoundException (FNF) means it didn't find a match for the
basename. It may be that the FTP server isn't returning a listing in exactly the expected
format, so it's being parsed incorrectly.
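
To make the lookup described above concrete, here is a minimal, self-contained sketch. It is not the actual FTPFileSystem code: the class name FtpStatusLookupSketch and the helpers basename and findInParentListing are hypothetical, and the parent listing is passed in as plain strings instead of being fetched from a server. It only illustrates why a name that is parsed slightly wrong from the listing would surface as a FileNotFoundException.

```java
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.List;

// Illustrative only: mimics "list the parent, match on basename".
public class FtpStatusLookupSketch {

    // Hypothetical helper: returns the last component of a slash-separated path.
    static String basename(String path) {
        String trimmed = (path.length() > 1 && path.endsWith("/"))
                ? path.substring(0, path.length() - 1)
                : path;
        return trimmed.substring(trimmed.lastIndexOf('/') + 1);
    }

    // Hypothetical helper: scans the parent's listing for an entry whose
    // basename equals the basename of the requested path.
    static String findInParentListing(String path, List<String> parentListing)
            throws FileNotFoundException {
        String wanted = basename(path);
        for (String child : parentListing) {
            if (basename(child).equals(wanted)) {
                return child;
            }
        }
        // No entry matched, so the caller sees a "does not exist" error
        // even if the file is really there.
        throw new FileNotFoundException("File " + path + " does not exist.");
    }

    public static void main(String[] args) {
        // Listing parsed as expected: the basename matches.
        List<String> parsedOk = Arrays.asList("file1.txt", "other.txt");
        // Assumed mis-parse: part of another column leaked into the name.
        List<String> parsedBadly = Arrays.asList("12:01 file1.txt");

        for (List<String> listing : Arrays.asList(parsedOk, parsedBadly)) {
            try {
                System.out.println("found: "
                        + findInParentListing("/tmp/file1.txt", listing));
            } catch (FileNotFoundException e) {
                System.out.println("error: " + e.getMessage());
            }
        }
    }
}
```

In this sketch the second listing contains the file, but the entry name does not equal the requested basename, so the lookup fails in the same way as the trace reported in this thread.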

Does "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here" work?  Or "hadoop
fs -rm ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"?  Those commands should exercise
the same code paths where you are seeing the errors.

Daryn

On Apr 22, 2013, at 9:06 PM, sam liu wrote:

I encountered IOException and FileNotFoundException:

13/04/17 17:11:10 INFO mapred.JobClient: Task Id : attempt_201304160910_2135_m_000000_0, Status : FAILED
java.io.IOException: The temporary job-output directory ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(AccessController.java:310)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)


... ...

13/04/17 17:11:42 INFO mapred.JobClient: Job complete: job_201304160910_2135
13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201304160910_2135_m_000000
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.FileNotFoundException: File ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not exist.
    at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
    at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)


2013/4/23 Daryn Sharp <daryn@yahoo-inc.com>
I believe it should work…  What error message did you receive?

Daryn

On Apr 22, 2013, at 3:45 AM, sam liu wrote:

> Hi Experts,
>
> I failed to execute the following command. Does DistCp not support the FTP protocol?
>
> hadoop distcp ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/file1.txt hdfs:///tmp/file1.txt
>
> Thanks!




