ambari-user mailing list archives

From Chadwick Banning <chadwickbann...@gmail.com>
Subject Hadoop 2.2.0 Distcp errors
Date Fri, 17 Jan 2014 01:05:13 GMT
Hi all,

I'm trying to distcp files between two Ambari 1.4.2/HDP 2.0.6 clusters with
NameNode HA enabled.  The distcp job begins the copy, but errors start
cropping up and eventually the job fails.  This happens on every run, with
the job completion percentage increasing a few points each time.

Here is what the error looks like.  I've tried using the -update and
-skipcrccheck flags, but the issue still occurs.  Are there any properties
that might need to be increased?  I tried increasing mapred.task.attempts,
but this did not help.

14/01/16 16:07:14 INFO mapreduce.Job: Task Id : attempt_1389884601733_0182_m_000017_2, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://source/file --> hdfs://destination/file
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://source/file to hdfs://destination/file
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
        ... 10 more
Caused by: java.io.IOException: Check-sum mismatch between hdfs://source/file and hdfs://destination/.distcp.tmp.attempt_1389884601733_0182_m_000017_2.
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:152)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:108)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:83)
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
        ... 11 more
14/01/16 16:07:51 INFO mapreduce.Job:  map 100% reduce 0%
14/01/16 16:07:53 INFO mapreduce.Job: Job job_1389884601733_0182 failed with state FAILED due to: Task failed task_1389884601733_0182_m_000019
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/01/16 16:07:54 INFO mapreduce.Job: Counters: 6
        Job Counters
                Failed map tasks=20
                Killed map tasks=18
                Launched map tasks=38
                Other local map tasks=38
                Total time spent by all maps in occupied slots (ms)=35208594
                Total time spent by all reduces in occupied slots (ms)=0
14/01/16 16:07:54 ERROR tools.DistCp: Exception encountered
java.io.IOException: DistCp failure: Job job_1389884601733_0182 has failed: Task failed task_1389884601733_0182_m_000019
Job failed as tasks failed. failedMaps:1 failedReduces:0
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:166)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:375)
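
For reference, the kind of invocation described above can be sketched
roughly as follows.  This is only an illustration: the exact command line
that was run is not shown in this message, the paths are the placeholder
URIs from the log, and the variable names are made up for the example.
The snippet only prints the command so it can be inspected before running.

```shell
#!/bin/sh
# Placeholder URIs taken from the log above; "source" and "destination"
# stand in for the two HA nameservice names (assumed, not confirmed).
SRC="hdfs://source/file"
DST="hdfs://destination/file"

# -update copies only files that are missing or differ at the target;
# -skipcrccheck skips the CRC comparison between source and target.
# Both flags are mentioned in the message; combining them this way is
# an assumption about how they were passed.
DISTCP_CMD="hadoop distcp -update -skipcrccheck $SRC $DST"

# Print the command instead of executing it (dry run).
echo "$DISTCP_CMD"
```

To actually launch the copy, the printed line would be run as-is on a
node that has the Hadoop client configured for both clusters.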
