hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michel Segel <michael_se...@hotmail.com>
Subject Re: Reduce Hangs at 66%
Date Thu, 03 May 2012 09:02:54 GMT
Well...
Lots of information but still missing some of the basics...

Which release and version?
What are your ulimits set to?
How much free disk space do you have?
What are you attempting to do?

Stuff like that.



Sent from a remote device. Please excuse any typos...

Mike Segel

On May 2, 2012, at 4:49 PM, Keith Thompson <kthomps6@binghamton.edu> wrote:

> I am running a task which gets to 66% of the Reduce step and then hangs
> indefinitely. Here is the log file (I apologize if I am putting too much
> here but I am not exactly sure what is relevant):
> 
> 2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker:
> Adding task (REDUCE) 'attempt_201202240659_6433_r_000000_0' to tip
> task_201202240659_6433_r_000000, for tracker
> 'tracker_analytix7:localhost.localdomain/127.0.0.1:56515'
> 2012-05-02 16:42:53,584 INFO org.apache.hadoop.mapred.JobInProgress:
> Task 'attempt_201202240659_6433_m_000001_0' has completed
> task_201202240659_6433_m_000001 successfully.
> 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201202240659_6432_r_000000_0: Task
> attempt_201202240659_6432_r_000000_0 failed to report status for 1800
> seconds. Killing!
> 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
> Removing task 'attempt_201202240659_6432_r_000000_0'
> 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
> Adding task (TASK_CLEANUP) 'attempt_201202240659_6432_r_000000_0' to
> tip task_201202240659_6432_r_000000, for tracker
> 'tracker_analytix4:localhost.localdomain/127.0.0.1:44204'
> 2012-05-02 17:00:48,763 INFO org.apache.hadoop.mapred.JobTracker:
> Removing task 'attempt_201202240659_6432_r_000000_0'
> 2012-05-02 17:00:48,957 INFO org.apache.hadoop.mapred.JobTracker:
> Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_1' to tip
> task_201202240659_6432_r_000000, for tracker
> 'tracker_analytix5:localhost.localdomain/127.0.0.1:59117'
> 2012-05-02 17:00:56,559 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201202240659_6432_r_000000_1: java.io.IOException:
> The temporary job-output directory
> hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
> doesn't exist!
>    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
>    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
>    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>    at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 
> 2012-05-02 17:00:59,903 INFO org.apache.hadoop.mapred.JobTracker:
> Removing task 'attempt_201202240659_6432_r_000000_1'
> 2012-05-02 17:00:59,906 INFO org.apache.hadoop.mapred.JobTracker:
> Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_2' to tip
> task_201202240659_6432_r_000000, for tracker
> 'tracker_analytix3:localhost.localdomain/127.0.0.1:39980'
> 2012-05-02 17:01:07,200 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201202240659_6432_r_000000_2: java.io.IOException:
> The temporary job-output directory
> hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
> doesn't exist!
>    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
>    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
>    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>    at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 
> 2012-05-02 17:01:10,239 INFO org.apache.hadoop.mapred.JobTracker:
> Removing task 'attempt_201202240659_6432_r_000000_2'
> 2012-05-02 17:01:10,283 INFO org.apache.hadoop.mapred.JobTracker:
> Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_3' to tip
> task_201202240659_6432_r_000000, for tracker
> 'tracker_analytix2:localhost.localdomain/127.0.0.1:33297'
> 2012-05-02 17:01:18,188 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201202240659_6432_r_000000_3: java.io.IOException:
> The temporary job-output directory
> hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
> doesn't exist!
>    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
>    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
>    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>    at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 
> 2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.TaskInProgress:
> TaskInProgress task_201202240659_6432_r_000000 has failed 4 times.
> 2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobInProgress:
> Aborting job job_201202240659_6432
> 2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobInProgress:
> Killing job 'job_201202240659_6432'
> 2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobTracker:
> Adding task (JOB_CLEANUP) 'attempt_201202240659_6432_m_000002_0' to
> tip task_201202240659_6432_m_000002, for tracker
> 'tracker_analytix2:localhost.localdomain/127.0.0.1:33297'
> 2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobTracker:
> Removing task 'attempt_201202240659_6432_r_000000_3'
> 2012-05-02 17:01:22,443 INFO org.apache.hadoop.mapred.JobInProgress:
> Task 'attempt_201202240659_6432_m_000002_0' has completed
> task_201202240659_6432_m_000002 successfully.
> 2012-05-02 17:01:22,443 INFO
> org.apache.hadoop.mapred.JobInProgress$JobSummary:
> jobId=job_201202240659_6432,submitTime=1335990629637,launchTime=1335990629702,firstMapTaskLaunchTime=1335990631252,firstReduceTaskLaunchTime=1335990633988,firstJobSetupTaskLaunchTime=1335990629727,firstJobCleanupTaskLaunchTime=1335992481228,finishTime=1335992482443,numMaps=2,numSlotsPerMap=1,numReduces=1,numSlotsPerReduce=1,user=thompson,queue=default,status=FAILED,mapSlotSeconds=7,reduceSlotsSeconds=32,clusterMapCapacity=138,clusterReduceCapacity=138
> 2012-05-02 17:01:22,446 WARN org.apache.hadoop.hdfs.DFSClient:
> DataStreamer Exception: org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
> on /thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
> File does not exist. Holder DFSClient_1931891231 does not have any
> open files.
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
>    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>    at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
> 
>    at org.apache.hadoop.ipc.Client.call(Client.java:1104)
>    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>    at $Proxy4.addBlock(Unknown Source)
>    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>    at $Proxy4.addBlock(Unknown Source)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
> 
> 2012-05-02 17:01:22,446 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block null bad datanode[0] nodes == null
> 2012-05-02 17:01:22,446 WARN org.apache.hadoop.hdfs.DFSClient: Could
> not get block locations. Source file
> "/thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity"
> - Aborting...
> 2012-05-02 17:01:22,446 INFO org.apache.hadoop.mapred.JobHistory:
> Moving file:/var/log/hadoop/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
> to file:/var/log/hadoop/history/done
> 2012-05-02 17:01:22,447 ERROR org.apache.hadoop.hdfs.DFSClient:
> Exception closing file
> /thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
> : org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
> on /thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
> File does not exist. Holder DFSClient_1931891231 does not have any
> open files.
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
>    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>    at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
> 
> 
> Can anyone explain to me what is causing the problem?
> 
> -- 
> *Keith Thompson*

Mime
View raw message