hadoop-mapreduce-user mailing list archives

From "Jianfeng (Jeff) Zhang" <jzh...@hortonworks.com>
Subject Re: Application Master waits a long time after Mapper/Reducers finish
Date Mon, 20 Jul 2015 16:55:07 GMT
This might be due to a performance issue in FileOutputCommitter, which is resolved in 2.7:
https://issues.apache.org/jira/browse/MAPREDUCE-4815
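
To illustrate why the job commit step can dominate the wall-clock time, here is a minimal sketch of a serial merge of per-task output directories into the final output directory. It is a simplified illustration on the local filesystem using java.nio, not Hadoop's FileOutputCommitter code; the class name SerialCommitSketch, the directory names, and the rename counter are all hypothetical. The point is only that a single thread issues one rename per moved file or subtree, so the commit time grows with the number of output files. As far as I understand MAPREDUCE-4815, 2.7 adds an alternative behaviour selected by mapreduce.fileoutputcommitter.algorithm.version, in which tasks move their output to the final location at task commit and far less work is left for the job commit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical, simplified illustration. This is NOT Hadoop's implementation.
class SerialCommitSketch {
    // Number of rename (move) calls issued during the commit.
    int renames = 0;

    // Snapshot the children of a directory so we can move them while iterating.
    static List<Path> list(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            List<Path> out = new ArrayList<>();
            s.sorted().forEach(out::add);
            return out;
        }
    }

    // Merge 'from' into 'to'. If both are directories, recurse child by child;
    // otherwise move the file or whole subtree with a single rename.
    void mergePaths(Path from, Path to) throws IOException {
        if (Files.isDirectory(from) && Files.isDirectory(to)) {
            for (Path child : list(from)) {
                mergePaths(child, to.resolve(child.getFileName()));
            }
            Files.delete(from); // the source directory is empty now
        } else {
            Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
            renames++;
        }
    }

    // Job commit: one thread merges every committed task directory in turn.
    void commitJob(Path pendingDir, Path outputDir) throws IOException {
        for (Path taskDir : list(pendingDir)) {
            mergePaths(taskDir, outputDir);
        }
    }

    // Builds 'tasks' task directories with one output file each, commits them,
    // and returns how many renames the commit needed.
    static int demo(int tasks) {
        try {
            Path root = Files.createTempDirectory("commit-sketch");
            Path output = Files.createDirectory(root.resolve("output"));
            Path pending = Files.createDirectory(root.resolve("pending"));
            for (int i = 0; i < tasks; i++) {
                Path taskDir = Files.createDirectory(pending.resolve("task_" + i));
                Files.write(taskDir.resolve("part-r-" + i), new byte[0]);
            }
            SerialCommitSketch committer = new SerialCommitSketch();
            committer.commitJob(pending, output);
            return committer.renames;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("renames for 3 tasks: " + demo(3));
    }
}
```

On a local disk each rename is cheap. Against HDFS every rename is a round trip to the NameNode, which would be consistent with the CommitterEvent Processor thread sitting in DFSClient.rename in the thread dump quoted below.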


Best Regards,
Jeff Zhang


From: Ashish Kumar Singh <ashish23aks@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Monday, July 20, 2015 at 4:06 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Application Master waits a long time after Mapper/Reducers finish

Hi Rohith,

Thanks for replying.
No, I do not see any connection retry attempts to HDFS in the logs.

Also, the NameNode and HDFS look healthy in our cluster.

Please find attached the latest AM logs for the job.


Regards,
Ashish


On Mon, Jul 20, 2015 at 3:29 PM, Rohith Sharma K S <rohithsharmaks@huawei.com> wrote:
Hi

From the thread dump, the AM seems to be waiting on an HDFS operation. Can you attach the AM logs?
Do you see any client retries when connecting to HDFS?

"CommitterEvent Processor #4" prio=10 tid=0x000000000199a800 nid=0x18df in Object.wait() [0x00007f4f12aa4000]
   java.lang.Thread.State: WAITING (on object monitor)
                at java.lang.Object.wait(Native Method)
                at java.lang.Object.wait(Object.java:503)
                ............................
                at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1864)
                at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:575)
                at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:345)
                at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
                at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
                at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
                at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)


Maybe you can check whether HDFS is healthy?

Thanks & Regards
Rohith Sharma K S

From: Ashish Kumar Singh [mailto:ashish23aks@gmail.com]
Sent: 20 July 2015 14:16
To: user@hadoop.apache.org
Subject: Application Master waits a long time after Mapper/Reducers finish

Hello Users,

I am facing a problem running MapReduce jobs on Hadoop 2.6.
I am observing that the Application Master waits for a long time after all the Mappers and
Reducers have completed before the job is marked as completed.

This wait sometimes exceeds 20-25 minutes, which is very strange, as our mappers and reducers
complete in less than 10 minutes for the job.

Below are some observations:
a) Job completion status stands at 95% when the wait begins.

b) JOB_COMMIT is initiated just before this wait (logs: 2015-07-14 01:54:46,636 INFO
[AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job
Transitioned from RUNNING to COMMITTING).

c) Job success happens after 20-25 minutes (logs: 2015-07-14 02:15:06,634 INFO [AsyncDispatcher
event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job
Transitioned from COMMITTING to SUCCEEDED).
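
For reference, the time spent in the COMMITTING state can be read straight off the two log lines in (b) and (c). The following is a small sketch that parses the leading timestamps and prints the difference; the class name CommitDuration and its helper methods are hypothetical, and the pattern assumes the log4j-style timestamp layout seen in the lines above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for measuring the gap between two AM log lines.
class CommitDuration {
    // Timestamp layout assumed from the log lines above: 2015-07-14 01:54:46,636
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    // The timestamp occupies the first 23 characters of each log line.
    static LocalDateTime parse(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), LOG_TS);
    }

    // Elapsed time between the timestamps of two log lines.
    static Duration between(String startLine, String endLine) {
        return Duration.between(parse(startLine), parse(endLine));
    }

    public static void main(String[] args) {
        Duration d = between(
                "2015-07-14 01:54:46,636 INFO [AsyncDispatcher event handler] ...",
                "2015-07-14 02:15:06,634 INFO [AsyncDispatcher event handler] ...");
        System.out.println(d.toMinutes() + " min " + (d.getSeconds() % 60) + " s");
    }
}
```

For the two lines above this gives a little over 20 minutes spent entirely in the job commit.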


I would appreciate any help on this.

A thread dump taken while the Application Master hangs is attached.

Regards,
Ashish

