hadoop-common-user mailing list archives

From Koji Noguchi <knogu...@yahoo-inc.com>
Subject Re: MapReduce Jobs being 'stuck' for several hours and then completing
Date Thu, 28 Apr 2011 15:44:29 GMT
Hi Abhinay, 

If you have access to the compute nodes, then the following might help:

1) jstack of streaming mapper jvm
2) strace -f of streaming mapper jvm
3) strace -f of streaming map process itself
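
As a rough sketch of what the three diagnostics above could look like on a compute node, the snippet below only builds the command lines and prints them; it does not run anything. The PIDs, the output directory, and the helper name build_diagnostic_commands are hypothetical; the real PIDs would have to be looked up on the node first (for example with ps). It assumes strace is attached to an already running process with -p and that its output is written to a file with -o.

```python
import shlex


def build_diagnostic_commands(jvm_pid, child_pid, out_dir="/tmp"):
    """Return the three diagnostic commands as argv lists (nothing is executed).

    jvm_pid   -- PID of the streaming mapper JVM (hypothetical value supplied by caller)
    child_pid -- PID of the streaming map process the JVM launched (hypothetical)
    out_dir   -- directory for the strace output files (assumption)
    """
    return [
        # 1) thread dump of the streaming mapper JVM
        ["jstack", str(jvm_pid)],
        # 2) system calls of the JVM, following forked children and threads (-f)
        ["strace", "-f", "-p", str(jvm_pid),
         "-o", f"{out_dir}/strace-jvm-{jvm_pid}.txt"],
        # 3) system calls of the streaming map process itself
        ["strace", "-f", "-p", str(child_pid),
         "-o", f"{out_dir}/strace-child-{child_pid}.txt"],
    ]


if __name__ == "__main__":
    # Placeholder PIDs, for illustration only.
    for argv in build_diagnostic_commands(12345, 12346):
        print(shlex.join(argv))
```

Taking the jstack dump a few times, several seconds apart, makes it easier to tell a thread that is genuinely blocked from one that is merely slow.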

Koji


On 4/28/11 3:33 AM, "Abhinay Mehta" <abhinay.mehta@gmail.com> wrote:

> Hi all,
> 
> We are using CDH3B4 on the Hadoop Cluster.
> 
> We have jobs kicking off every hour using the streaming API. Each of
> these jobs used to take 4-5 minutes to complete, but since 1pm
> yesterday they have suddenly started taking 3-4 hours.
> 
> We looked at the data the jobs are working on and the data is exactly the
> same as it always has been.
> The cluster / config has not been touched since the upgrade to CDH3B4 which
> was one month ago.
> 
> No errors are being reported in any of the logs, the jobs are just taking
> longer, much longer.
> One thing I have noticed: while a job sits idle partway through, one
> entry appears consistently in the slave log files:
> 
> 2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed:
> R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
> 2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed:
> R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
> 
> I see that entry in both Map and Reduce phases, when the jobs sit
> idle for many tens of minutes without doing anything.
> This happens even if there is nothing else running on the cluster.
> 
> If anyone can shed some light on this or give me a direction to look into
> further then it would be much appreciated.
> 
> Thank you.
> 
> Regards,
> Abhinay Mehta
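
The PipeMapRed entries quoted above can be pulled out of a task log mechanically, which helps when checking how long a task went without producing output. The following is a minimal sketch, assuming each entry sits on a single line in the real log file (the mail client wrapped them here) and assuming the three R/W/S counters are records read, written, and skipped, which the message itself does not confirm. The function name parse_progress is hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<ms>\d{3}) INFO "
    r"org\.apache\.hadoop\.streaming\.PipeMapRed:\s+"
    r"R/W/S=(?P<r>\d+)/(?P<w>\d+)/(?P<s>\d+)"
)


def parse_progress(line):
    """Return (timestamp, r, w, s) for a PipeMapRed progress line, else None.

    The meaning of r/w/s (read/written/skipped) is an assumption.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    ts = ts.replace(microsecond=int(m.group("ms")) * 1000)
    return ts, int(m.group("r")), int(m.group("w")), int(m.group("s"))


if __name__ == "__main__":
    sample = (
        "2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed: "
        "R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]"
    )
    print(parse_progress(sample))
```

If the second counter stays at 0 while the first keeps growing, that would suggest the streaming process is consuming input without emitting anything, which is where the jstack and strace output becomes useful.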

