hadoop-mapreduce-user mailing list archives

From Krish Donald <gotomyp...@gmail.com>
Subject Re: How to troubleshoot failed or stuck jobs
Date Mon, 02 Mar 2015 06:17:48 GMT
Thanks Rohith ...

What other issues have you seen with failed or stuck jobs?

On Sun, Mar 1, 2015 at 10:06 PM, Rohith Sharma K S <
rohithsharmaks@huawei.com> wrote:

>  Hi
>
>
>
> 1.       For failed jobs, you can check the MRAppMaster logs directly.
> They give the reason the job failed.
>
> 2.       For a stuck job, you need to do some groundwork to identify
> what is going wrong. It can be either a YARN issue or a MapReduce issue.
>
> 2.1   Recently I have seen jobs get stuck many times because the headroom
> calculation went wrong.  Headroom is sent by the RM to the ApplicationMaster,
> and the AM uses it as a deciding factor (
> https://issues.apache.org/jira/i#browse/YARN-1680 ).  The corresponding
> parent JIRA is  https://issues.apache.org/jira/i#browse/YARN-1198
>
> 2.2   When the job is stuck, check the following:
>
> YARN – collect ClusterMemory Used, ClusterMemory Reserved, Total
> Memory, the number of NodeManagers, and the headroom sent to the AM.
>
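
As a rough sketch of gathering the YARN-side numbers listed above: the ResourceManager REST API exposes cluster metrics at /ws/v1/cluster/metrics. The snippet below is Python; the helper names are made up for illustration, and the RM address in the usage note is a placeholder. Headroom is not part of this response, so it has to be looked up separately.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Fetch the clusterMetrics object from the RM REST API.

    rm_url is the ResourceManager web address (a placeholder you supply).
    """
    url = rm_url.rstrip("/") + "/ws/v1/cluster/metrics"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def summarize_memory(metrics):
    """Reduce the metrics to the numbers worth checking for a stuck job."""
    total = metrics["totalMB"]
    used = metrics["allocatedMB"]
    return {
        "total_mb": total,
        "used_mb": used,
        "reserved_mb": metrics["reservedMB"],
        "available_mb": metrics["availableMB"],
        "active_nodes": metrics["activeNodes"],
        # Share of cluster memory currently allocated to containers.
        "used_pct": round(100.0 * used / total, 1) if total else 0.0,
    }
```

Usage, assuming the RM web UI is on its default port 8088: summarize_memory(fetch_cluster_metrics("http://rm-host:8088")). A cluster that is fully allocated or has a large reserved amount while the job makes no progress is a hint to look at headroom and reducer pre-emption next.
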
> MapReduce – Are any NMs blacklisted? Are reducer tasks using up the
> ClusterMemory? By default, reducers start before the mappers complete.
> If a mapper then fails because of an unstable node, the reducers can take
> over the cluster. In that case the reducers are expected to be pre-empted,
> so check whether they actually are being pre-empted.
>
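
The point at which reducers start is governed by mapreduce.job.reduce.slowstart.completedmaps (0.05 in mapred-default.xml). A small sketch of what that fraction means in practice, assuming the threshold is rounded up to a whole number of maps; the function name is made up for illustration:

```python
import math


def maps_needed_before_reducers(total_maps, slowstart=0.05):
    """Number of completed maps after which reducers may be scheduled.

    slowstart mirrors mapreduce.job.reduce.slowstart.completedmaps.
    Rounding up is an assumption of this sketch.
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)
```

With the default, a job with 10 maps may begin scheduling reducers after a single map completes. Raising the value to 1.0 for a test run holds reducers back until every map is done, which can help tell whether early reducers are what is starving the mappers.
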
> The MRAppMaster log helps to some extent in analyzing the issue.
>
>
>
> Thanks & Regards
>
> Rohith Sharma K S
>
>
>
> *From:* Krish Donald [mailto:gotomypc27@gmail.com]
> *Sent:* 02 March 2015 11:09
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to troubleshoot failed or stuck jobs
>
>
>
> Thanks for the link, Ted.
>
>
>
> However, I wanted to understand the approach that should be taken when
> troubleshooting failed or stuck jobs.
>
>
>
>
>
> On Sun, Mar 1, 2015 at 8:52 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> Here are some related discussions and JIRA:
>
>
>
> http://search-hadoop.com/m/LgpTk2gxrGx
>
> http://search-hadoop.com/m/LgpTk2YLArE
>
>
>
> https://issues.apache.org/jira/browse/MAPREDUCE-6190
>
>
>
> Cheers
>
>
>
> On Sun, Mar 1, 2015 at 8:41 PM, Krish Donald <gotomypc27@gmail.com> wrote:
>
> Hi,
>
>
>
> I wanted to understand how to troubleshoot failed or stuck jobs.
>
>
>
> Thanks
>
> Krish
>
>
>
>
>
