hadoop-yarn-dev mailing list archives

From Suma Shivaprasad <sumasai.shivapra...@gmail.com>
Subject Re: QueueMetrics.AppsKilled/Failed metrics and failure reasons
Date Wed, 04 Feb 2015 05:32:57 GMT
I am using Hadoop 2.4.0. The number of applications running on average is
small, about 40-60. The metrics in Ganglia show around 10-30 apps killed
every 5 minutes, which is very high relative to the number of apps running
at any given time (40-60). The RM audit logs, though, show 0 failed apps
during that hour. The RM UI also doesn't show any apps in the
Applications->Failed tab. The logs are rolled over at a slower rate, every
1-2 hours. I am searching for "Application Finished - Failed" to find the
failed apps. Please let me know if I am missing something here.
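
A minimal sketch of the log search described above, in Python. It assumes
that each RMAuditLogger event is a single line containing an `OPERATION=`
field, and that killed applications are logged under a separate
"Application Finished - Killed" operation, so a search for "Failed" alone
would not count them. Both operation strings should be verified against the
actual RM log. The function name and the sample lines are hypothetical and
not taken from a real cluster.

```python
from collections import Counter

# Assumed operation strings; check them against your own RM audit log.
FINISH_OPS = (
    "Application Finished - Failed",
    "Application Finished - Killed",
)


def count_finished_apps(lines):
    """Count audit-log lines per finish operation (failed vs. killed)."""
    counts = Counter({op: 0 for op in FINISH_OPS})
    for line in lines:
        for op in FINISH_OPS:
            if "OPERATION=" + op in line:
                counts[op] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample_lines = [
    "USER=alice\tOPERATION=Application Finished - Killed\tTARGET=RMAppManager\tAPPID=application_1_0001",
    "USER=bob\tOPERATION=Application Finished - Succeeded\tTARGET=RMAppManager\tAPPID=application_1_0002",
    "USER=alice\tOPERATION=Application Finished - Killed\tTARGET=RMAppManager\tAPPID=application_1_0003",
    "USER=carol\tOPERATION=Application Finished - Failed\tTARGET=RMAppManager\tAPPID=application_1_0004",
]

counts = count_finished_apps(sample_lines)
```

Feeding it the lines of the current log plus any rolled-over files for the
same hour gives totals that can be compared with the Ganglia numbers.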

Thanks
Suma




On Wed, Feb 4, 2015 at 10:03 AM, Rohith Sharma K S <
rohithsharmaks@huawei.com> wrote:

>  Hi
>
>
>
> Could you give more information, which version of hadoop are you using?
>
>
>
> >> QueueMetrics.AppsKilled/Failed metrics show much higher numbers, i.e. ~100.
> However, RMAuditLogger shows 1 or 2 apps as Killed/Failed in the logs.
>
> I suspect the logs might have been rolled over. Are many applications
> running?
>
>
>
> The history of all applications is displayed on the RM web UI (provided
> the RM has not been restarted, or RM recovery is enabled). You could check
> those application lists.
>
>
>
> To find the reason an application was killed or failed, one option is to
> check the NodeManager logs as well. There you need to search by the
> container_id of the corresponding application.
>
>
>
> Thanks & Regards
>
> Rohith Sharma K S
>
>
>
> *From:* Suma Shivaprasad [mailto:sumasai.shivaprasad@gmail.com]
> *Sent:* 03 February 2015 21:35
> *To:* user@hadoop.apache.org; yarn-dev@hadoop.apache.org
> *Subject:* QueueMetrics.AppsKilled/Failed metrics and failure reasons
>
>
>
> Hello,
>
>
> I was trying to debug the reasons for Killed/Failed apps and was checking
> the RM logs (from RMAuditLogger) for the applications that were
> killed/failed.
>
> QueueMetrics.AppsKilled/Failed metrics show much higher numbers, i.e. ~100.
> However, RMAuditLogger shows 1 or 2 apps as Killed/Failed in the logs. Is it
> possible that some logs are missed by the AuditLogger, or is it the other
> way round and the metrics are being reported too high?
>
> Thanks
>
> Suma
>
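
The NodeManager suggestion in the quoted reply can be sketched as follows.
This assumes the Hadoop 2.4.0 ID layout, where `application_<clusterTs>_<seq>`
corresponds to containers named `container_<clusterTs>_<seq>_<attempt>_<n>`.
The helper names and the sample log lines are hypothetical.

```python
def container_prefix(app_id):
    """Map an application ID to the prefix shared by its container IDs."""
    prefix = "application_"
    if not app_id.startswith(prefix):
        raise ValueError("not an application ID: " + app_id)
    return "container_" + app_id[len(prefix):] + "_"


def nm_lines_for_app(lines, app_id):
    """Return the NodeManager log lines that mention a container of app_id."""
    wanted = container_prefix(app_id)
    return [line for line in lines if wanted in line]


# Hypothetical NodeManager log lines, for illustration only.
nm_log = [
    "INFO Container container_1422000000000_0007_01_000002 transitioned to KILLING",
    "INFO Container container_1422000000000_0008_01_000001 transitioned to RUNNING",
    "WARN Exit code from container container_1422000000000_0007_01_000003 is : 143",
]

matches = nm_lines_for_app(nm_log, "application_1422000000000_0007")
```

The trailing underscore in the prefix keeps application 0007 from matching
containers of an application whose sequence number merely starts with 0007.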
