hadoop-common-user mailing list archives

From Rohith Sharma K S <rohithsharm...@huawei.com>
Subject RE: QueueMetrics.AppsKilled/Failed metrics and failure reasons
Date Wed, 04 Feb 2015 04:33:36 GMT
Hi

Could you give more information? Which version of Hadoop are you using?


>> QueueMetrics.AppsKilled/Failed metrics show much higher numbers, i.e. ~100. However, RMAuditLogger
shows only 1 or 2 apps as Killed/Failed in the logs.
I suspect the logs may have been rolled over. Are a lot of applications running?

The history of all applications is displayed on the RM web UI (provided the RM has not been
restarted, or RM recovery is enabled). You can check the application lists there.
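
The same application list can also be pulled from the RM REST API instead of the web UI. The
following is a minimal sketch, not a definitive tool: it assumes the Cluster Applications
endpoint (/ws/v1/cluster/apps) is reachable, and the function names and the rm_url argument are
hypothetical. It counts applications per final state and collects the diagnostics text the RM
recorded for each one.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_finished_apps(rm_url, states=("KILLED", "FAILED")):
    """Fetch applications in the given states from the RM REST API.

    rm_url is an assumption, e.g. "http://<rm-host>:8088".
    """
    url = f"{rm_url}/ws/v1/cluster/apps?states={','.join(states)}"
    with urlopen(url) as resp:
        return json.load(resp)


def summarize(payload):
    """Return (count per state, diagnostics per application id)."""
    # "apps" is null in the response when no application matches.
    apps = (payload.get("apps") or {}).get("app") or []
    counts = Counter(app["state"] for app in apps)
    reasons = {app["id"]: app.get("diagnostics", "") for app in apps}
    return counts, reasons
```

Calling summarize(fetch_finished_apps(rm_url)) gives counts that can be compared with the
QueueMetrics values. The RM keeps only a bounded number of completed applications in memory, so
this list may be shorter than the metrics suggest.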

To find the reason an application was killed or failed, one option is to check the NodeManager
logs as well. There you need to search by the container_id of the corresponding application.
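
As a rough sketch of that search: container ids embed the cluster timestamp and sequence number
of the application id, so a pattern for them can be derived from the application id. The helper
names below are hypothetical, and the optional "e<N>_" epoch part is an assumption about how
newer releases format container ids.

```python
import re

APP_ID = re.compile(r"^application_(\d+)_(\d+)$")


def container_pattern(app_id):
    """Build a regex matching container ids that belong to app_id."""
    m = APP_ID.match(app_id)
    if m is None:
        raise ValueError(f"not an application id: {app_id!r}")
    cluster_ts, seq = m.groups()
    # container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<container>
    return re.compile(rf"container_(?:e\d+_)?{cluster_ts}_{seq}_\d+_\d+")


def matching_lines(lines, app_id):
    """Return the log lines that mention a container of app_id."""
    pattern = container_pattern(app_id)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

For example, with a NodeManager log opened as a file object, matching_lines(log_file, app_id)
returns only the lines that mention that application's containers.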

Thanks & Regards
Rohith Sharma K S

From: Suma Shivaprasad [mailto:sumasai.shivaprasad@gmail.com]
Sent: 03 February 2015 21:35
To: user@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: QueueMetrics.AppsKilled/Failed metrics and failure reasons

Hello,

I was trying to debug the reasons for Killed/Failed apps and was checking the RM logs, from
RMAuditLogger, for the applications that were killed/failed.
QueueMetrics.AppsKilled/Failed metrics show much higher numbers, i.e. ~100. However, RMAuditLogger
shows only 1 or 2 apps as Killed/Failed in the logs. Is it possible that some logs are missed by
AuditLogger, or is it the other way round and the metrics are being over-reported?
Thanks
Suma