ambari-user mailing list archives

From: Jonathan Hurley <jhur...@hortonworks.com>
Subject: Re: Ambari alerts with YARN HA alarming with status "unknown"
Date: Mon, 31 Aug 2015 15:07:54 GMT
This is caused by how YARN implements HA mode. With two YARN RMs, the standby RM answers with a
200 response carrying a browser-style redirect (the "Refresh" header in the quoted dump) instead
of a 3xx redirect. When Kerberos is not enabled, Ambari should be able to parse the headers and
follow that redirect. On a Kerberized cluster, however, we use curl, which does not follow this
kind of redirect. Requests against the standby RM therefore end in UNKNOWN: the request succeeded
with a 200, but the body contains no metrics. I think a few things can be improved here:
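Following the standby RM's redirect amounts to reading the "Refresh" header from the quoted dump and re-issuing the request against its url= target. Below is a minimal Python sketch of that step, assuming the header always has the "<seconds>; url=<target>" form seen in the dump. parse_refresh_header and redirect_target are hypothetical helpers for illustration, not part of Ambari.

```python
import re
from typing import Mapping, Optional, Tuple

# Matches the "<seconds>; url=<target>" form seen in the quoted dump,
# e.g. "3; url=http://rm1.example:8088/jmx" ("url" matched case-insensitively).
_REFRESH_RE = re.compile(r"^\s*(\d+)\s*;\s*url\s*=\s*(\S+)\s*$", re.IGNORECASE)


def parse_refresh_header(value: str) -> Optional[Tuple[int, str]]:
    """Return (delay_seconds, target_url) from a Refresh header, or None."""
    match = _REFRESH_RE.match(value)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def redirect_target(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the URL to retry against, or None if the response is final.

    Hypothetical helper. Assumes header names are given in canonical case
    ("Location", "Refresh").
    """
    if 300 <= status < 400:
        # Ordinary redirect: curl -L and most HTTP clients follow this.
        return headers.get("Location")
    refresh = headers.get("Refresh")
    if status == 200 and refresh:
        # Standby-RM style redirect: curl does not act on this header.
        parsed = parse_refresh_header(refresh)
        if parsed is not None:
            return parsed[1]
    return None
```

Note that in the quoted dump the Refresh target is http://{my-primary-rm}:8088/jmx, without the original ?qry=... filter. A client following it would need to re-append the query string to get the same bean back.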

1) A ticket should be filed against YARN so that its HA mode uses a proper 3xx redirect.
2) Ambari might not want to report UNKNOWN here, since it gives the false impression
that something went wrong.
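For point 2, one possible shape of the check is to treat a 200 response that carries a "Refresh" header and the "This is standby RM" body text as a standby node, rather than as missing metrics. Below is a minimal Python sketch. The AlertState values and classify_jmx_response are hypothetical and do not reflect Ambari's actual alert-script API. The only grounded details are the standby body text from the dump and the {"beans": [...]} wrapper that Hadoop's /jmx servlet returns.

```python
import json
from enum import Enum
from typing import Mapping


class AlertState(Enum):
    # Hypothetical states; SKIPPED stands for "standby RM, nothing to measure".
    OK = "OK"
    SKIPPED = "SKIPPED"
    UNKNOWN = "UNKNOWN"


# Body text the standby RM returns, as captured in the quoted dump.
STANDBY_MARKER = "This is standby RM"


def classify_jmx_response(status: int, headers: Mapping[str, str], body: str) -> AlertState:
    """Classify a ResourceManager /jmx response (illustrative only)."""
    if status != 200:
        return AlertState.UNKNOWN
    # Standby RM: 200 plus a Refresh redirect instead of JMX JSON.
    if "Refresh" in headers and body.lstrip().startswith(STANDBY_MARKER):
        return AlertState.SKIPPED
    try:
        payload = json.loads(body)
    except ValueError:
        return AlertState.UNKNOWN  # 200, but no parseable metrics
    # Hadoop's JMX servlet wraps metrics in a "beans" list; thresholds
    # would be evaluated separately on top of this.
    if isinstance(payload, dict) and payload.get("beans"):
        return AlertState.OK
    return AlertState.UNKNOWN
```

The design choice here is to key on both the header and the body marker, so an active RM that returns a genuinely empty or malformed body still surfaces as UNKNOWN.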

> On Aug 31, 2015, at 9:43 AM, Andrew Robertson <andyrobertson101@gmail.com> wrote:
> 
> I recently upgraded to Ambari 2.1.1 from 2.0, and am seeing an
> unexpected Ambari alert in my YARN HA cluster.
> 
> On my currently active YARN NodeManager and ResourceManager, Ambari
> alerts are fine.
> 
> On the secondary YARN NodeManager and ResourceManager, Ambari reports
> "Status: Unknown" / "HTTP 200 response (metrics unavailable)".  This
> is for the alerts:
> - NodeManager Health Summary
> - ResourceManager CPU Utilization
> - ResourceManager RPC Latency
> 
> The Ambari web interface does not make this error obvious, as it says
> "0 alerts" in the top bar. But you can see the alerts with "unknown"
> status when you go to the Ambari alerts page, or if you query the
> alerts API. We have our overall alert management system querying the
> API, and it's treating the "unknowns" as an error that needs to be
> resolved.
> 
> A network dump of the Ambari poll against the secondary RM looks like:
> 
> Request:
> """
> GET /jmx?qry=Hadoop:service=ResourceManager,name=RMNMInfo HTTP/1.1
> ...
> """
> 
> Response:
> """
> HTTP/1.1 200 OK
> ...
> Refresh: 3; url=http://{my-primary-rm}:8088/jmx
> Content-Length: 106
> Server: Jetty(6.1.26.hwx)
> 
> This is standby RM. Redirecting to the current active RM:
> http://{my-primary-rm}:8088/jmx
> """
> 
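The dump's behaviour can be reproduced without a live cluster by standing up a fake standby RM locally. Below is a self-contained Python sketch. FakeStandbyRM, fetch_from_fake_standby and the active-rm.example host are hypothetical stand-ins, the last one for the redacted {my-primary-rm}. It shows that a client which only understands 3xx redirects stops at the 200 response. Python's urllib follows 3xx redirects by default, much like curl -L, but does not act on a Refresh header.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the active RM (the thread redacts the real host).
ACTIVE_RM = "http://active-rm.example:8088/jmx"
BODY = ("This is standby RM. Redirecting to the current active RM: "
        + ACTIVE_RM).encode()


class FakeStandbyRM(BaseHTTPRequestHandler):
    """Mimics the standby RM response captured in the dump."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Refresh", "3; url=" + ACTIVE_RM)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_from_fake_standby():
    """Issue the JMX query from the dump against the fake standby RM."""
    server = HTTPServer(("127.0.0.1", 0), FakeStandbyRM)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        url = (f"http://127.0.0.1:{port}/jmx"
               "?qry=Hadoop:service=ResourceManager,name=RMNMInfo")
        # Bypass any proxy from the environment; follow 3xx only (urllib default).
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as resp:
            return resp.status, resp.headers.get("Refresh"), resp.read().decode()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, refresh, body = fetch_from_fake_standby()
    print(status, refresh)
    print(body)
```

The client ends up holding a 200 with no JMX JSON, which is exactly the "HTTP 200 response (metrics unavailable)" case reported above.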

