ambari-dev mailing list archives

From "Jonathan Hurley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-9894) Alerts: YARN YM HA Alerts Are UNKNOWN Due to HA Redirects
Date Tue, 03 Mar 2015 18:08:04 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hurley updated AMBARI-9894:
------------------------------------
    Attachment: AMBARI-9894.patch

> Alerts: YARN YM HA Alerts Are UNKNOWN Due to HA Redirects
> ---------------------------------------------------------
>
>                 Key: AMBARI-9894
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9894
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Jonathan Hurley
>            Priority: Critical
>         Attachments: AMBARI-9894.patch
>
>
> 3-node cluster
> Configured ResourceManager HA. Three alerts are now Unknown:
> - ResourceManager RPC Latency. Has two instances as expected, but each is unknown:
>   "No JSON object could be decoded".
> - NodeManager Health Summary. Has two instances as expected, but each is unknown:
>   "No JSON object could be decoded".
> - ResourceManager CPU Utiliz. Has two instances as expected, but each is unknown:
>   "No JSON object could be decoded".
> Both RMs are running and I can quick link over to RMUI + JMX.
> This fails because YARN forwards requests for the standby RM to the active one. In this
> scenario, the alert gets back an HTTP 200 response that looks like:
> {noformat}
> This is standby RM. Redirecting to the current active RM: http://c6403.ambari.apache.org:8088/
> {noformat}
> Unfortunately, this is a refresh header redirect, which the metric alert cannot handle.
> The alerts did work after the VMs restarted, because the original RM became active again.
> There are a few issues here:
> - YARN doesn't do HA in the same way that other services like HDFS do. As a result, there's
>   no config property that could let the alert know what to do or which hosts to contact.
> - YARN actually forwards to the active node after an HTTP 200, which doesn't jibe with
>   how alerts work.
> This is a definite problem and requires some further investigation.
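
The failure mode described above can be illustrated with a minimal sketch. This is
not the attached patch and not Ambari's alert code; the function name
classify_rm_response is hypothetical, the standby body pattern is taken from the
sample response quoted above, and the "<seconds>; url=<target>" form of the Refresh
header value is an assumption based on the conventional shape of that header.

```python
import json
import re

# Assumption: the standby body matches the sample text quoted in the report.
STANDBY_BODY = re.compile(
    r"This is standby RM\. Redirecting to the current active RM:\s*(\S+)")

# Assumption: conventional Refresh header value, "<seconds>; url=<target>".
REFRESH_VALUE = re.compile(r"^\s*\d+\s*;\s*url=(\S+)\s*$", re.IGNORECASE)


def classify_rm_response(body, headers=None):
    """Classify an HTTP 200 response from a ResourceManager (hypothetical helper).

    Returns ("standby", active_url) when the response is the standby notice,
    ("active", parsed_json) when the body is valid JSON, and
    ("unknown", None) otherwise.
    """
    # Header names are case-insensitive, so normalise them first.
    headers = {k.lower(): v for k, v in (headers or {}).items()}

    # Check the Refresh header before trying to decode the body.
    refresh = REFRESH_VALUE.match(headers.get("refresh", ""))
    if refresh:
        return "standby", refresh.group(1)

    # Fall back to recognising the standby notice in the body text.
    standby = STANDBY_BODY.search(body)
    if standby:
        return "standby", standby.group(1)

    # Only now treat the body as JSON metrics.
    try:
        return "active", json.loads(body)
    except ValueError:
        # Same condition that surfaces as "No JSON object could be decoded".
        return "unknown", None
```

With a check like this, a caller could retry the metric request against the
returned active URL, or skip the standby instance, rather than reporting UNKNOWN.
Which of those behaviours is appropriate is left open here.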



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
