hadoop-yarn-issues mailing list archives

From "Ashwin Shankar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4767) Network issues can cause persistent RM UI outage
Date Thu, 09 Jun 2016 17:48:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322969#comment-15322969 ]

Ashwin Shankar commented on YARN-4767:

Hey [~templedf],
Thanks much for rebasing the patch! Just to give you some context on what we see at my company:
we first got complaints from users that they could not access the AM UI. Since these HTTP requests
go through the web proxy, we looked at that process and found that it was unresponsive because
all its threads were busy. When we listed open file descriptors, we saw that the web proxy had
many connections from itself to itself, which seemed weird then but makes sense now, since
it's due to the AM redirecting requests back to the proxy. Web proxy logs showed that most of the
requests were made to one or two specific apps. We then looked at those apps' AM logs and found
"UnresolvedHostException" when the AM was trying to resolve the proxy host (which is basically the
master node where the RM runs) in the AmIpFilter code. We believe the lookup failed because of
an intermittent network event affecting DNS, but that's not conclusive. Overall, this issue has been
occurring about once a week, and we have to bounce the web proxy to fix it.

Thanks [~kasha] for the review! [~xgong], [~vinodkv], please take a look at the patch when you
get a chance. We would like to backport it as soon as it's committed.

> Network issues can cause persistent RM UI outage
> ------------------------------------------------
>                 Key: YARN-4767
>                 URL: https://issues.apache.org/jira/browse/YARN-4767
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: webapp
>    Affects Versions: 2.7.2
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: YARN-4767.001.patch, YARN-4767.002.patch, YARN-4767.003.patch, YARN-4767.004.patch,
> YARN-4767.005.patch, YARN-4767.006.patch, YARN-4767.007.patch
>
> If a network issue causes an AM web app to resolve the RM proxy's address to something
> other than what's listed in the allowed proxies list, the AmIpFilter will 302 redirect the
> RM proxy's request back to the RM proxy.  The RM proxy will then consume all available handler
> threads connecting to itself over and over, resulting in an outage of the web UI.
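
The loop described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the actual AmIpFilter or web proxy code: the class name RedirectLoopSketch, both method names, the IP addresses and the pool size of 8 are all hypothetical, and the model assumes the proxy follows each 302 by opening a new connection to itself while the previous handler thread stays blocked.

```java
import java.util.Set;

// Simplified model of the redirect loop; NOT the real Hadoop AmIpFilter code.
// All names and values here are hypothetical and exist only for illustration.
class RedirectLoopSketch {

    // Stand-in for the AM-side filter decision: a request coming from an
    // address in the allowed-proxy set is served, anything else is answered
    // with a 302 redirect back to the proxy.
    static boolean amRedirectsToProxy(Set<String> allowedProxyAddrs, String remoteAddr) {
        return !allowedProxyAddrs.contains(remoteAddr);
    }

    // Counts how many proxy handler threads a single user request ties up.
    // Assumption: every redirect makes the proxy connect to itself again while
    // the earlier handler is still waiting for a response.
    static int handlersConsumed(Set<String> allowedProxyAddrs, String proxyAddr, int handlerThreads) {
        int busy = 0;
        while (busy < handlerThreads) {
            busy++; // one more handler picks up the request and forwards it to the AM
            if (!amRedirectsToProxy(allowedProxyAddrs, proxyAddr)) {
                return busy; // the AM recognises the proxy and serves the page
            }
            // Otherwise the proxy follows the 302 back to itself and the loop repeats.
        }
        return busy; // every handler thread is now occupied
    }

    public static void main(String[] args) {
        String proxyAddr = "10.0.0.1";
        // Healthy case: the AM resolved the proxy host to the address it really uses.
        System.out.println(handlersConsumed(Set.of("10.0.0.1"), proxyAddr, 8));  // prints 1
        // Faulty case: the AM's allowed list holds some other address.
        System.out.println(handlersConsumed(Set.of("10.0.0.99"), proxyAddr, 8)); // prints 8
    }
}
```

In the healthy case one handler thread serves the request. In the faulty case a single request occupies the whole pool, which matches the many self-to-self connections seen in the web proxy's open file descriptors.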
