hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3127) Avoid timeline events during RM recovery or restart
Date Tue, 26 May 2015 08:10:19 GMT

     [ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naganarasimha G R updated YARN-3127:
------------------------------------
    Description: 
1. Start the RM with HA and ATS configured and run some YARN applications
2. Once the applications have finished successfully, start the timeline server
3. Fail over the RM from active to standby
4. Access the timeline server URL <IP>:<PORT>/applicationhistory

//Note: Earlier, an exception was thrown when this URL was accessed.
Incomplete information is shown in the ATS web UI, i.e. attempt, container, and other information
is not displayed.

Also, even if the timeline server is started together with the RM, on RM restart/recovery the ATS
events for applications that already exist in ATS are resent, which is not required.
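
The behaviour asked for here can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ResourceManager's actual publisher code and not what the attached patches do: the class `RecoveryAwarePublisher`, its `publish` method, the `replayedDuringRecovery` flag, the event type strings, and the second application ID are all hypothetical names made up for the example. The idea is that events generated while applications are being replayed from the state store are skipped, while events from live state changes are still sent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these names come from the Hadoop code base.
final class RecoveryAwarePublisher {
    // Stand-in for the events that would be sent to the timeline server.
    private final List<String> sent = new ArrayList<>();

    // replayedDuringRecovery is an assumed flag: true when the event is
    // produced while the RM replays applications from its state store.
    void publish(String appId, String eventType, boolean replayedDuringRecovery) {
        if (replayedDuringRecovery) {
            // The timeline store is assumed to hold this event already,
            // so resending it is unnecessary.
            return;
        }
        sent.add(eventType + ":" + appId);
    }

    // Read-only view of what was actually sent.
    List<String> sentEvents() {
        return Collections.unmodifiableList(sent);
    }

    public static void main(String[] args) {
        RecoveryAwarePublisher publisher = new RecoveryAwarePublisher();
        // Replayed on failover: skipped.
        publisher.publish("application_1422972608379_0001", "APP_FINISHED", true);
        // New application submitted after failover: sent.
        publisher.publish("application_1422972608379_0002", "APP_CREATED", false);
        System.out.println(publisher.sentEvents());
    }
}
```

A flag passed per event keeps the sketch small. In a real RM the decision would more likely hang off whatever recovery state the application object already carries.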


  was:
1. Start the RM with HA and ATS configured and run some YARN applications
2. Once the applications have finished successfully, start the timeline server
3. Fail over the RM from active to standby
4. Access the timeline server URL <IP>:<PORT>/applicationhistory

Result: Application history URL fails with below info


{quote}
2015-02-03 20:28:09,511 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the applications.
java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
	at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:80)
	at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
	at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
	at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
	at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
	...
Caused by: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: The entity for application attempt appattempt_1422972608379_0001_000001 doesn't exist in the timeline store
	at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplicationAttempt(ApplicationHistoryManagerOnTimelineStore.java:151)
	at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.generateApplicationReport(ApplicationHistoryManagerOnTimelineStore.java:499)
	at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAllApplications(ApplicationHistoryManagerOnTimelineStore.java:108)
	at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:84)
	at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:81)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	... 51 more
2015-02-03 20:28:09,512 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory
org.apache.hadoop.yarn.webapp.WebAppException: Error rendering block: nestLevel=6 expected 5
	at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
	at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
{quote}

Behaviour with AHS using the file-based history store:
	- The application history URL works.
	- No attempt entries are shown for any application.
	

Based on initial analysis, when the RM switches over, application attempts from the state store are
not replayed; only applications are.
So when the /applicationhistory URL is accessed, it looks up every attempt ID and fails.
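
To make the failure mode concrete, here is a minimal sketch of a listing loop that tolerates a missing attempt instead of failing the whole page. It is an illustration only, not the code in ApplicationHistoryManagerOnTimelineStore: the class `TolerantAppListing`, the method `listApplications`, and the two maps standing in for the timeline store are assumptions made for the example. Whether the real fix should tolerate the gap on the read side or avoid creating it on the RM side is a separate question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for the timeline store.
final class TolerantAppListing {
    // currentAttemptByApp: application ID -> ID of its current attempt.
    // attemptStateById:    attempt ID -> recorded attempt state.
    static List<String> listApplications(Map<String, String> currentAttemptByApp,
                                         Map<String, String> attemptStateById) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> app : currentAttemptByApp.entrySet()) {
            String state = attemptStateById.get(app.getValue());
            // A missing attempt entity yields a placeholder rather than an
            // exception that aborts rendering of every application.
            rows.add(app.getKey() + " attempt=" + (state == null ? "N/A" : state));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> apps = new LinkedHashMap<>();
        apps.put("application_1422972608379_0001", "appattempt_1422972608379_0001_000001");
        // Empty on purpose: the attempt was never written to the store.
        Map<String, String> attempts = new HashMap<>();
        System.out.println(listApplications(apps, attempts));
    }
}
```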


> Avoid timeline events during RM recovery or restart
> ---------------------------------------------------
>
>                 Key: YARN-3127
>                 URL: https://issues.apache.org/jira/browse/YARN-3127
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, timelineserver
>    Affects Versions: 2.6.0
>         Environment: RM HA with ATS
>            Reporter: Bibin A Chundatt
>            Assignee: Naganarasimha G R
>            Priority: Critical
>         Attachments: YARN-3127.20150213-1.patch, YARN-3127.20150329-1.patch
>
>
> 1. Start the RM with HA and ATS configured and run some YARN applications
> 2. Once the applications have finished successfully, start the timeline server
> 3. Fail over the RM from active to standby
> 4. Access the timeline server URL <IP>:<PORT>/applicationhistory
> //Note: Earlier, an exception was thrown when this URL was accessed.
> Incomplete information is shown in the ATS web UI, i.e. attempt, container, and other information is not displayed.
> Also, even if the timeline server is started together with the RM, on RM restart/recovery the ATS events for applications that already exist in ATS are resent, which is not required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
