hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3127) Avoid timeline events during RM recovery or restart
Date Fri, 24 Jul 2015 18:07:06 GMT

     [ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Naganarasimha G R updated YARN-3127:
    Attachment: YARN-3127.20150624-1.patch

Hi [~xgong],
I have modified the patch to handle the scenario you mentioned, but only on a best-effort basis.
It tries to avoid duplicate publishing by publishing events before saving them to the state store
(a failover that happens after publishing but before saving to the state store might still result
in multiple events).
Based on the state transition diagram, all the transitions go through the final_saving state
except for:
New -> Finished      (on RECOVER event)
New -> Failed        (on RECOVER event)
New -> Killed        (on KILL, RECOVER event)
Killing -> Finished  (on ATTEMPT_FINISHED event)
Running -> Finished  (on ATTEMPT_FINISHED event)

The first 2 need no handling, as the state would already have been published to ATS before recovery.
For the 3rd one, when an application is killed from the New state, we need to publish explicitly.
The last 2 state transitions, which do not go through final_saving, also need to be handled.
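
To make the rule above concrete, here is a minimal sketch in Java. It is not code from the patch: the class FinalTransitionPublisher, the enums AppState and AppEvent, and the method needsExplicitPublish are hypothetical names used only for illustration. It also assumes that a transition triggered by RECOVER never publishes, since the state is expected to be in ATS already.

```java
// Hypothetical sketch, not taken from the YARN-3127 patch.
final class FinalTransitionPublisher {

    // Simplified stand-ins for the RM application states named above.
    enum AppState { NEW, RUNNING, KILLING, FINAL_SAVING, FINISHED, FAILED, KILLED }

    // Simplified stand-ins for the events named above.
    enum AppEvent { RECOVER, KILL, ATTEMPT_FINISHED }

    /**
     * Returns true when a transition bypasses final_saving and therefore
     * has to publish its final state to the timeline server explicitly.
     */
    static boolean needsExplicitPublish(AppState from, AppState to, AppEvent event) {
        // Assumption: anything driven by RECOVER was published before the restart.
        if (event == AppEvent.RECOVER) {
            return false;
        }
        // New -> Killed on KILL never reaches final_saving.
        if (from == AppState.NEW && to == AppState.KILLED && event == AppEvent.KILL) {
            return true;
        }
        // Killing -> Finished and Running -> Finished on ATTEMPT_FINISHED.
        boolean finishedByAttempt =
                event == AppEvent.ATTEMPT_FINISHED && to == AppState.FINISHED;
        return finishedByAttempt
                && (from == AppState.KILLING || from == AppState.RUNNING);
    }

    public static void main(String[] args) {
        System.out.println(needsExplicitPublish(
                AppState.NEW, AppState.KILLED, AppEvent.KILL));
        System.out.println(needsExplicitPublish(
                AppState.NEW, AppState.FINISHED, AppEvent.RECOVER));
    }
}
```

All other transitions are left to the publish step that runs just before the state store save, so the check only has to cover the exceptions listed above.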
Please review...

> Avoid timeline events during RM recovery or restart
> ---------------------------------------------------
>                 Key: YARN-3127
>                 URL: https://issues.apache.org/jira/browse/YARN-3127
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, timelineserver
>    Affects Versions: 2.6.0
>         Environment: RM HA with ATS
>            Reporter: Bibin A Chundatt
>            Assignee: Naganarasimha G R
>            Priority: Critical
>         Attachments: AppTransition.png, YARN-3127.20150213-1.patch, YARN-3127.20150329-1.patch,
> 1. Start RM with HA and ATS configured and run some YARN applications
> 2. Once the applications have finished successfully, start the timeline server
> 3. Now fail over HA from active to standby
> 4. Access the timeline server URL <IP>:<PORT>/applicationhistory
> //Note: Earlier, an exception was thrown when this URL was accessed.
> Incomplete information is shown in the ATS web UI, i.e. attempt, container and other information
> is not displayed.
> Also, even if the timeline server is started along with the RM, on RM restart/recovery the ATS events
> for applications that already exist in ATS are resent, which is not required.

This message was sent by Atlassian JIRA