hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4392) ApplicationCreatedEvent event time resets after RM restart/failover
Date Wed, 02 Dec 2015 18:18:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036299#comment-15036299
] 

Xuan Gong commented on YARN-4392:
---------------------------------

Thanks for the comments, [~Naganarasimha]
bq. So actually in the patch i had followed the approach such that for finish events i had
sent synchronous push in the ATS side, in this way we are sure that AppFinish event is sent
out before we store the state of the app in the RM state store. But yes this approach looks
little shaky but thought it might solve the issue.

Let us *not synchronously* send the ATS event. Otherwise, it would depend on the ATS. 
It is always good to make sure that we can send the ATS event "exactly once", but this would
make things complicate, such as send ats events synchronously. This would add the additional
but not necessary dependency.  
Currently, we are using "at least once" approach. Since all the information are the same if
they are the duplicate events (after applying the patch), I think that is fine. 

What is your opinion??

> ApplicationCreatedEvent event time resets after RM restart/failover
> -------------------------------------------------------------------
>
>                 Key: YARN-4392
>                 URL: https://issues.apache.org/jira/browse/YARN-4392
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Xuan Gong
>            Assignee: Naganarasimha G R
>            Priority: Critical
>         Attachments: YARN-4392-2015-11-24.patch, YARN-4392.1.patch, YARN-4392.2.patch
>
>
> {code}2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time
1437453994768 is ahead of started time 1440308399674 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437454008244
is ahead of started time 1440308399676 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444305171
is ahead of started time 1440308399653 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444293115
is ahead of started time 1440308399647 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444379645
is ahead of started time 1440308399656 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444361234
is ahead of started time 1440308399655 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444342029
is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444323447
is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444430006
is ahead of started time 1440308399660 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444415698
is ahead of started time 1440308399659 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444419060
is ahead of started time 1440308399658 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444393931
is ahead of started time 1440308399657
> {code} . 
> From ATS logs, we would see a large amount of 'stale alerts' messages periodically



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message