hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5095) flow activities and flow runs are populated with wrong timestamp when RM restarts w/ recovery enabled
Date Fri, 20 May 2016 23:21:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294513#comment-15294513 ]

Varun Saxena edited comment on YARN-5095 at 5/20/16 11:20 PM:
--------------------------------------------------------------

I had a look at the code for this.
We start the timeline collector right after creating the RMAppImpl object in RMAppManager#createAndPopulateNewRMApp.
When the collector starts, after it has been added to RMTimelineCollectorManager,
we call postPut.
This is where the flow run ID is set to the application start time.
When recovering an application, however, the RMAppImpl constructor initializes the start time with
the current time. The start time from the state store is applied only when the RECOVER event is handled,
and that happens after the timeline collector has been started and postPut has been called.
That is why the current system time ends up in the flow run ID.
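
To make the ordering concrete, here is a minimal sketch of the sequence described above. It does not use the real YARN classes: App, recover, flowRunIdAtPostPut and simulateRecovery are hypothetical stand-ins for RMAppImpl, the RECOVER handling and postPut. The two timestamps come from the example in the issue description, with the created time assumed to be the original start time.

```java
import java.util.function.LongSupplier;

/** Simplified, hypothetical model of the ordering problem; not the real YARN classes. */
final class RecoveryOrderingSketch {

    /** Stand-in for RMAppImpl: the constructor defaults the start time to "now". */
    static final class App {
        final long submitTime;
        long startTime;

        App(long submitTime, LongSupplier clock) {
            this.submitTime = submitTime;
            this.startTime = clock.getAsLong();
        }

        /** Stand-in for handling the RECOVER event: restores the stored start time. */
        void recover(long storedStartTime) {
            this.startTime = storedStartTime;
        }
    }

    /** Stand-in for postPut: reads the start time at the moment the collector starts. */
    static long flowRunIdAtPostPut(App app) {
        return app.startTime;
    }

    /** Replays the recovery order from the comment and returns the resulting flow run ID. */
    static long simulateRecovery(long storedStartTime, long restartTime) {
        App app = new App(storedStartTime, () -> restartTime);
        long flowRunId = flowRunIdAtPostPut(app); // collector starts before recovery
        app.recover(storedStartTime);             // too late: the ID was already taken
        return flowRunId;
    }

    public static void main(String[] args) {
        long originalStart = 1463422860987L; // assumed original start (createdtime in the issue)
        long restart = 1463433569917L;       // RM restart time (run id in the issue)
        System.out.println(simulateRecovery(originalStart, restart)); // prints 1463433569917
    }
}
```

The flow run ID is read before recover() runs, so it carries the restart time even though the app's start time is corrected afterwards.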

We therefore have two options to fix this. Either take the start time from the state store, pass it
into the RMAppImpl constructor as well, and set it there,
or set the flow run ID to the app submit time, which is already set in the RMAppImpl constructor.
I think we can go with the latter.
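
A rough sketch of the two options, again with hypothetical names (App, createApp, flowRunIdFromSubmitTime) rather than the actual RMAppImpl API. It assumes that on recovery the submit time handed to the constructor is the original one, as the comment above implies.

```java
import java.util.OptionalLong;

/** Hypothetical sketch of the two proposed fixes; not the actual YARN code. */
final class FlowRunIdFixSketch {

    /** Stand-in for RMAppImpl with both timestamps fixed at construction. */
    static final class App {
        final long submitTime;
        final long startTime;

        App(long submitTime, long startTime) {
            this.submitTime = submitTime;
            this.startTime = startTime;
        }
    }

    /** Option 1: pass the stored start time into the constructor when recovering. */
    static App createApp(long submitTime, long now, OptionalLong storedStartTime) {
        return new App(submitTime, storedStartTime.orElse(now));
    }

    /** Option 2: derive the flow run ID from the submit time instead of the start time. */
    static long flowRunIdFromSubmitTime(App app) {
        return app.submitTime;
    }

    public static void main(String[] args) {
        long submitted = 1463422860987L; // assumed original submit/start time
        long restart = 1463433569917L;   // RM restart time

        // Option 1: the recovered app starts out with the stored start time.
        App recovered = createApp(submitted, restart, OptionalLong.of(submitted));
        System.out.println(recovered.startTime); // prints 1463422860987

        // Option 2: even with the start time defaulted to "now", the ID stays stable.
        App unfixedStart = createApp(submitted, restart, OptionalLong.empty());
        System.out.println(flowRunIdFromSubmitTime(unfixedStart)); // prints 1463422860987
    }
}
```

Option 2 needs no change to the constructor signature, which may be part of why it looks like the simpler route.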

Thoughts?
cc [~sjlee0]



> flow activities and flow runs are populated with wrong timestamp when RM restarts w/ recovery enabled
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5095
>                 URL: https://issues.apache.org/jira/browse/YARN-5095
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Critical
>              Labels: yarn-2928-1st-milestone
>
> I have the RM recovery enabled. I see that upon restart the RM populates records into
> flow activity and flow runs but with *wrong* timestamps. What I mean by the timestamp is the
> part of the row key:
> - flow activity: row created with the day of the RM restart
> - flow run: row created with the RM start time as the "run id"
> The following illustrates an example flow run:
> {noformat}
> metrics: [ ],
> events: [ ],
> id: "sjlee@Sleep job/1463433569917",
> type: "YARN_FLOW_RUN",
> createdtime: 1463422860987,
> info: {
> UID: "yarn_cluster!sjlee!Sleep job!1463433569917",
> SYSTEM_INFO_FLOW_RUN_ID: 1463433569917,
> SYSTEM_INFO_FLOW_NAME: "Sleep job",
> SYSTEM_INFO_FLOW_RUN_END_TIME: 1463422865033,
> SYSTEM_INFO_USER: "sjlee"
> },
> isrelatedto: { },
> relatesto: { }
> {noformat}
> The created time and the end time are correct (i.e. original time), whereas the timestamp
> in the row key (= run id: 1463433569917) is actually later than the end time and coincides
> with the RM restart.
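
As a quick check on the numbers in the example above, the following sketch compares the run id against the end time. The class and method names are made up for illustration; only the three timestamps come from the issue.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative arithmetic on the timestamps quoted in the issue description. */
final class FlowRunTimestampCheck {

    /** How long after the flow run's end time the run id timestamp falls. */
    static Duration gapAfterEnd(long runIdMillis, long endTimeMillis) {
        return Duration.between(Instant.ofEpochMilli(endTimeMillis),
                                Instant.ofEpochMilli(runIdMillis));
    }

    public static void main(String[] args) {
        long created = 1463422860987L; // createdtime
        long end = 1463422865033L;     // SYSTEM_INFO_FLOW_RUN_END_TIME
        long runId = 1463433569917L;   // SYSTEM_INFO_FLOW_RUN_ID

        System.out.println(end - created);                      // run length in ms: 4046
        System.out.println(gapAfterEnd(runId, end).toMillis()); // 10704884
    }
}
```

The run id is about 10,705 seconds (roughly three hours) past the end time, which cannot be a valid start time for a run that lasted about four seconds.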


