hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4063) Populate the flow activity table
Date Wed, 19 Aug 2015 18:41:45 GMT

    [ https://issues.apache.org/jira/browse/YARN-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703541#comment-14703541 ]

Vrushali C commented on YARN-4063:

Current line of thinking:
- When an application is created or finishes, the start time and end time of the flow
can be updated for that run id for that day.

- We can use coprocessors here. At compaction time, if it is towards the end of the day and
the flow record does not have an end time for this run id, we can add a snapshot time to
indicate that the flow is still running.

- The coprocessor can also read and return the min start time and max end time for that
flow run id (similar to what is being done in the flow_run table).
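
The three points above can be sketched roughly as follows. This is only an in-memory
illustration in Python, not HBase coprocessor code: the names RunActivity,
on_application_created, on_application_finished, on_compaction and summarize are
hypothetical, and the one hour window used to decide what "towards end of the day" means
is an assumption, not something settled in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RunActivity:
    """Cells recorded for one run id within one day's flow activity row (hypothetical)."""
    start_times: List[int] = field(default_factory=list)
    end_times: List[int] = field(default_factory=list)
    snapshot_time: Optional[int] = None


def on_application_created(run: RunActivity, ts_millis: int) -> None:
    # Application created: record a start time for this run id.
    run.start_times.append(ts_millis)


def on_application_finished(run: RunActivity, ts_millis: int) -> None:
    # Application finished: record an end time for this run id.
    run.end_times.append(ts_millis)


def on_compaction(run: RunActivity, now_millis: int, day_end_millis: int,
                  window_millis: int = 60 * 60 * 1000) -> None:
    # Assumption: "towards end of the day" means within window_millis of day end.
    near_day_end = day_end_millis - now_millis <= window_millis
    if near_day_end and not run.end_times:
        # No end time yet, so mark the flow run as still running.
        run.snapshot_time = now_millis


def summarize(run: RunActivity) -> Tuple[Optional[int], Optional[int]]:
    # Min start time and max end time; None when nothing has been recorded.
    return (min(run.start_times, default=None),
            max(run.end_times, default=None))
```

In a real coprocessor the min/max step would presumably run on the region server while
scanning cells, in the same spirit as the flow_run table mentioned above.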

> Populate the flow activity table
> --------------------------------
>                 Key: YARN-4063
>                 URL: https://issues.apache.org/jira/browse/YARN-4063
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
> Need to populate the flow_activity table
> -Stores per day flow run pointers and info
> -Written to by RM’s collector for application lifecycle
> primary key: cluster ! day timestamp ! user ! flow id 
> -For the day timestamp we can take the millis since epoch for the end of the day (24:00h).
> columns include runids, start time, end time, snapshot time
> -This table will also be used to efficiently retrieve the flows that had activity
> on a given day. That is needed for daily aggregations, but also for several UIs, including
> a flow-based UI.
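
For illustration, the primary key layout in the description above could be assembled
along these lines. This is a minimal Python sketch with several assumptions: the day
boundary is taken in UTC, the key is built as a plain string rather than the bytes HBase
would store, and the function names end_of_day_millis and flow_activity_row_key are
hypothetical.

```python
SEPARATOR = "!"
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def end_of_day_millis(event_millis: int) -> int:
    """Millis since epoch for 24:00h of the day containing event_millis.

    Assumption: days are cut at UTC midnight.
    """
    return (event_millis // MILLIS_PER_DAY + 1) * MILLIS_PER_DAY


def flow_activity_row_key(cluster: str, event_millis: int,
                          user: str, flow_id: str) -> str:
    # Layout from the description: cluster ! day timestamp ! user ! flow id
    day_timestamp = str(end_of_day_millis(event_millis))
    return SEPARATOR.join([cluster, day_timestamp, user, flow_id])
```

Because the day timestamp comes right after the cluster, all flows active on one day
share a key prefix, which is what makes retrieving a day's flows a single prefix scan.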

This message was sent by Atlassian JIRA