hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3041) [Data Model] create overall data objects of TS next gen
Date Wed, 18 Feb 2015 06:45:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325492#comment-14325492 ]

Vrushali C commented on YARN-3041:
----------------------------------



To add to my previous comment, this is the way I see it:

A flow is uniquely identified by cluster, user, queue, flow name and run id, so these are
attributes (class members) of the Flow class. FlowRun is not a class; run id is an
attribute of the Flow class. An Application is a child of a Flow. There would also be
an AggregatedFlow class, which has members such as the startTime and endTime of the aggregation.
Similarly, user and queue are attributes of the Flow class, but AggregatedUser and AggregatedQueue
are classes that hold aggregated information for that user (or queue) over a time range.
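
As a rough Java sketch of the model described above (all class and field names here are
illustrative assumptions of mine, not taken from any YARN-3041 patch), the five identifying
attributes live on Flow itself, and the aggregated classes carry the aggregation window:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch only: names are illustrative, not the actual ATS API.
final class Application {
    final String appId;

    Application(String appId) {
        this.appId = appId;
    }
}

final class Flow {
    // The five attributes that together identify a flow uniquely.
    final String cluster;
    final String user;
    final String queue;
    final String flowName;
    final long runId; // a member of Flow; there is no separate FlowRun class

    // Applications are children of the flow.
    final List<Application> applications = new ArrayList<>();

    Flow(String cluster, String user, String queue, String flowName, long runId) {
        this.cluster = cluster;
        this.user = user;
        this.queue = queue;
        this.flowName = flowName;
        this.runId = runId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Flow)) return false;
        Flow f = (Flow) o;
        return runId == f.runId
                && cluster.equals(f.cluster)
                && user.equals(f.user)
                && queue.equals(f.queue)
                && flowName.equals(f.flowName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(cluster, user, queue, flowName, runId);
    }
}

// Aggregated view of a flow over a time range.
// AggregatedUser and AggregatedQueue would have the same shape, keyed by user or queue.
final class AggregatedFlow {
    final String flowName;
    final long startTime; // start of the aggregation window
    final long endTime;   // end of the aggregation window

    AggregatedFlow(String flowName, long startTime, long endTime) {
        this.flowName = flowName;
        this.startTime = startTime;
        this.endTime = endTime;
    }
}
```

With this shape, two Flow objects that differ only in run id are distinct flows, which is
what lets a query return one flow object per run.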

Maybe I can give some examples of queries. 

For Flow: 
Example 1: we query for “Give me all the runs of this flow that happened yesterday”.
Say the flow ran 10 times yesterday. This should return a list of 10 flows, one flow object
for each run. Each flow object in turn has a list of Applications.

Example 2: we query for “How much did this flow take up on the cluster yesterday?”
Say the flow ran 10 times yesterday. This query should return an aggregated flow object that
has the summation of all metrics from all the runs of the flow yesterday. This AggregatedFlow
object also has the startTime and endTime of the aggregation as its members. (While we would
allow custom time ranges, for efficiency we would want to aggregate daily, weekly, etc.)
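
The two flow queries above could be sketched in Java roughly as follows. This is an in-memory
illustration under my own assumptions: FlowEntry is a simplified stand-in for the flow object,
FlowAggregate for the aggregated flow, and the method and metric names are hypothetical. It
says nothing about how the real store would execute these queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one flow object (one per run), with its metrics.
final class FlowEntry {
    final String flowName;
    final long runId;
    final long startTime; // epoch millis
    final Map<String, Long> metrics;

    FlowEntry(String flowName, long runId, long startTime, Map<String, Long> metrics) {
        this.flowName = flowName;
        this.runId = runId;
        this.startTime = startTime;
        this.metrics = metrics;
    }
}

// Hypothetical stand-in for the aggregated flow object.
final class FlowAggregate {
    final long startTime; // start of the aggregation window (inclusive)
    final long endTime;   // end of the aggregation window (exclusive)
    final int runCount;
    final Map<String, Long> metricTotals;

    FlowAggregate(long startTime, long endTime, int runCount, Map<String, Long> metricTotals) {
        this.startTime = startTime;
        this.endTime = endTime;
        this.runCount = runCount;
        this.metricTotals = metricTotals;
    }
}

final class FlowQueries {
    private FlowQueries() {}

    // Example 1: one flow object per run that started inside [start, end).
    static List<FlowEntry> runsInRange(List<FlowEntry> all, String flowName, long start, long end) {
        List<FlowEntry> result = new ArrayList<>();
        for (FlowEntry f : all) {
            if (f.flowName.equals(flowName) && f.startTime >= start && f.startTime < end) {
                result.add(f);
            }
        }
        return result;
    }

    // Example 2: sum every metric across those runs into one aggregated object.
    static FlowAggregate aggregate(List<FlowEntry> all, String flowName, long start, long end) {
        List<FlowEntry> runs = runsInRange(all, flowName, start, end);
        Map<String, Long> totals = new HashMap<>();
        for (FlowEntry run : runs) {
            for (Map.Entry<String, Long> e : run.metrics.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return new FlowAggregate(start, end, runs.size(), totals);
    }
}
```

Passing yesterday's start and end as the window gives the list of runs for Example 1 and the
summed metrics plus aggregation window for Example 2.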

For User:
Example 1: 
Query: give me all flows that this user ran over this time range. Returns a list of such flows,
one flow object for each individual run.

Example 2:
Query: give me how much this user consumed on the cluster during this time range. Would return
an AggregatedUser object which has the startTime and endTime of this aggregation and summations
of metrics over that time range. Again, for aggregations, we would probably want to aggregate
daily, weekly, etc., while allowing for custom ranges.
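
To illustrate why fixed daily aggregation is cheaper than arbitrary ranges, here is a small
Java sketch of a per-user daily roll-up for a single metric. The class name DailyUserRollup,
its methods, and the choice of UTC day boundaries are assumptions made for this example only;
a range query that is aligned to day boundaries just adds up the precomputed daily totals.

```java
import java.util.TreeMap;

// Hypothetical sketch: daily totals of one metric for one user.
final class DailyUserRollup {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    // Key: start of the day in epoch millis (UTC). Value: summed metric for that day.
    private final TreeMap<Long, Long> totalsByDay = new TreeMap<>();

    // Round a timestamp down to the start of its UTC day.
    static long dayStart(long timestampMs) {
        return Math.floorDiv(timestampMs, DAY_MS) * DAY_MS;
    }

    // Add one observation to the bucket of the day it falls in.
    void record(long timestampMs, long value) {
        totalsByDay.merge(dayStart(timestampMs), value, Long::sum);
    }

    // Total over [startMs, endMs); both bounds are assumed to be day-aligned.
    long total(long startMs, long endMs) {
        long sum = 0;
        for (long v : totalsByDay.subMap(startMs, endMs).values()) {
            sum += v;
        }
        return sum;
    }
}
```

A weekly figure is then the sum of seven daily buckets. A custom range that is not day-aligned
would need the raw per-run data for the partial days at either end, which this sketch does not
cover.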


> [Data Model] create overall data objects of TS next gen
> -------------------------------------------------------
>
>                 Key: YARN-3041
>                 URL: https://issues.apache.org/jira/browse/YARN-3041
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: Data_model_proposal_v2.pdf, YARN-3041.2.patch, YARN-3041.3.patch, YARN-3041.4.patch, YARN-3041.preliminary.001.patch
>
>
> Per design in YARN-2928, create the ATS entity and events API.
> Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
