hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
Date Wed, 26 Aug 2015 23:49:45 GMT

    [ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715765#comment-14715765 ]

Sangjin Lee commented on YARN-4074:

I am about 90% done with the POC patch for this. I'm shooting for some time tomorrow to be
able to post the patch.

In the meantime, in order to enable [~varun_saxena] and others to make progress, the following
is the proposal that I'm implementing. Please *do* let me know if you have any questions or
issues with the proposal so we can adjust accordingly.

In order to support the POC UI, we will implement two new queries:
# given the cluster, return the N most recent flows from the flow activity table
# given the cluster, user, flow id, and flow run id, return the flow run (with metrics) from
the flow run table

At the REST level, they could be represented as follows, for example:
# /listFlows/clusterId?limit=100
# /flow/clusterId/userId/flowName/flowRun
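
As a rough illustration of the two URL shapes above, here is a minimal sketch of a client-side helper that builds them. The class name {{FlowRestPaths}} and its method names are hypothetical and not part of any patch; the sketch also assumes the ids are already URL-safe and does no escaping.

```java
/** Hypothetical helper; only the two path shapes come from the proposal. */
final class FlowRestPaths {
  private FlowRestPaths() {}

  /** Builds /listFlows/clusterId?limit=N for the N most recent flows. */
  static String listFlows(String clusterId, int limit) {
    if (limit <= 0) {
      throw new IllegalArgumentException("limit must be positive");
    }
    // Assumption: clusterId needs no URL escaping.
    return "/listFlows/" + clusterId + "?limit=" + limit;
  }

  /** Builds /flow/clusterId/userId/flowName/flowRun for a single flow run. */
  static String flowRun(String clusterId, String userId,
                        String flowName, String flowRun) {
    // Assumption: every path segment is already URL-safe.
    return "/flow/" + clusterId + "/" + userId + "/" + flowName + "/" + flowRun;
  }
}
```

For example, {{FlowRestPaths.listFlows("cluster1", 100)}} yields {{/listFlows/cluster1?limit=100}}.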

With these URLs, the UI can invoke the first URL to render the landing page with the table.
The REST output contains the flow activity records along with all the flow runs that were
active during the day.

If the user drills down on a single flow, then the client side can issue the second query
once for each flow run of that flow to fetch the metrics at the flow run level.
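
The fan-out described here could look roughly like the following sketch: given the flow run ids that came back with a flow activity record, the client produces one flow run request per id. The class {{FlowDrillDown}} and its method are hypothetical, and the ids are assumed to be URL-safe strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side sketch of the per-flow drill-down. */
final class FlowDrillDown {
  private FlowDrillDown() {}

  /** Returns one /flow/... request path per flow run of the selected flow. */
  static List<String> flowRunPaths(String clusterId, String userId,
                                   String flowName, List<String> flowRunIds) {
    List<String> paths = new ArrayList<>();
    for (String flowRunId : flowRunIds) {
      // Assumption: ids need no URL escaping.
      paths.add("/flow/" + clusterId + "/" + userId + "/" + flowName
          + "/" + flowRunId);
    }
    return paths;
  }
}
```

The client would then issue each path as a separate request and merge the returned metrics into the per-flow view.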

If the user further drills down into a single flow run, then it can do an (existing) query
to retrieve all applications for a given flow run to get the application entities.

(reader interface)
Currently I am *not* planning to add new flow-specific methods to the {{TimelineReader}} interface.
Instead, you can use the existing {{getEntities()}} and {{getEntity()}} methods to perform
the above new queries:
# {{getEntities()}} with cluster specified and entity type = YARN_FLOW_ACTIVITY (a new timeline
entity type)
# {{getEntity()}} with cluster, user, flow id, flow run id specified and entity type = YARN_FLOW
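
To make the dispatch idea concrete, here is a minimal sketch of how a reader implementation might route the two calls by entity type and check the required context fields. It does *not* use the real {{TimelineReader}} signatures: {{FlowQueryDispatch}}, {{Context}}, and both method names are hypothetical, and only the entity type names and table names come from the proposal above.

```java
/** Hypothetical routing sketch; not the actual TimelineReader API. */
final class FlowQueryDispatch {
  static final String YARN_FLOW_ACTIVITY = "YARN_FLOW_ACTIVITY";
  static final String YARN_FLOW = "YARN_FLOW";

  /** Hypothetical holder for the context fields named in the proposal. */
  static final class Context {
    final String clusterId;
    final String userId;
    final String flowId;
    final String flowRunId;

    Context(String clusterId, String userId, String flowId, String flowRunId) {
      this.clusterId = clusterId;
      this.userId = userId;
      this.flowId = flowId;
      this.flowRunId = flowRunId;
    }
  }

  private FlowQueryDispatch() {}

  /** Table a getEntities()-style call would read for the given entity type. */
  static String tableForGetEntities(String entityType, Context ctx) {
    require(ctx.clusterId, "cluster");
    if (YARN_FLOW_ACTIVITY.equals(entityType)) {
      return "flow activity";
    }
    // Other entity types are outside the scope of this sketch.
    throw new IllegalArgumentException("unsupported type: " + entityType);
  }

  /** Table a getEntity()-style call would read for the given entity type. */
  static String tableForGetEntity(String entityType, Context ctx) {
    require(ctx.clusterId, "cluster");
    require(ctx.userId, "user");
    require(ctx.flowId, "flow id");
    require(ctx.flowRunId, "flow run id");
    if (YARN_FLOW.equals(entityType)) {
      return "flow run";
    }
    // Other entity types are outside the scope of this sketch.
    throw new IllegalArgumentException("unsupported type: " + entityType);
  }

  private static void require(String value, String name) {
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException("missing " + name);
    }
  }
}
```

The point of the sketch is only that the entity type, together with the context fields, is enough to select the flow activity or flow run table without adding flow-specific methods.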

> [timeline reader] implement support for querying for flows and flow runs
> ------------------------------------------------------------------------
>                 Key: YARN-4074
>                 URL: https://issues.apache.org/jira/browse/YARN-4074
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as implementation of the API.
