hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
Date Fri, 20 Mar 2015 23:48:39 GMT

     [ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3040:
    Attachment: YARN-3040.2.patch

The new patch changes how the context information is passed to the aggregator. Again,
it's based on the assumption that the context won't change during the lifecycle of the app.
Therefore, we don't need to specify the context info for every put-entity request; instead, we
set it on the timeline collector when it starts. The backend and the context information that
is kept are not altered in the new patch.
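
The idea of binding the context once at collector start, instead of sending it with every
put-entity request, could be sketched roughly as follows. This is only an illustration and not
the code in the patch: `AppContext`, `SketchCollector`, and their methods are hypothetical
names, and the in-memory list merely stands in for the real backend.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the per-application context (not a class from the patch).
final class AppContext {
    final String clusterId;
    final String userId;
    final String flowId;
    final String flowRunId;
    final String appId;

    AppContext(String clusterId, String userId, String flowId,
               String flowRunId, String appId) {
        this.clusterId = clusterId;
        this.userId = userId;
        this.flowId = flowId;
        this.flowRunId = flowRunId;
        this.appId = appId;
    }
}

// Hypothetical collector: the context is set once in start() and reused afterwards.
final class SketchCollector {
    private AppContext context;
    private final List<String> written = new ArrayList<>();

    void start(AppContext ctx) {
        this.context = ctx;
    }

    // Callers pass only the entity; the context comes from the collector itself.
    void putEntity(String entityId) {
        if (context == null) {
            throw new IllegalStateException("collector has not been started");
        }
        written.add(String.join("/", context.clusterId, context.userId,
                context.flowId, context.flowRunId, context.appId, entityId));
    }

    List<String> written() {
        return written;
    }
}
```

Under the assumption that the context is immutable for the app's lifetime, every entity written
through the same collector ends up under the same context prefix.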

In the new data flow of context information, clusterId is obtained from the configuration and
appId is obtained when constructing the timeline collector. User, flow, and flow run info
will be passed to the collector at the starting stage via the collector<->NM RPC interface.
Among the three, user info is already available in the NM; flow and flow run need to be provided
by the user when submitting the application via the tag field. This info will be passed to
the NM when starting the AM container via the env of the CLC. The collector will then query the
NM for this info.
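
To illustrate how flow and flow run info might travel through the tag field, here is a small
sketch that extracts the two values from a set of application tags. The prefixes `FLOW_ID:` and
`FLOW_RUN_ID:` and the class name are assumptions made up for this example; the actual tag
format is whatever the patch defines.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the tag prefixes below are illustrative, not the patch's constants.
final class FlowTagSketch {
    static final String FLOW_ID_PREFIX = "FLOW_ID:";
    static final String FLOW_RUN_ID_PREFIX = "FLOW_RUN_ID:";

    // Returns a map with "flowId" and/or "flowRunId" for the tags that carry them.
    static Map<String, String> parse(Set<String> tags) {
        Map<String, String> context = new HashMap<>();
        for (String tag : tags) {
            if (tag.startsWith(FLOW_ID_PREFIX)) {
                context.put("flowId", tag.substring(FLOW_ID_PREFIX.length()));
            } else if (tag.startsWith(FLOW_RUN_ID_PREFIX)) {
                context.put("flowRunId", tag.substring(FLOW_RUN_ID_PREFIX.length()));
            }
            // Any other tag is unrelated to the flow context and is ignored.
        }
        return context;
    }
}
```

In this sketch, values parsed this way would be the ones placed into the env of the CLC and
later handed to the collector when it asks the NM.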

The distributed shell has been updated to show how the client can pass flow and flow run info
into the application. Test cases have been modified and added to verify that (1) the newly added
RPC call works and (2) the context info works e2e.

To answer Sangjin's questions:

bq. How can individual frameworks (MR, tez, ...) set these attributes and pass them to the
RM at the time of the application launch? How does that information get passed to the TimelineClient
and to the timeline collector?

See the description of the context information data flow above, and take a look at the DS app
for reference.

bq. It sounds like each NM will need to have multiple timeline clients (one for each application).

That's correct.

bq. The RM will have its own collector, and it does not go through the TimelineClient API.
How would that work?

RM will have all the above context info. When constructing and starting the RM collector, we
should make sure it is set up.

bq. flowId should be flowName (that's the standard terminology we're using)

Personally, I prefer to use ID so that naming is uniform across all the context properties. ID
indicates that it can be used to identify a flow.

bq. flow version seems to be missing from this; while flow version is not part of the primary
key of the entity, it is a necessary attribute
bq. I think flow run id can (and should) be a long; it doesn't have to be a generic string

I thought version was part of flow id. I think we can revisit it once the schema is done and
we have finalized the *generic* description of the flow structure and the notation. For now I'd
like to keep it as it is. Thoughts?

bq. the default cluster id should be just the cluster name; I'm not sure why we need to add
the cluster start timestamp; 

It makes sense, but when RM restarts we use the new start time of RM to identify the app instead
of the one before. With the current approach, cluster_xyz will contain application_xyz_123. This
was my rationale before. Also, this default cluster id construction is only used when
the user didn't specify the cluster id in the config file. In production, the user should specify
one. I'll think about the question again.
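
The fallback described here could be sketched as below. The `cluster_` prefix follows the
cluster_xyz example in this comment; the class and method names, and the use of a nullable
configured value, are assumptions for illustration rather than the patch's code.

```java
// Hypothetical helper showing the default cluster id fallback.
final class ClusterIdSketch {

    // Use the configured id when present; otherwise derive one from the RM start time.
    static String resolveClusterId(String configured, long rmStartTime) {
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        return "cluster_" + rmStartTime;
    }

    // Builds an id in the application_<timestamp>_<sequence> shape mentioned above.
    static String appId(long rmStartTime, int sequence) {
        return String.format("application_%d_%04d", rmStartTime, sequence);
    }
}
```

With the default, the cluster id and the ids of the applications started by that RM instance
share the same timestamp, which is the containment relation described above. A configured
cluster id breaks that coupling and stays stable across RM restarts.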

bq. hopefully isUnitTest can be removed with the changes I made in the previous commit

Right. It's not necessary.

> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch, YARN-3040.2.patch
> Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should
> be able to define and pass in all attributes of flows and flow runs to YARN, and they should
> be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.

This message was sent by Atlassian JIRA
