hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
Date Wed, 18 Mar 2015 22:42:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368064#comment-14368064 ]

Zhijie Shen commented on YARN-3040:
-----------------------------------

I've just uploaded a patch. It's an end-to-end modification that allows the context information to be
passed from the client to the backend storage. The context information includes *clusterId*,
*userId*, *flowId*, *flowRunId* and *appId*. According to YARN-3240, a new TimelineClient is
constructed per application, and within the context of one application we can reasonably assume
that this context information does not change. Therefore, it only needs to be specified when
the client is constructed. The context information should be gathered or passed to the AM and
NMs so that they can construct the timeline client properly. For example, for the AM, this information can be passed
via env inside the CLC. Anyway, that's out of the scope of this Jira; we will cover that integration
once we make some particular framework AM use the new timeline client.

Back to the context information: some of the fields can be null, and some of them don't need to
be specified explicitly:

* *clusterId*: The application should specify a unique cluster ID; otherwise, by default the
cluster ID will be cluster_<start timestamp of RM>.
* *userId*: The user doesn't need to specify this information. Instead, it will be obtained
from the current ugi of the client.
* *flowId*: The user either passes in a flowId or, if it is an orphan application, the flowId
will be the appId with its prefix replaced by "flow".
* *flowRunId*: If it is an orphan application, it's 0. The reason why it should be 0 instead
of the current timestamp when creating the timeline client is that there may be multiple clients
in the AM and NMs that are constructed at different times. They need to be synced on the same flowRunId.
* *appId*: It's the only mandatory context information, as we defined before. The client is
constructed to work with only one application.
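
To make the defaulting rules above concrete, here is a minimal sketch in plain Java. It is not the code from the patch: the class and method names are hypothetical, the userId lookup via the ugi is left out, and the "application_" and "flow_" prefixes are assumptions based on the usual YARN application ID format.

```java
import java.util.Objects;

/** Hypothetical helper for illustration; not a class from the YARN-3040 patch. */
final class FlowContextDefaults {
    // Assumed prefixes: YARN application IDs look like application_<timestamp>_<sequence>.
    private static final String APP_PREFIX = "application_";
    private static final String FLOW_PREFIX = "flow_";

    private FlowContextDefaults() {}

    /** Falls back to cluster_<RM start timestamp> when no cluster ID is given. */
    static String clusterIdOrDefault(String clusterId, long rmStartTimestamp) {
        return clusterId != null ? clusterId : "cluster_" + rmStartTimestamp;
    }

    /** For an orphan application, derives the flow ID from the app ID by swapping the prefix. */
    static String flowIdOrDefault(String flowId, String appId) {
        Objects.requireNonNull(appId, "appId is mandatory");
        if (flowId != null) {
            return flowId;
        }
        if (!appId.startsWith(APP_PREFIX)) {
            throw new IllegalArgumentException("Unexpected appId format: " + appId);
        }
        return FLOW_PREFIX + appId.substring(APP_PREFIX.length());
    }

    /** Orphan applications use 0, so clients created at different times still agree. */
    static long flowRunIdOrDefault(Long flowRunId) {
        return flowRunId != null ? flowRunId : 0L;
    }

    public static void main(String[] args) {
        String appId = "application_1426718560000_0001"; // made-up example ID
        System.out.println(clusterIdOrDefault(null, 1426718560000L));
        System.out.println(flowIdOrDefault(null, appId));
        System.out.println(flowRunIdOrDefault(null));
    }
}
```

Using a fixed 0 rather than a timestamp means the AM and every NM arrive at the same flowRunId without having to coordinate.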

I changed the web service endpoint accordingly to make it RESTful, and changed the writer interface
accordingly to pass in the context information when putting the entity. In addition, I've
modified the FS-based writer implementation to reflect the change. The entity file will be
put at {{root/entities/<clusterId>/<userId>/<flowId>/<flowRunId>/<appId>/<entityType>/<entityId>.thist}}.
The change has been verified by TestDistributedShell and TestFileSystemTimelineWriterImpl.
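
As a rough illustration of that directory layout, the following sketch assembles the entity file path from the context fields. It is not the code of the FS-based writer: the class name, the method signature and the sample values (user, entity type, entity ID) are all hypothetical, and plain string joining stands in for whatever path API the writer actually uses.

```java
/** Hypothetical sketch of the layout described above; not the FS-based writer itself. */
final class EntityFilePathSketch {
    // File extension taken from the layout in the comment.
    static final String EXTENSION = ".thist";

    private EntityFilePathSketch() {}

    /** Builds root/entities/clusterId/userId/flowId/flowRunId/appId/entityType/entityId.thist. */
    static String entityFilePath(String root, String clusterId, String userId,
                                 String flowId, long flowRunId, String appId,
                                 String entityType, String entityId) {
        return String.join("/",
                root, "entities", clusterId, userId, flowId,
                Long.toString(flowRunId), appId, entityType,
                entityId + EXTENSION);
    }

    public static void main(String[] args) {
        // All values below are made-up examples.
        System.out.println(entityFilePath(
                "root", "cluster_1426718560000", "alice",
                "flow_1426718560000_0001", 0L,
                "application_1426718560000_0001",
                "DS_CONTAINER", "container_1"));
    }
}
```

Because the context fields form the leading path components, all entities of one application end up under a single directory.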


> [Data Model] Implement client-side API for handling flows
> ---------------------------------------------------------
>
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should
> be able to define and pass in all attributes of flows and flow runs to YARN, and they should
> be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
