hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6733) Add table for storing sub-application entities
Date Fri, 23 Jun 2017 17:53:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061301#comment-16061301 ]

Vrushali C commented on YARN-6733:

Hi [~haibo.chen],
Good questions. Let me try to answer them and give more context.

bq. 1) Is the sub_application going to have similar schema as ApplicationTable? 

Not quite. It will have column families and columns similar to the application and entity
tables, but the row key is what we were discussing. We were thinking of two possibilities:
1) cluster ! subapp_user ! subapp ! entity type ! id prefix ! entity id
2) cluster  ! entity type ! subapp_user ! subappid ! prefix ! entity id

The first kind of row key helps serve different types of entities for a specific user,
for example "give me all DAGs for this user".
The second kind of row key helps the landing pages of UIs like Tez, which want all DAGs
for all sub-app users.
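
To make the trade-off concrete, here is a minimal sketch of how the two layouts behave under
prefix scans. It assumes keys are plain strings joined with "!" and sorted lexicographically,
as HBase sorts row keys; the encoding the table would actually use is not decided in this
comment, and the helper names, the entity type names, and the sample values are hypothetical.

```python
SEP = "!"

def user_first_key(cluster, subapp_user, subapp, entity_type, id_prefix, entity_id):
    # Option 1: cluster ! subapp_user ! subapp ! entity type ! id prefix ! entity id
    return SEP.join([cluster, subapp_user, subapp, entity_type, str(id_prefix), entity_id])

def type_first_key(cluster, subapp_user, subapp, entity_type, id_prefix, entity_id):
    # Option 2: cluster ! entity type ! subapp_user ! subappid ! prefix ! entity id
    return SEP.join([cluster, entity_type, subapp_user, subapp, str(id_prefix), entity_id])

def prefix_scan(keys, *parts):
    # Simulates a row key prefix scan: return the sorted keys that start
    # with the given leading components (trailing separator included so
    # that "alice" does not also match "alice2").
    prefix = SEP.join(parts) + SEP
    return [k for k in sorted(keys) if k.startswith(prefix)]

# Hypothetical sample entities: (cluster, subapp_user, subapp, type, id prefix, id)
entities = [
    ("c1", "alice", "session_1", "TEZ_DAG", 0, "dag_1"),
    ("c1", "alice", "session_1", "TEZ_VERTEX", 0, "vertex_1"),
    ("c1", "bob", "session_1", "TEZ_DAG", 0, "dag_2"),
]

option1 = [user_first_key(*e) for e in entities]
option2 = [type_first_key(*e) for e in entities]

# Option 1: one scan returns everything that belongs to a given user.
alice_rows = prefix_scan(option1, "c1", "alice")

# Option 2: one scan returns all DAGs across all sub-app users.
all_dags = prefix_scan(option2, "c1", "TEZ_DAG")
```

With option 1 the scan for a user returns every entity type for that user in one contiguous
range; with option 2 the scan for an entity type returns that type for every user, which is
what a landing page would want.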

I have been thinking this over. I think the basic landing page should not be served from this table.
Just as the flow activity table helps answer landing pages for flows, there could be another table
for landing pages. The reason is that I anticipate a large number of queries being run by users in these
Tez sessions. The table would then be pretty big, and there is no way we could serve the latest run DAGs
easily from it. The landing page for all sub-app users would come from somewhere else.
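
As a rough illustration of why a separate table could help: if a landing-page table put an
inverted timestamp near the front of its row key, the most recent runs would sort first and
could be read without scanning the whole table. This is only a sketch of a common HBase
row key technique; the key layout, the function names, and the sample data below are
assumptions, not a schema proposed here.

```python
LONG_MAX = 2**63 - 1  # largest signed 64-bit value

def invert(ts_ms):
    # Larger (more recent) timestamps map to smaller values,
    # so they sort first in ascending row key order.
    return LONG_MAX - ts_ms

def landing_key(cluster, run_ts_ms, subapp_user, dag_id):
    # Hypothetical layout: cluster ! inverted timestamp ! subapp_user ! dag id
    return (cluster, invert(run_ts_ms), subapp_user, dag_id)

# Hypothetical runs: (subapp_user, dag id, run timestamp in ms)
runs = [
    ("alice", "dag_1", 1000),
    ("bob", "dag_2", 3000),
    ("alice", "dag_3", 2000),
]

# Sorting stands in for the ordered storage of row keys.
rows = sorted(landing_key("c1", ts, user, dag) for user, dag, ts in runs)

# The first rows of the table are the most recent runs across all users.
latest_two = [row[3] for row in rows[:2]]
```

By contrast, in either row key proposed for the sub-application table the run time is not part
of the leading key components, so finding the latest DAGs would mean reading and sorting
many rows.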

bq. 2) Are nodes in a Tez DAG all YARN applications? If so, can't Tez write to ApplicationTable
with the doAs user?
I think the Tez entities will be DAGs, vertices, etc. We plan to call the "user" in such cases
a subapp_user, not a doAs user.
Each Tez DAG would belong to an app master (hence some app id), and that app master would belong to the
Tez session (which is a flow for ATS). As we know, the app master/app id is not known to Tez
sub-app users, so sub-app info belongs in another table. The application-related information
still exists for that Tez session and goes to the application table, with the user being the
one who is running the Tez session, app masters, etc. The doAs information is no longer available
to YARN once the app master starts running.

In other news, I am working on the patch and hope to upload it shortly.

> Add table for storing sub-application entities
> ----------------------------------------------
>                 Key: YARN-6733
>                 URL: https://issues.apache.org/jira/browse/YARN-6733
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
> After a discussion with Tez folks, we have been thinking over introducing a table to
> store sub-application information.
> For example, if a Tez session runs for a certain period as User X and runs a few AMs.
> These AMs accept DAGs from other users. Tez will execute these dags with a doAs user. ATSv2
> should store this information in a new table perhaps called as "sub_application" table.
> This jira tracks the code changes needed for table schema creation.
> I will file other jiras for writing to that table, updating the user name fields to include
> sub-application user etc.
