hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5378) Accomodate app-id->cluster mapping
Date Thu, 14 Jul 2016 08:37:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376574#comment-15376574 ]

Varun Saxena edited comment on YARN-5378 at 7/14/16 8:36 AM:
-------------------------------------------------------------

bq.  What if we eliminate the cluster from the row prefix and store the flowName, flowRunId,
and user as a column name prefix.
IIUC, the proposal is to build each column name from flowName, flowRunId or user_id as the
prefix, followed by the cluster ID, something like the table below.
We can then query the app-to-flow mapping by fetching the AppId row with only the columns
for the required cluster ID.
This makes sense to me, though I think we could use shorter names than flowName, flowRunId
and user_id as the column prefixes.
{noformat}
 * |--------------------------------------|
 * |  Row       | Column Family           |
 * |  key       | info                    |
 * |--------------------------------------|
 * | AppId      | flowName!cluster1:      |
 * |            | foo@daily_hive_report   |
 * |            |                         |
 * |            | flowRunId!cluster1:     |
 * |            | 1452828720457           |
 * |            |                         |
 * |            | user_id!cluster1:       |
 * |            | admin                   |
 * |            |                         |
 * |            | flowName!cluster2:      |
 * |            | foo@daily_pig_report    |
 * |            |                         |
 * |            | flowRunId!cluster2:     |
 * |            | 1452813722341           |
 * |            |                         |
 * |            | user_id!cluster2:       |
 * |            | admin                   |
 * |            |                         |
 * |            |                         |
 * |            |                         |
 * |--------------------------------------|
{noformat}
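
To make the lookup concrete, here is a minimal sketch of the layout in the table above, using a plain in-memory dict in place of an HBase row. The helper names (qualifier, columns_for_cluster) and the EXAMPLE_ROW constant are hypothetical and only illustrate the idea; they are not part of any patch on this issue, and a real implementation would presumably filter columns on the HBase side rather than in client code.

```python
from typing import Dict

SEPARATOR = "!"  # separator between prefix and cluster ID, as shown in the table

# One AppId row from the table: column qualifier -> value (assumed representation).
EXAMPLE_ROW: Dict[str, str] = {
    "flowName!cluster1": "foo@daily_hive_report",
    "flowRunId!cluster1": "1452828720457",
    "user_id!cluster1": "admin",
    "flowName!cluster2": "foo@daily_pig_report",
    "flowRunId!cluster2": "1452813722341",
    "user_id!cluster2": "admin",
}


def qualifier(prefix: str, cluster_id: str) -> str:
    """Build a column qualifier such as 'flowName!cluster1'."""
    return f"{prefix}{SEPARATOR}{cluster_id}"


def columns_for_cluster(row: Dict[str, str], cluster_id: str) -> Dict[str, str]:
    """Return only the columns of one cluster, keyed by their prefix."""
    result: Dict[str, str] = {}
    for name, value in row.items():
        # Split on the last separator so the cluster ID is compared exactly.
        prefix, sep, cluster = name.rpartition(SEPARATOR)
        if sep and cluster == cluster_id:
            result[prefix] = value
    return result
```

For example, columns_for_cluster(EXAMPLE_ROW, "cluster2") keeps only the flowName, flowRunId and user_id entries stored for cluster2.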


was (Author: varun_saxena):
bq.  What if we eliminate the cluster from the row prefix and store the flowName, flowRunId,
and user as a column name prefix.
IIUC, the proposal is to include cluster ID as column name prefix with flowname, flowrunid
and user followed by the cluster ID.
Something like below.
And then we can query app to flow mapping by fetching the appid row with only columns having
the required clusterid.
I think it makes sense to me. But we can have shorter names instead of flowname, flowRunId
and username as column prefixes I think.
{noformat}
 * |--------------------------------------|
 * |  Row       | Column Family           |
 * |  key       | info                    |
 * |--------------------------------------|
 * | AppId      | flowName!cluster1:      |
 * |            | foo@daily_hive_report   |
 * |            |                         |
 * |            | flowRunId!cluster1:     |
 * |            | 1452828720457           |
 * |            |                         |
 * |            | user_id!cluster1:       |
 * |            | admin                   |
 * |            |                         |
 * |            | flowName!cluster2:      |
 * |            | foo@daily_pig_report    |
 * |            |                         |
 * |            | flowRunId!cluster2:     |
 * |            | 1452813722341           |
 * |            |                         |
 * |            | user_id!cluster2:       |
 * |            | admin                   |
 * |            |                         |
 * |            |                         |
 * |            |                         |
 * |--------------------------------------|
{noformat}

> Accomodate app-id->cluster mapping
> ----------------------------------
>
>                 Key: YARN-5378
>                 URL: https://issues.apache.org/jira/browse/YARN-5378
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Joep Rottinghuis
>            Assignee: Joep Rottinghuis
>
> In discussion with [~sjlee0], [~vrushalic], [~subru], and [~curino], a use case came up:
> being able to map from application-id to cluster-id in the context of federation for YARN.
> What happens is that a "random" cluster in the federation is asked to generate an app-id,
> and then potentially a different cluster can be the "home" cluster for the AM. Furthermore,
> tasks can then run in yet other clusters.
> In order to be able to pull up the logical home cluster on which the application ran,
> there needs to be a mapping from application-id to cluster-id. This mapping is available in
> the federated YARN case only during the active life of the application.
> A similar situation is common in our larger production environment. Somebody will complain
> about a slow job, some failure or whatever. If we're lucky we have an application-id. When
> we ask the user which cluster they ran on, they'll typically answer with the machine from
> where they launched the job (many users are unaware of the underlying physical clusters).
> This leaves us to spelunk through various RM UIs to find a matching epoch in the application
> ID.
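
The manual "matching epoch" lookup described above can be sketched roughly as follows. A YARN application ID has the form application_<clusterTimestamp>_<sequence>, where the cluster timestamp is the start time of the ResourceManager that generated the ID. The function names and the table of RM start times below are hypothetical; in practice those timestamps would have to be collected from each cluster's RM, and in the federated case the match only identifies the cluster that generated the ID, not necessarily the home cluster.

```python
from typing import Dict, List


def cluster_timestamp(app_id: str) -> int:
    """Extract the RM start timestamp (epoch millis) from an application ID."""
    parts = app_id.split("_")
    if len(parts) != 3 or parts[0] != "application":
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return int(parts[1])


def find_clusters(app_id: str, rm_start_times: Dict[str, List[int]]) -> List[str]:
    """Return the clusters whose recorded RM start times match the app ID's epoch."""
    ts = cluster_timestamp(app_id)
    return sorted(
        cluster for cluster, starts in rm_start_times.items() if ts in starts
    )


# Hypothetical RM start times per cluster, for illustration only.
RM_START_TIMES: Dict[str, List[int]] = {
    "clusterA": [1468485400000, 1468000000000],
    "clusterB": [1468300000000],
}
```

With this table, find_clusters("application_1468300000000_0042", RM_START_TIMES) points at clusterB, which is the lookup the proposed app-id to cluster-id mapping would make unnecessary.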




