hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
Date Tue, 14 Apr 2015 22:33:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495039#comment-14495039 ]

Zhijie Shen commented on YARN-3051:
-----------------------------------

bq. I believe in Timeline Service v.2 it is (cluster id, entity type, entity id) that uniquely
identify an entity.

Semantically, it matters whether we allow users to define entities with the same identifier
<type, id> in different apps. If we do, then, for example, MR Job_1 can create an
entity <CURRENT_USER, zjshen> and MR Job_2 can create another entity <CURRENT_USER,
zjshen>. Otherwise, creating the entity <CURRENT_USER, zjshen> in different apps
is an invalid use case.

This is a rule we need to state explicitly, so that users know whether they can name entities
this way. That said, unlike the given example, as far as I can tell, entity identifiers are
usually unique enough not to conflict with each other. I guess that for this reason <cluster
id, entity type, entity id> is usually sufficient to identify an entity. But I'm not sure
it is semantically useful. It means that if I have Job_1 and Job_2 running on YARN_Cluster_1,
and Job_3 running on YARN_Cluster_2, then I can define entities with the same identifier in
both Job_1/2 and Job_3, but not in both Job_1 and Job_2.
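
To make the asymmetry concrete, here is a minimal sketch of what uniqueness on <cluster id, entity type, entity id> would imply. It is only an illustration under assumptions: the class EntityScopeSketch and its methods are hypothetical and do not come from the timeline service code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in the ATS code base.
class EntityScopeSketch {
    // Identity under the proposal: (cluster id, entity type, entity id).
    // The app id is deliberately NOT part of the identity.
    static List<String> identity(String clusterId, String type, String id) {
        return Arrays.asList(clusterId, type, id);
    }

    // Registers an entity on behalf of an app. Returns the app that already
    // owns this identity, or null if the identity was still free.
    static String register(Map<List<String>, String> owners,
                           String clusterId, String appId, String type, String id) {
        return owners.putIfAbsent(identity(clusterId, type, id), appId);
    }

    public static void main(String[] args) {
        Map<List<String>, String> owners = new HashMap<>();
        // Job_1 on YARN_Cluster_1: the identity is free.
        System.out.println(register(owners, "YARN_Cluster_1", "Job_1", "CURRENT_USER", "zjshen"));
        // Job_2 on the same cluster: conflicts with the entity owned by Job_1.
        System.out.println(register(owners, "YARN_Cluster_1", "Job_2", "CURRENT_USER", "zjshen"));
        // Job_3 on YARN_Cluster_2: a different cluster, so no conflict.
        System.out.println(register(owners, "YARN_Cluster_2", "Job_3", "CURRENT_USER", "zjshen"));
    }
}
```

In this sketch the same <type, id> is accepted for Job_1 and Job_3 but rejected for Job_2, which is the behavior questioned above.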

bq. The remaining attributes (user id, flow name, flow run id, app id) are part of the primary
key, and are required when a new entity is inserted. 

This may cause a problem with the storage too. Since the PK will include <user id, flow name,
flow run id, app id>, the following two example PKs are both valid:

* <cluster_1, user_1, flow_1, 1.0, 12345678, *app_1*, entity_type_1, entity_id_1>
* <cluster_1, user_1, flow_1, 1.0, 12345678, *app_2*, entity_type_1, entity_id_1>

However, if we look at <cluster Id, entity type, entity Id> only, these two entities
are duplicates. So whether we use <cluster Id, entity type, entity Id>
or <entity type, entity id> to get the entity, we are likely to get more than one entity.
Another problem is that, because the PK is defined with a different schema, we cannot look up
the entity directly and have to scan through the whole table for it.
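
A rough sketch of both problems, under stated assumptions: the table is modeled as a sorted map keyed by the full PK, the "!" separator and the class PartialKeyLookupSketch are made up for illustration, and the fourth and fifth key components are assumed to be a flow version and a flow run id. This is not the actual storage schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the key layout mirrors the two example PKs above,
// but the class, methods and separator are invented for illustration.
class PartialKeyLookupSketch {
    // Full PK (assumed order): cluster, user, flow, flow version, flow run,
    // app, entity type, entity id.
    static String rowKey(String... parts) {
        return String.join("!", parts);
    }

    // The PK starts with (cluster, user, flow, ...), so a lookup by
    // (cluster, entity type, entity id) cannot be served by a key prefix
    // and has to visit every row in the table.
    static List<String> findByClusterTypeId(TreeMap<String, String> table,
                                            String cluster, String type, String id) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            String[] p = row.getKey().split("!");
            if (p[0].equals(cluster) && p[6].equals(type) && p[7].equals(id)) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("cluster_1", "user_1", "flow_1", "1.0", "12345678",
                "app_1", "entity_type_1", "entity_id_1"), "entity written by app_1");
        table.put(rowKey("cluster_1", "user_1", "flow_1", "1.0", "12345678",
                "app_2", "entity_type_1", "entity_id_1"), "entity written by app_2");
        // Both rows are valid under the full PK, yet both match the partial key.
        System.out.println(findByClusterTypeId(table, "cluster_1", "entity_type_1", "entity_id_1"));
    }
}
```

The two rows differ only in the app id, so the full PK accepts both, while the partial-key query returns both and has to scan every row to find them.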


> [Storage abstraction] Create backing storage read interface for ATS readers
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3051
>                 URL: https://issues.apache.org/jira/browse/YARN-3051
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>         Attachments: YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be implemented
> by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
