hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs
Date Mon, 11 Nov 2013 17:47:17 GMT

    [ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819150#comment-13819150 ]

Zhijie Shen commented on YARN-1390:
-----------------------------------

What I originally proposed is to upgrade the single *applicationType* to multiple *tags*.
In fact, the current single applicationType can already be considered a tag: judging by the
name of the field, users are supposed to fill it with an application type. However,
we actually have no restriction on what the application type should be. Users are free to come
up with their own words based on their understanding and requirements. For example, in the case that
[~kkambatl] and [~rkanter] have mentioned, the ultimate program that submits the application
will be used to identify the application type. On the other hand, users may want to classify
applications according to the computation framework, such as mapreduce and tez. MAPREDUCE-5618
may immediately solve the problem that [~kkambatl] and [~rkanter] are encountering, but if
the applicationType field is set to the source, we can no longer search for applications according
to their computation frameworks. To sum up, the single applicationType allows users to describe
an application in only one aspect.

In contrast, if we allow multiple tags to describe an application, users can annotate the
application with both the source (e.g., pig) and the computation framework (e.g., mapreduce),
and even other kinds of information, such as "long-running application" and the tenant name.
It would be much like the tag systems of online photo/video/music sites, which allow users
to describe an object in their own words. Otherwise, it is not efficient to add a dedicated
field (e.g., applicationSource) every time we come up with a new aspect to describe an application.
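
To make the tag idea concrete, here is a minimal sketch in Python. It is only an illustration:
the App record and the find_by_tags helper are hypothetical names and are not part of any YARN
API. It assumes each application carries a free-form set of tags and that a query matches when
the application has every requested tag.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical application record; not a YARN class."""
    app_id: str
    tags: frozenset = field(default_factory=frozenset)


def find_by_tags(apps, required):
    """Return the ids of applications that carry every tag in `required`."""
    required = frozenset(required)
    return [a.app_id for a in apps if required <= a.tags]


# Each application is annotated along several aspects at once:
# source (pig, hive), framework (mapreduce, tez), and anything else.
apps = [
    App("app_1", frozenset({"pig", "mapreduce"})),
    App("app_2", frozenset({"hive", "tez"})),
    App("app_3", frozenset({"mapreduce", "long-running"})),
]

# Search by computation framework only.
by_framework = find_by_tags(apps, {"mapreduce"})

# Search by source and framework together.
by_both = find_by_tags(apps, {"pig", "mapreduce"})
```

With a single applicationType field, only one of these two queries could be answered; with
tags, both work against the same data.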

I'm not sure multiple tags are the way we want to solve this issue, so I have filed another jira (YARN-1399)
to track multiple tags for an application. However, if we'd like to have a dedicated field
for each aspect of an application, IMHO it is good to restrict the words we can
supply. For example, applicationType must be the name of a computation framework, chosen
from mapreduce, tez, storm, etc. Otherwise, we can expect a chaotic application type
list that mixes frameworks with sources: mapreduce, pig, hive, tez. The same should apply to
applicationSource. In conclusion, a dedicated field should behave like a category with
predefined enumerated values.
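
The alternative of dedicated fields with restricted values could be sketched as follows. This
is again a hypothetical Python illustration, not an existing YARN API: the ApplicationType enum
and the parse_type helper are assumed names, and the member list simply reuses the frameworks
named above.

```python
from enum import Enum


class ApplicationType(Enum):
    """Hypothetical closed set of computation frameworks."""
    MAPREDUCE = "mapreduce"
    TEZ = "tez"
    STORM = "storm"


def parse_type(value):
    """Map a user-supplied string to a known type, or raise ValueError."""
    return ApplicationType(value.strip().lower())


# A framework name is accepted regardless of case.
accepted = parse_type("MapReduce")

# A source such as "pig" is rejected, so it cannot leak into the type list.
try:
    parse_type("pig")
    rejected = False
except ValueError:
    rejected = True
```

A field like applicationSource would need its own enumeration under this approach, which is
harder when sources include user-defined project names.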



> Provide a way to capture source of an application to be queried through REST or Java Client APIs
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1390
>                 URL: https://issues.apache.org/jira/browse/YARN-1390
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> In addition to other fields like application-type (added in YARN-563), it is useful to
> have an applicationSource field to track the source of an application. The application source
> can be useful in (1) fetching only those applications a user is interested in, (2) potentially
> adding source-specific optimizations in the future.
> Examples of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop, etc.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
