hadoop-yarn-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-3819) Collect network usage on the node
Date Mon, 29 Jun 2015 16:59:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605879#comment-14605879 ]

Allen Wittenauer edited comment on YARN-3819 at 6/29/15 4:58 PM:
-----------------------------------------------------------------

Yes, I recognize that you are building on an already established framework that previously
collected only metrics that were specific to YARN. But now, with network and disk, the collection
details are such that all of the sub-projects could benefit. It's shortsighted to build
something that could not very easily be used by all.

That said, the data collection code should be written in a generic way such that, in the future,
HDFS could plug into the same collection classes so that it too can make block scheduling
decisions. (This has been a discussion point in the HDFS community for a while.) YARN
could then call the methods that gather the data and feed the results into its own framework
to do whatever it needs to do.

So while the framework is obviously different, the actual work of, e.g., "how do I know the
IO stats on file system X" should be in common.
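
As a rough illustration of that split, here is a minimal sketch of a framework-neutral
collector. Everything in it is an assumption made for illustration: the names NetworkUsage,
NetworkUsageCollector, and ProcNetDevCollector are hypothetical and are not taken from the
YARN-3819 patches, and reading Linux's /proc/net/dev is just one possible data source.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical framework-neutral snapshot of node-wide network counters. */
final class NetworkUsage {
    final long bytesReceived;
    final long bytesTransmitted;

    NetworkUsage(long bytesReceived, long bytesTransmitted) {
        this.bytesReceived = bytesReceived;
        this.bytesTransmitted = bytesTransmitted;
    }
}

/** Hypothetical contract that a shared module could expose to YARN, HDFS, etc. */
interface NetworkUsageCollector {
    NetworkUsage collect() throws IOException;
}

/** Sketch of a Linux collector that sums counters from a /proc/net/dev style file. */
final class ProcNetDevCollector implements NetworkUsageCollector {
    private final Path source;

    ProcNetDevCollector(Path source) {
        this.source = source;
    }

    @Override
    public NetworkUsage collect() throws IOException {
        return parse(Files.readAllLines(source));
    }

    /** Sums receive/transmit bytes over all interfaces except loopback. */
    static NetworkUsage parse(List<String> lines) {
        long rx = 0;
        long tx = 0;
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // the two header lines carry no "iface:" prefix
            }
            String iface = line.substring(0, colon).trim();
            if (iface.equals("lo")) {
                continue; // loopback traffic never leaves the node
            }
            String[] fields = line.substring(colon + 1).trim().split("\\s+");
            if (fields.length < 9) {
                continue; // malformed or truncated line
            }
            rx += Long.parseLong(fields[0]); // first receive column: bytes
            tx += Long.parseLong(fields[8]); // first transmit column: bytes
        }
        return new NetworkUsage(rx, tx);
    }
}
```

Under this sketch, a caller such as YARN would depend only on NetworkUsageCollector and copy
the returned values into its own metrics framework; another sub-project could reuse the same
collector without pulling in anything YARN-specific.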

It could be argued that the previous bits that are also being collected should be in common,
but that's already shipped.  Let's not repeat past mistakes though. 


was (Author: aw):
Yes, I recognize that you are building on an already established framework that, prior, only
collected metrics that were specific to YARN.  But now with network and disk, the collection
details are such that all of the sub-projects could benefit.    It's shortsighted to build
something that could very easily be used by all.

That said, the data collection code should be done in a generic way such that, in the future,
HDFS could plug into the same collection classes so that it too may make block scheduling
decisions. (This has been a discussion point around the HDFS community for a while).  YARN
could then call those methods that gather the data into its own framework to do whatever it
needs to do.

So while the framework is obviously different the actual work, of e.g. "how do I know the
IO stats on file system X", should be in common.  

It could be argued that the previous bits that are also being collected should be in common,
but that's already shipped.  Let's not repeat past mistakes though. 

> Collect network usage on the node
> ---------------------------------
>
>                 Key: YARN-3819
>                 URL: https://issues.apache.org/jira/browse/YARN-3819
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 3.0.0
>            Reporter: Robert Grandl
>            Assignee: Robert Grandl
>              Labels: yarn-common, yarn-util
>         Attachments: YARN-3819-1.patch, YARN-3819-2.patch, YARN-3819-3.patch, YARN-3819-4.patch, YARN-3819-5.patch
>
>
> In this JIRA we propose to collect the network usage on a node. This JIRA is part of
> a larger effort to monitor resource usage on the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
