From "Kendall Thrapp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-462) Project Parameter for Chargeback
Date Wed, 13 Mar 2013 15:38:14 GMT

    [ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601245#comment-13601245 ]

Kendall Thrapp commented on YARN-462:

Thanks for the questions and feedback.  Yes, first I should clarify what I intended by chargeback.
 I'm looking to be able to quantify cluster resource usage (memory, CPU, HDFS, etc.) for every
application, and then roll that up to the project level.  This would allow us to accurately
charge the customer (i.e. team/project) for their grid usage (either literally or just informatively).
 I want to provide incentive for more efficient coding, as well as make it easier for teams
to compare their resource usage across different software versions of their Hadoop applications,
config parameter changes, etc.
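
As a rough illustration of the roll-up described above, here is a minimal Python sketch that sums per-application usage into per-project totals.  The AppUsage record, its field names, and the "(untagged)" bucket are assumptions made up for this example; they are not an existing YARN API or data format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class AppUsage:
    """Hypothetical per-application usage record (not a YARN type)."""
    app_id: str
    project: Optional[str]      # the proposed project tag; None if not supplied
    memory_mb_seconds: int      # assumed unit: MB of memory held x seconds
    vcore_seconds: int          # assumed unit: virtual cores held x seconds


def rollup_by_project(apps: Iterable[AppUsage]) -> Dict[str, Dict[str, int]]:
    """Sum application-level usage into one total per project."""
    totals = defaultdict(
        lambda: {"memory_mb_seconds": 0, "vcore_seconds": 0, "app_count": 0}
    )
    for app in apps:
        # Applications without a project land in a catch-all bucket.
        key = app.project if app.project else "(untagged)"
        totals[key]["memory_mb_seconds"] += app.memory_mb_seconds
        totals[key]["vcore_seconds"] += app.vcore_seconds
        totals[key]["app_count"] += 1
    return dict(totals)


if __name__ == "__main__":
    sample = [
        AppUsage("app_1", "pipelineX", 100, 10),
        AppUsage("app_2", "pipelineX", 50, 5),
        AppUsage("app_3", None, 7, 1),
    ]
    for project, usage in sorted(rollup_by_project(sample).items()):
        print(project, usage)
```

The same totals could be computed for two versions of an application to compare their resource usage, provided the project tag stays the same across both runs.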

I had originally hoped that hierarchical queues could serve this purpose as well, but have
since run into several issues with this approach.  The first is that it doesn't scale for
clusters with large numbers of projects.  I've seen large clusters shared between over a hundred
different projects, each with their own teams of users.  If I recall correctly, queues can't
be assigned less than 1% of the total capacity, so it wouldn't be possible to give each of
these projects its own queue.  Even if we could, I suspect this could result in too much
overhead for the scheduler and too much fragmentation of the cluster resources, which could
result in poorer overall utilization.

The second issue is that the project-per-queue approach conflicts with how I see users wanting
to use our queues.  In many cases I see queues being used to distinguish application priorities,
ensuring that high priority time-sensitive jobs get the resources they need to finish on time,
while big but lower priority and less time-sensitive jobs are constrained by being in a smaller
queue.  I'd expect a lot of pushback from our users for any chargeback-focused queue configuration
that had a negative impact on job run times and meeting SLAs.  The idea of the project/chargeback
parameter decouples the two.
> Project Parameter for Chargeback
> --------------------------------
>                 Key: YARN-462
>                 URL: https://issues.apache.org/jira/browse/YARN-462
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
> Problem Summary
> For the purpose of chargeback and better understanding of grid usage, we need to be able
to associate applications with "projects", e.g. "pipeline X", "property Y".  This would allow
us to aggregate on this property, thereby helping us compute grid resource usage for the entire
"project".  Currently, for a given application, two things we know about it are the user that
submitted it and the queue it was submitted to.  Below, I'll explain why neither of these
is adequate for enterprise-level chargeback and understanding resource allocation needs.
> Why Not Users?
> It's not individual users that are paying the bill -- it's projects.  When one of our real
users submits an application on a Hadoop grid, they're presumably not usually doing it for
themselves.  They're doing work for some project or team effort, so it's that team or project
that should be "charged" for all its users' applications.  Maintaining outside lists of associations
between users and projects is error-prone because it is time-sensitive and requires
ongoing maintenance.  New users join organizations, users leave, and users even change projects.
 Furthermore, users may split their time between multiple projects, making it ambiguous as
to which of a user's projects a given application should be charged.  Also, there can be headless
users, which can be even more difficult to link to a project and can be shared between teams
or projects.
> Why Not Queues?
> The purpose of queues is scheduling.  Overloading the queues concept to also mean
who should be "charged" for an application can have a detrimental effect on the primary purpose
of queues.  It could be manageable in the case of a very small number of projects sharing
a cluster, but doesn't scale to tens or hundreds of projects sharing a cluster.  If a given
cluster is shared between 50 projects, creating 50 separate queues will result in inefficient
use of the cluster resources.  Furthermore, a given project may desire more than one queue
for different types or priorities of applications.  
> Proposed Solution
> Rather than relying on external tools to infer through the user and/or queue whom to "charge"
for a given application, I propose a straightforward approach where that information is explicitly
supplied when the application is submitted, just like we do with queues.  Let's use a charge
card analogy: when you buy something online, you don't just say who you are and how to ship
it, you also specify how you're paying for it.  Similarly, when submitting an application
in YARN, you could explicitly specify with whom its resource usage should be associated (a
project, team, cost center, etc.).
> This new configuration parameter should default to being optional, so that organizations
not interested in chargeback or project-level resource tracking can happily continue on as
if it wasn't there.  However, it should be configurable at the cluster level such that a
given cluster could elect to make it required, so that all applications would have an associated
project.  The value of this new parameter should be exposed via the Resource Manager UI and
Resource Manager REST API, so that users and tools can make use of it for chargeback, utilization
metrics, etc.
> I'm undecided on what to name the new parameter, as I like the flexibility in the ways
it could be used.  It is essentially just an additional party other than user or queue that
an application can be associated with, so its use is not just limited to a chargeback scenario.
 For example, an organization not interested in chargeback could still use this parameter
to communicate useful information about an application (e.g. pipelineX.stageN) and aggregate
like applications.
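
To make the optional-by-default, required-if-configured behavior above concrete, here is a minimal Python sketch of a submit-time check.  The function name, the project_required flag, and the exception type are all hypothetical; no such parameter or setting exists in YARN today, and the real check would live wherever the Resource Manager validates submissions.

```python
from typing import Optional


class SubmissionRejected(Exception):
    """Hypothetical error raised when a submission fails validation."""


def resolve_project(project: Optional[str], project_required: bool = False) -> Optional[str]:
    """Return the normalized project tag, or None if it was omitted.

    project_required stands in for a cluster-level setting (assumed name).
    It defaults to False so clusters that ignore the feature are unaffected.
    """
    cleaned = project.strip() if project is not None else ""
    if not cleaned:
        if project_required:
            raise SubmissionRejected("a project must be specified on this cluster")
        return None  # parameter omitted and not required: carry on as before
    return cleaned


if __name__ == "__main__":
    print(resolve_project(None))                       # omitted, not required
    print(resolve_project("pipelineX.stageN", True))   # supplied, required
    try:
        resolve_project("", project_required=True)
    except SubmissionRejected as err:
        print("rejected:", err)
```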
> Enforcement
> Couldn't users just specify this information as a prefix for their job names?  Yes, but
the missing piece this parameter could provide is enforcement.  Ideally, I'd like this parameter to
work very much like how the queues work.  As is already the case with queues, it'd be ideal if
a given user couldn't just specify any old value for this parameter.  It could be configurable
such that a given user only has permission to submit applications for specific "projects".
 Submitting an application with this parameter set to anything other than what the given user
is allowed would cause the application to be rejected in the same manner as if the user had
specified an invalid queue.
> Again, so as to have no effect on organizations not interested in this feature, this
enforcement should be off by default, but configurable at the cluster level such that it could
be turned on for clusters wanting to use it.
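
The enforcement idea in the issue description could look roughly like the following Python sketch.  It is only an illustration under assumed names: the project_acls mapping, the enforcement_enabled flag, and the exception type are invented for this example and do not correspond to any existing YARN configuration or API.

```python
from typing import Dict, Set


class ProjectAccessDenied(Exception):
    """Hypothetical error, analogous to rejecting an invalid queue."""


def check_project_access(
    user: str,
    project: str,
    project_acls: Dict[str, Set[str]],
    enforcement_enabled: bool = False,
) -> None:
    """Reject the submission if the user may not charge to this project.

    project_acls maps a project name to the set of users allowed to submit
    applications for it (assumed structure).  Enforcement is off by default,
    so clusters not using the feature see no change in behavior.
    """
    if not enforcement_enabled:
        return
    allowed_users = project_acls.get(project)
    if allowed_users is None:
        raise ProjectAccessDenied(f"unknown project: {project}")
    if user not in allowed_users:
        raise ProjectAccessDenied(f"user {user} may not submit for project {project}")


if __name__ == "__main__":
    acls = {"pipelineX": {"alice", "bob"}, "propertyY": {"carol"}}
    check_project_access("alice", "pipelineX", acls, enforcement_enabled=True)
    try:
        check_project_access("alice", "propertyY", acls, enforcement_enabled=True)
    except ProjectAccessDenied as err:
        print("rejected:", err)
```

Keeping the user-to-project permissions in the cluster configuration, rather than in an outside list, is what separates this from the job-name-prefix workaround: the value is checked at submission time instead of being trusted.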

