hadoop-yarn-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1024) Define a virtual core unambigiously
Date Mon, 12 Aug 2013 14:54:47 GMT

    [ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736919#comment-13736919 ]

Robert Joseph Evans commented on YARN-1024:
-------------------------------------------

Perhaps I am missing something here.  The goals Arun has asked for are simplicity, predictability,
and consistency.  I totally agree with simplicity, but I do not fully agree that predictability
and consistency must always come along with it, and I do not agree that they are always
required.  Those two come with a trade-off against utilization, which is something Sandy
brought up, although not directly.  For HBase, guaranteed resources, in terms of both parallelism
and raw CPU speed, are important because it uses them to provide a service where predictability
and consistency are needed.  If the HBase AM cannot truly express to YARN what it needs because
of simplicity, HBase on YARN will not be used, because it will not behave the way users need/expect
it to.  Similarly, if HBase is allowed to steal resources from others, you can easily request
too few resources on an underutilized cluster, and when the cluster is under load it falls
apart.

The same goes for my desire for Storm on YARN.  I am happy to use a complex API
to express my needs if it means that I get what I need.  On the other hand, if I am doing
MR batch processing, most of the time (but not all of it) I am doing single-threaded processing,
and I really just want it to fill in the gaps and use as much unused CPU as it can.  Yes,
some MR jobs have strict SLAs, but most do not, and it is best if we can provide a scheduler
that can balance both.
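
As a rough illustration of what "balance both" could mean on a single node, here is a
minimal sketch. It is not YARN code and not a proposal for the actual scheduler API; the
Job fields and the allocate function are hypothetical names made up for this example.
Jobs with a guarantee (the HBase/Storm case) are served first, and whatever is left over
is shared among jobs that can use more (the gap-filling MR case).

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    guaranteed: float  # cores reserved for the job (0 for a pure gap-filler)
    demand: float      # cores the job could use right now


def allocate(total_cores, jobs):
    """Grant guarantees first, then share idle cores by unmet demand."""
    # Step 1: each job gets its guarantee, capped by what it can actually use.
    alloc = {j.name: min(j.guaranteed, j.demand) for j in jobs}
    spare = total_cores - sum(alloc.values())
    if spare < 0:
        raise ValueError("guarantees exceed node capacity")

    # Step 2: hand out the spare cores in proportion to unmet demand.
    unmet = {j.name: j.demand - alloc[j.name] for j in jobs}
    total_unmet = sum(unmet.values())
    if total_unmet > 0:
        share = min(1.0, spare / total_unmet)
        for name, want in unmet.items():
            alloc[name] += want * share
    return alloc


# Busy service: it keeps its 4 guaranteed cores, the batch job gets the rest.
busy = allocate(8, [Job("hbase", 4, 4), Job("mr", 0, 8)])
# Idle service: the batch job soaks up the cores the service is not using.
idle = allocate(8, [Job("hbase", 4, 1), Job("mr", 0, 8)])
```

In the idle case the batch job borrows cores the service has reserved but is not using,
which is exactly where the utilization versus predictability trade-off shows up: those
cores have to be given back as soon as the service needs them.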

I also don't agree that, because YARN lacks the ability to schedule everything that impacts
performance, including network and disk IO, we should skip doing CPU correctly.  Some
applications are truly CPU bound and they will benefit.  We can add other resources
to YARN as they are needed, until we do meet the goal of predictability and consistency.
                
> Define a virtual core unambigiously
> -----------------------------------
>
>                 Key: YARN-1024
>                 URL: https://issues.apache.org/jira/browse/YARN-1024
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> We need to clearly define the meaning of a virtual core unambiguously so that it's easy
> to migrate applications between clusters.
> For example, here is the Amazon EC2 definition of ECU: http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
> Essentially we need to clearly define a YARN Virtual Core (YVC).
> Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the equivalent CPU
> capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
