hadoop-yarn-dev mailing list archives

From Anubhav Dhoot <adh...@cloudera.com>
Subject Re: Calling a merge vote for YARN-1051
Date Fri, 03 Oct 2014 23:13:34 GMT
+1 (non-binding).
I misread the deadline, so my vote may not count. This adds a relevant and
important dimension to the scheduling features in YARN.

On Fri, Oct 3, 2014 at 2:23 PM, Carlo Curino <ccurino@microsoft.com> wrote:

> Thanks everyone for voting. If I count right, we have:
>  4 +1 binding,
>  5 +1 non-binding (including ourselves)
>
> So we are proceeding with the merge to trunk (via Chris Douglas),
> and per Vinod's and Karthik's suggestions, we will get a couple
> of clean builds / Jenkins runs, repeat our usual suite of
> runs on clusters, and then commit to branch-2 and branch-2.6.
>
> Thanks,
> Carlo & Subru
>
> On 10/2/14 4:17 PM, "Karthik Kambatla" <kasha@cloudera.com> wrote:
>
> >If this vote is meant for all branches:
> >
> >+1 to merge to trunk
> >+1 to merge to branch-2
> >+1 to merge to branch-2.6, provided we "label" this feature
> >experimental/alpha until the follow-up items are addressed.
> >-0 to unconditional merge to branch-2.6.
> >
> >PS: We should decide on a way to communicate the stability of a feature.
> >Maybe the new-feature notes in the release documentation should carry
> >this label?
> >
> >
> >
> >On Wed, Oct 1, 2014 at 6:23 PM, Karthik Kambatla <kasha@cloudera.com>
> >wrote:
> >
> >> +1. Nicely done, Subru and Carlo.
> >>
> >> I have been partially involved with the work, and have reviewed some of
> >> the patches. With some help from Subru and documentation from Carlo
> >> (thanks!), I was able to play with the reservation system. Verified the
> >> following:
> >> 1. Reservations can be made only for the amount of resources available
> >> for that queue.
> >> 2. Jobs submitted against a reservation run in the corresponding
> >> "reservation" queue, and jobs submitted to the same higher-level queue
> >> but not against a reservation run in the corresponding "default" queue.
> >> 3. The web-ui shows the reserved resources in a queue even when there
> >> are no apps running.
> >>
> >> There are a few follow-up items towards feature completeness, and I am
> >> okay with working on them post merge to trunk as planned.
> >> 1. Support for FairScheduler
> >> 2. Recover reservations on RM restart/failover
> >> 3. CLI and/or REST APIs to make reservations - this is very useful for
> >> testing
> >> 4. Documentation in the usual apt.vm format.
> >>
> >> Cheers!
> >> Karthik
> >>
> >>
> >>
> >>
> >> On Wed, Oct 1, 2014 at 1:29 PM, Wangda Tan <wheeleast@gmail.com> wrote:
> >>
> >>> +1 (non-binding),
> >>> Reviewed several patches related to scheduler-side changes. As Jian
> >>> mentioned, this will not affect existing behavior.
> >>> Looking forward to this feature being used by more people. Thanks to
> >>> Carlo and Subru!
> >>>
> >>> Thanks,
> >>> Wangda
> >>>
> >>> On Wed, Oct 1, 2014 at 1:21 PM, Jian He <jhe@hortonworks.com> wrote:
> >>>
> >>> > +1,
> >>> >
> >>> > Carlo and Subru, great job! Thanks for your contribution!
> >>> > I reviewed a couple of CapacityScheduler-related patches, and they
> >>> > are in good shape. At a minimum, they do not affect existing
> >>> > behavior, so this should be safe to merge.
> >>> >
> >>> > Jian
> >>> >
> >>> >
> >>> > On Wed, Oct 1, 2014 at 2:46 AM, Thomas Jungblut <tjungblut@apache.org>
> >>> > wrote:
> >>> >
> >>> > > +1 (non-binding)
> >>> > > Thanks for adding this, really useful feature.
> >>> > >
> >>> > > On 30 September 2014 19:40, Chris Douglas <cdouglas@apache.org>
> >>> > > wrote:
> >>> > >
> >>> > > > +1
> >>> > > >
> >>> > > > Excellent work, Carlo and Subru. -C
> >>> > > >
> >>> > > > On Fri, Sep 26, 2014 at 11:50 AM, Carlo Curino
> >>> > > > <ccurino@microsoft.com> wrote:
> >>> > > > > (Apologies if it is delivered twice.)
> >>> > > > >
> >>> > > > > YARN Devs,
> >>> > > > >
> >>> > > > > We propose to merge YARN-1051 development branch into trunk.
> >>> > > > >
> >>> > > > > Key Idea:
> >>> > > > > This work adds support for Reservations to the YARN RM. The key
> >>> > > > > idea is to allow users to request dedicated access to resources
> >>> > > > > (a reservation) ahead of time. For example, I can ask for "10
> >>> > > > > containers for 1 hour sometime between 4pm and 9pm today". The
> >>> > > > > RM keeps track of the accepted reservations by means of a Plan
> >>> > > > > (think of it as an agenda of how the cluster resources will be
> >>> > > > > used), and performs admission control to guarantee that if a
> >>> > > > > reservation is accepted, enough resources are set aside to
> >>> > > > > satisfy it. We enforce the reservation promises by dynamically
> >>> > > > > creating/resizing/removing queues at the right time. This allows
> >>> > > > > us to leverage the existing schedulers for the actual container
> >>> > > > > assignment and tracking. The key benefit is to expose to the
> >>> > > > > scheduler flexibility of allocation, while guaranteeing users
> >>> > > > > predictable resource allocation.
> >>> > > > >
> >>> > > > > Status
> >>> > > > >
> >>> > > > > * The work has been "broken down" into 14 subtasks (+3 patches
> >>> > > > >   already committed to trunk for move/kill of apps). All the
> >>> > > > >   issues have been resolved.
> >>> > > > >
> >>> > > > > * Jenkins +1'd the patch (with the exception of one test failure
> >>> > > > >   which we did not introduce, tracked here:
> >>> > > > >   https://issues.apache.org/jira/browse/MAPREDUCE-6094)
> >>> > > > >
> >>> > > > > * Simple integration with MapReduce:
> >>> > > > >   https://issues.apache.org/jira/browse/MAPREDUCE-6103
> >>> > > > >
> >>> > > > > * The broken-down patches have been reviewed and +1ed by Vinod
> >>> > > > >   Kumar Vavilapalli, Jian He, Wangda Tan, Karthik Kambatla, and
> >>> > > > >   Chris Douglas. Thanks to all of you for the thorough reviews!
> >>> > > > >
> >>> > > > > * The current version has been rather thoroughly tested by
> >>> > > > >   running it on our 250-machine research cluster for months
> >>> > > > >   (the first prototype was operational about a year ago) by:
> >>> > > > >
> >>> > > > >   o Running hundreds of thousands of jobs generated by a
> >>> > > > >     modified version of gridmix that exercises the reservation
> >>> > > > >     mechanism side by side with normal queues.
> >>> > > > >
> >>> > > > >   o Supporting our integration with the resource estimation
> >>> > > > >     framework Perforator
> >>> > > > >     (http://research.microsoft.com/pubs/178971/perforator.pdf).
> >>> > > > >     Kaushik and Dharmesh have been pounding the reservation
> >>> > > > >     system for their research for 3-4 months now, and helped us
> >>> > > > >     spot a few bugs and iron them out.
> >>> > > > >
> >>> > > > >   o Code has been inspected/extended by 4-5 other researchers
> >>> > > > >     who are exploring integration with other systems and
> >>> > > > >     extensions of our algorithms for "reservation placement".
> >>> > > > >
> >>> > > > > * We have a few ideas for follow-up extensions/improvements,
> >>> > > > >   tracked by the umbrella JIRA
> >>> > > > >   https://issues.apache.org/jira/browse/YARN-2572
> >>> > > > >
> >>> > > > > Documents and Deliverables
> >>> > > > >
> >>> > > > > * This work was accepted for publication to SoCC 2014
> >>> > > > >   (pre-camera ready version of the paper here):
> >>> > > > >   https://issues.apache.org/jira/secure/attachment/12671498/socc14-paper15.pdf
> >>> > > > >
> >>> > > > > * Shorter design doc:
> >>> > > > >   https://issues.apache.org/jira/secure/attachment/12628330/YARN-1051-design.pdf
> >>> > > > >
> >>> > > > > * Overall patch:
> >>> > > > >   https://issues.apache.org/jira/secure/attachment/12671361/YARN-1051.1.patch
> >>> > > > >
> >>> > > > > * Per Karthik's request we are preparing a small how-to document
> >>> > > > >   and example code/configuration tracked by
> >>> > > > >   https://issues.apache.org/jira/browse/YARN-2609
> >>> > > > >
> >>> > > > > Credits
> >>> > > > > Subru and I did a lot of the coding (hence the flow of patches
> >>> > > > > from us), but this is a group effort that would not have been
> >>> > > > > possible without the ideas and hard work of many other folks in
> >>> > > > > our research group (Microsoft-CISL). Major kudos to: Chris
> >>> > > > > Douglas, Sriram Rao, Raghu Ramakrishnan, and our intern Djellel
> >>> > > > > Difallah. Also big thanks to the many folks in the community
> >>> > > > > (Arun, Vinod, Alejandro, Bikas, Karthik, Sandy, Hitesh, Jakob,
> >>> > > > > Mohammad, Mayank, Jason, Bobby, and many more) who helped us
> >>> > > > > shape our ideas and code with very insightful feedback and
> >>> > > > > comments.
> >>> > > > >
> >>> > > > > We expect the vote to run for the usual 7 days and expire at
> >>> > > > > 12pm PDT on Oct 3. Please feel free to reach out to us if you
> >>> > > > > have any questions/doubts.
> >>> > > > >
> >>> > > > > Cheers,
> >>> > > > > Carlo & Subru
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
>
>
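
To make the "Key Idea" paragraph of the original proposal concrete, here is a minimal sketch of admission control against a Plan. It is an illustration only: the `Plan` class, `try_reserve`, the hourly time slots, the 15-container cluster size, and the greedy first-fit placement are all assumptions made for this example, not the YARN-1051 code or its API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Plan:
    """Toy agenda: containers already promised for each hour of the day."""
    capacity: int                                             # total containers (assumed)
    committed: Dict[int, int] = field(default_factory=dict)   # hour -> containers promised

    def free(self, hour: int) -> int:
        """Containers not yet promised to any reservation in this hour."""
        return self.capacity - self.committed.get(hour, 0)

    def try_reserve(self, containers: int, duration: int,
                    earliest: int, deadline: int) -> Optional[int]:
        """Admission control: return the chosen start hour, or None if rejected.

        Greedy first-fit (an assumption, not the real placement algorithm):
        take the earliest start in [earliest, deadline - duration] where every
        hour of the reservation still has enough free containers.
        """
        for start in range(earliest, deadline - duration + 1):
            hours = range(start, start + duration)
            if all(self.free(h) >= containers for h in hours):
                # Accepting the reservation sets the resources aside in the Plan.
                for h in hours:
                    self.committed[h] = self.committed.get(h, 0) + containers
                return start
        return None


plan = Plan(capacity=15)
# "10 containers for 1 hour sometime between 4pm and 9pm today"
first = plan.try_reserve(containers=10, duration=1, earliest=16, deadline=21)
# A second identical request no longer fits at 4pm, so it lands at 5pm.
second = plan.try_reserve(containers=10, duration=1, earliest=16, deadline=21)
# A request larger than the whole cluster is rejected.
third = plan.try_reserve(containers=20, duration=1, earliest=16, deadline=21)
print(first, second, third)
```

The time window is what gives the sketch its flexibility: the second request is accepted by shifting it within its window rather than being rejected, while the Plan still guarantees that accepted reservations never exceed capacity in any hour.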
