hadoop-yarn-dev mailing list archives

From Karthik Kambatla <ka...@cloudera.com>
Subject Re: Calling a merge vote for YARN-1051
Date Thu, 02 Oct 2014 01:23:35 GMT
+1. Nicely done, Subru and Carlo.

I have been partially involved with the work, and have reviewed some of the
patches. With some help from Subru and documentation from Carlo (thanks!),
I was able to play with the reservation system. Verified the following:
1. Reservations can be made only for the amount of resources available for
that queue.
2. Jobs submitted against a reservation run in the corresponding
"reservation" queue, and jobs submitted to the same higher-level queue but
not against a reservation run in the corresponding "default" queue.
3. The web-ui shows the reserved resources in a queue even when there are
no apps running.
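
As a rough illustration of item 1 (and of the admission control described
in the merge proposal quoted below), here is a minimal Python sketch. It is
not the YARN implementation: the Plan class, its hourly slots, the
try_reserve method, and the earliest-fit placement are all hypothetical
simplifications chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical agenda of containers committed per hourly slot."""
    capacity: int                                   # containers available to the queue
    committed: dict = field(default_factory=dict)   # slot -> containers already reserved

    def free(self, slot):
        """Containers still unreserved in the given slot."""
        return self.capacity - self.committed.get(slot, 0)

    def try_reserve(self, containers, duration, earliest, deadline):
        """Return the chosen start slot if accepted, or None if rejected."""
        # Consider every start time that lets the reservation finish by the deadline.
        for start in range(earliest, deadline - duration + 1):
            slots = range(start, start + duration)
            if all(self.free(s) >= containers for s in slots):
                # Admission succeeded: set the resources aside in the plan.
                for s in slots:
                    self.committed[s] = self.committed.get(s, 0) + containers
                return start
        return None


plan = Plan(capacity=10)
first = plan.try_reserve(10, 1, 16, 21)    # 10 containers, 1 hour, between 16:00 and 21:00
second = plan.try_reserve(10, 1, 16, 17)   # only the 16:00 slot is allowed, and it is full
too_big = plan.try_reserve(11, 1, 16, 21)  # asks for more than the queue can ever offer
print(first, second, too_big)
```

In this sketch a request is accepted only if every slot it covers still has
enough free capacity, so a reservation can never exceed what the queue has
available.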

There are a few follow-up items towards feature completeness, and I am okay
with working on them post merge to trunk as planned.
1. Support for FairScheduler
2. Recover reservations on RM restart/failover
3. CLI and/or REST APIs to make reservations - this is very useful for
testing
4. Documentation in the usual apt.vm format.

Cheers!
Karthik




On Wed, Oct 1, 2014 at 1:29 PM, Wangda Tan <wheeleast@gmail.com> wrote:

> +1 (non-binding),
> Reviewed several patches related to scheduler side changes. As Jian
> mentioned, this will not affect existing behavior.
> Looking forward to this feature being used by more people. Thanks to Carlo
> and Subru!
>
> Thanks,
> Wangda
>
> On Wed, Oct 1, 2014 at 1:21 PM, Jian He <jhe@hortonworks.com> wrote:
>
> > +1,
> >
> > Carlo and Subru, great job! Thanks for your contribution!
> > I reviewed a couple of CapacityScheduler-related patches; they are in
> > good shape. At a minimum, they do not affect existing behavior, so they
> > should be safe to merge.
> >
> > Jian
> >
> >
> > On Wed, Oct 1, 2014 at 2:46 AM, Thomas Jungblut <tjungblut@apache.org>
> > wrote:
> >
> > > +1 (non-binding)
> > > Thanks for adding this, really useful feature.
> > >
> > > On 30 September 2014 19:40, Chris Douglas <cdouglas@apache.org> wrote:
> > >
> > > > +1
> > > >
> > > > Excellent work, Carlo and Subru. -C
> > > >
> > > > > On Fri, Sep 26, 2014 at 11:50 AM, Carlo Curino <ccurino@microsoft.com> wrote:
> > > > > (Apologies if it is delivered twice.)
> > > > >
> > > > > YARN Devs,
> > > > >
> > > > > We propose to merge YARN-1051 development branch into trunk.
> > > > >
> > > > > Key Idea:
> > > > > > This work adds support for Reservations to the YARN RM. The key
> > > > > > idea is to allow users to request dedicated access to resources
> > > > > > (a reservation) ahead of time. For example, I can ask for
> > > > > > "10 containers for 1 hour sometime between 4pm and 9pm today".
> > > > > > The RM keeps track of accepted reservations by means of a Plan
> > > > > > (think of it as an agenda of how the cluster resources will be
> > > > > > used), and performs admission control to guarantee that, if a
> > > > > > reservation is accepted, enough resources are set aside to
> > > > > > satisfy it. We enforce the reservation promises by dynamically
> > > > > > creating/resizing/removing queues at the right time. This allows
> > > > > > us to leverage the existing schedulers for the actual container
> > > > > > assignment and tracking. The key benefit is to expose
> > > > > > flexibility of allocation to the scheduler, while guaranteeing
> > > > > > users predictable resource allocation.
> > > > >
> > > > > Status
> > > > >
> > > > > > * The work has been "broken down" into 14 subtasks (+3 patches
> > > > > >   already committed to trunk for move/kill of apps). All the
> > > > > >   issues have been resolved.
> > > > > >
> > > > > > * Jenkins +1 the patch (with the exception of one test failure
> > > > > >   which we did not introduce, which is tracked here:
> > > > > >   https://issues.apache.org/jira/browse/MAPREDUCE-6094)
> > > > > >
> > > > > > * Simple integration with MapReduce:
> > > > > >   https://issues.apache.org/jira/browse/MAPREDUCE-6103
> > > > > >
> > > > > > * The broken-down patches have been reviewed and +1ed by Vinod
> > > > > >   Kumar Vavilapalli, Jian He, Wangda Tan, Karthik Kambatla, and
> > > > > >   Chris Douglas. Thanks to all of you for the thorough reviews!
> > > > >
> > > > > > * The current version has been rather thoroughly tested by
> > > > > >   running it on our 250-machine research cluster for months (the
> > > > > >   first prototype was operational about a year ago):
> > > > > >
> > > > > >   o Running hundreds of thousands of jobs generated by a modified
> > > > > >     version of gridmix that exercises the reservation mechanism
> > > > > >     side by side with normal queues.
> > > > > >
> > > > > >   o Supporting our integration with the resource estimation
> > > > > >     framework Perforator
> > > > > >     (http://research.microsoft.com/pubs/178971/perforator.pdf).
> > > > > >     Kaushik and Dharmesh have been pounding the reservation system
> > > > > >     for their research for 3-4 months now, and helped us spot a
> > > > > >     few bugs and iron them out.
> > > > > >
> > > > > >   o Code has been inspected/extended by 4-5 other researchers who
> > > > > >     are exploring integration with other systems and extensions of
> > > > > >     our algorithms for "reservation placement".
> > > > >
> > > > > > * We have a few ideas for follow-up extensions/improvements, which
> > > > > >   are tracked by the umbrella JIRA
> > > > > >   https://issues.apache.org/jira/browse/YARN-2572
> > > > >
> > > > > Documents and Deliverables
> > > > >
> > > > > > * This work was accepted for publication to SoCC 2014
> > > > > >   (pre-camera ready version of the paper here):
> > > > > >   https://issues.apache.org/jira/secure/attachment/12671498/socc14-paper15.pdf
> > > > > >
> > > > > > * Shorter design doc:
> > > > > >   https://issues.apache.org/jira/secure/attachment/12628330/YARN-1051-design.pdf
> > > > > >
> > > > > > * Overall patch:
> > > > > >   https://issues.apache.org/jira/secure/attachment/12671361/YARN-1051.1.patch
> > > > > >
> > > > > > * Per Karthik's request we are preparing a small how-to document
> > > > > >   and example code/configuration tracked by
> > > > > >   https://issues.apache.org/jira/browse/YARN-2609
> > > > >
> > > > >
> > > > > Credits
> > > > > > Subru and I did lots of the coding (hence the flow of patches
> > > > > > from us), but this is a group effort that would not have been
> > > > > > possible without the ideas and hard work of many other folks in
> > > > > > our research group (Microsoft-CISL). Major kudos to: Chris
> > > > > > Douglas, Sriram Rao, Raghu Ramakrishnan, and our intern Djellel
> > > > > > Difallah. Also big thanks to the many folks in the community
> > > > > > (Arun, Vinod, Alejandro, Bikas, Karthik, Sandy, Hitesh, Jakob,
> > > > > > Mohammad, Mayank, Jason, Bobby, and many more) that helped us
> > > > > > shape our ideas and code with very insightful feedback and
> > > > > > comments.
> > > > >
> > > > > > We expect the vote to run for the usual 7 days and to expire at
> > > > > > 12pm PDT on Oct 3. Please feel free to reach out to us if you
> > > > > > have any questions/doubts.
> > > > >
> > > > > Cheers,
> > > > > Carlo & Subru
> > > > >
> > > >
> > >
> >
>
