apex-dev mailing list archives

From Pramod Immaneni <pra...@datatorrent.com>
Subject Re: "ExcludeNodes" for an Apex application
Date Thu, 01 Dec 2016 21:01:07 GMT
I see that host locality is available as a DAG attribute for individual
operators. If affinity doesn't support this today, we could probably add
it. You could also make setting a blacklist directly a convenience function
on top of affinity.
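
For illustration, the exclude-list idea discussed in this thread (the app
master keeps a list of hosts and declines containers offered on them) could
look roughly like the sketch below. This is a hypothetical, self-contained
illustration only: the class ExcludeNodesSketch, the Offer type, the accept
method and the host names are all invented here and are not Apex or YARN
APIs. A real implementation would have to plug into Stram's container
allocation handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: none of these names are Apex or YARN APIs.
class ExcludeNodesSketch {

    // A container offer reduced to the two fields this sketch needs.
    static final class Offer {
        final String containerId;
        final String host;

        Offer(String containerId, String host) {
            this.containerId = containerId;
            this.host = host;
        }
    }

    // Returns the offers whose host is not excluded. Offers on excluded
    // hosts are appended to 'rejected' so the caller can release them and
    // request replacements. Host names are compared case-insensitively.
    static List<Offer> accept(List<Offer> offers, Set<String> excludedHosts,
                              List<Offer> rejected) {
        Set<String> excluded = new HashSet<>();
        for (String h : excludedHosts) {
            excluded.add(h.toLowerCase(Locale.ROOT));
        }
        List<Offer> accepted = new ArrayList<>();
        for (Offer o : offers) {
            if (excluded.contains(o.host.toLowerCase(Locale.ROOT))) {
                rejected.add(o);
            } else {
                accepted.add(o);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<Offer> offers = Arrays.asList(
            new Offer("c1", "node19"), new Offer("c2", "Node20"));
        List<Offer> rejected = new ArrayList<>();
        List<Offer> accepted =
            accept(offers, new HashSet<>(Arrays.asList("node20")), rejected);
        System.out.println("accepted=" + accepted.size()
            + " rejected=" + rejected.size());
    }
}
```

The rejected list is returned separately so that the caller can hand those
containers back and ask for new ones, which is the "reject and request
again" behaviour proposed further down in the thread.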

On Thu, Dec 1, 2016 at 11:58 AM, Sandesh Hegde <sandesh@datatorrent.com>
wrote:

> Pramod,
>
> How do I specify "don't deploy any operators on Node20" using
> anti-affinity?
>
> I don't see any examples here,
> http://apex.apache.org/docs/apex/application_development/#affinity-rules
>
>
> On Thu, Dec 1, 2016 at 11:31 AM Pramod Immaneni <pramod@datatorrent.com>
> wrote:
>
> > Shouldn't this already be covered by anti-affinity? Today users can specify
> > multiple affinity rules; for each rule they can specify positive or
> > negative affinity, locality and operator selection. If an affinity rule
> > specifying negative affinity, node locality and all operators does not
> > work, then let's fix that scenario instead of creating a new option.
> >
> > On Thu, Dec 1, 2016 at 11:17 AM, Sandesh Hegde <sandesh@datatorrent.com>
> > wrote:
> >
> > > I have created a JIRA for adding a list of blacklisted nodes:
> > > https://issues.apache.org/jira/browse/APEXCORE-584
> > >
> > > > On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare <sanjay@datatorrent.com>
> > > > wrote:
> > >
> > > > > Yes, Ram explained to me that in practice this would be a useful feature
> > > > > for Apex devops who typically have no control over the Hadoop/Yarn cluster.
> > > >
> > > > On 11/30/16, 9:22 PM, "Mohit Jotwani" <mohit@datatorrent.com> wrote:
> > > >
> > > > >     This is a practical scenario: developers may need to exclude certain
> > > > >     nodes because those nodes are required for some mission-critical
> > > > >     applications. It would be good to have this feature.
> > > >
> > > > >     I understand that Stram should not get into resourcing and should still
> > > > >     rely on Yarn; however, as the App Master it should have the right to
> > > > >     reject the nodes offered by Yarn and request other resources.
> > > >
> > > >     Regards,
> > > >     Mohit
> > > >
> > > > >     On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde <sandesh@datatorrent.com>
> > > > >     wrote:
> > > >
> > > > >     > Apex has automatic blacklisting of troublesome nodes; please take a
> > > > >     > look at the following attributes:
> > > >     >
> > > >     > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > > > >     > https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > > >     >
> > > >     > BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
> > > >     >
> > > >     > Thanks
> > > >     >
> > > >     >
> > > >     >
> > > > >     > On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath <ram@datatorrent.com>
> > > > >     > wrote:
> > > >     >
> > > > >     > Not sure if this is what Milind had in mind, but we often run into
> > > > >     > situations where the dev group working with Apex has no control over
> > > > >     > cluster configuration -- to make any changes to the cluster they need
> > > > >     > to go through an elaborate process that can take many days.
> > > >     >
> > > > >     > Meanwhile, if they notice that a particular node is consistently causing
> > > > >     > problems for their app, having a simple way to exclude it would be very
> > > > >     > helpful since it gives them a way to bypass communication and process
> > > > >     > issues within their own organization.
> > > >     >
> > > >     > Ram
> > > >     >
> > > > >     > On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <sanjay@datatorrent.com>
> > > > >     > wrote:
> > > >     >
> > > > >     > > To me both use cases appear to be generic resource management use cases.
> > > > >     > > For example, a randomly rebooting node is not good for any purpose, esp.
> > > > >     > > long running apps, so it is a bit of a stretch to imagine that these
> > > > >     > > nodes will be acceptable for some batch jobs in Yarn. So such a node
> > > > >     > > should be marked “Bad” or Unavailable in Yarn itself.
> > > >     > >
> > > > >     > > The second use case is also a typical anti-affinity use case, which
> > > > >     > > ideally should be implemented in Yarn; Milind’s example can also apply
> > > > >     > > to non-Apex batch jobs. In any case it looks like Yarn still doesn’t
> > > > >     > > have it (https://issues.apache.org/jira/browse/YARN-1042), so if Apex
> > > > >     > > needs it we will need to do it ourselves.
> > > >     > >
> > > > >     > > On 11/30/16, 10:39 AM, "Munagala Ramanath" <ram@datatorrent.com> wrote:
> > > >     > >
> > > > >     > >     But then, what's the solution to the 2 problem scenarios that
> > > > >     > >     Milind describes?
> > > >     > >
> > > >     > >     Ram
> > > >     > >
> > > > >     > >     On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <sanjay@datatorrent.com>
> > > > >     > >     wrote:
> > > >     > >
> > > > >     > >     > I think “exclude nodes” and such is really the job of the resource
> > > > >     > >     > manager, i.e. Yarn. So I am not sure taking over some of these tasks
> > > > >     > >     > in Apex would be very useful.
> > > >     > >     >
> > > > >     > >     > I agree with Amol that apps should be node neutral. Resource
> > > > >     > >     > management in Yarn together with fault tolerance in Apex should
> > > > >     > >     > minimize the need for this feature, although I am sure one can
> > > > >     > >     > find use cases.
> > > >     > >     >
> > > >     > >     >
> > > > >     > >     > On 11/29/16, 10:41 PM, "Amol Kekre" <amol@datatorrent.com> wrote:
> > > >     > >     >
> > > > >     > >     >     We do have this feature in Yarn, but that applies to all
> > > > >     > >     >     applications. I am not sure if Yarn has anti-affinity. This
> > > > >     > >     >     feature may be used, but in general there is danger in an
> > > > >     > >     >     application taking over resource allocation. Another quirk is
> > > > >     > >     >     that big data apps should ideally be node-neutral. This is a
> > > > >     > >     >     good idea, if we are able to carve out something where the
> > > > >     > >     >     need is app specific.
> > > >     > >     >
> > > >     > >     >     Thks
> > > >     > >     >     Amol
> > > >     > >     >
> > > >     > >     >
> > > > >     > >     >     On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <milindb@gmail.com>
> > > > >     > >     >     wrote:
> > > >     > >     >
> > > > >     > >     >     > We have seen the 2 cases mentioned below, where it would have
> > > > >     > >     >     > been nice if Apex allowed us to exclude a node from the cluster
> > > > >     > >     >     > for an application.
> > > >     > >     >     >
> > > > >     > >     >     > 1. A node in the cluster had gone bad (was randomly rebooting) and
> > > > >     > >     >     > so an Apex app should not use it - other apps can use it as they
> > > > >     > >     >     > were batch jobs.
> > > > >     > >     >     > 2. A node is being used for a mission critical app (could be an
> > > > >     > >     >     > Apex app itself), but another Apex app which is mission critical
> > > > >     > >     >     > should not be using resources on that node.
> > > >     > >     >     >
> > > > >     > >     >     > Can we have a way in which Stram and YARN can coordinate with
> > > > >     > >     >     > each other to not use a set of nodes for the application? It can
> > > > >     > >     >     > be done in 2 ways:
> > > >     > >     >     >
> > > > >     > >     >     > 1. Have a list of "exclude" nodes with Stram - when YARN allocates
> > > > >     > >     >     > resources on any of these, Stram rejects them and gets resources
> > > > >     > >     >     > allocated again from YARN.
> > > > >     > >     >     > 2. Have a list of nodes that can be used for an app - this can be
> > > > >     > >     >     > a part of config. However, I don't think this would be the right
> > > > >     > >     >     > way to do so, as we will need support from YARN as well. Further,
> > > > >     > >     >     > this might be difficult to change at runtime if need be.
> > > >     > >     >     >
> > > >     > >     >     > Any thoughts?
> > > >     > >     >     >
> > > >     > >     >     >
> > > >     > >     >     > --
> > > >     > >     >     > ~Milind bee at gee mail dot com
> > > >     > >     >     >
> > > >     > >     >
> > > >     > >     >
> > > >     > >     >
> > > >     > >     >
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>
