apex-dev mailing list archives

From Sandesh Hegde <sand...@datatorrent.com>
Subject Re: "ExcludeNodes" for an Apex application
Date Wed, 30 Nov 2016 21:04:12 GMT
Apex can automatically blacklist troublesome nodes. Please take a look at
the following attributes:

MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST

BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
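
A minimal sketch of the idea the two attribute names suggest, not Apex's actual implementation: a node is blacklisted once it reaches a threshold of consecutive container failures, and it drops off the blacklist after a removal time has passed. The class and method names below are hypothetical and exist only for illustration; only the two attribute names come from Apex.

```python
class NodeBlacklist:
    """Toy model of consecutive-failure blacklisting (illustrative, not Apex code)."""

    def __init__(self, max_consecutive_failures, removal_time_millis):
        # Stand-ins for MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
        # and BLACKLISTED_NODE_REMOVAL_TIME_MILLIS.
        self.max_consecutive_failures = max_consecutive_failures
        self.removal_time_millis = removal_time_millis
        self.failures = {}        # node -> consecutive failure count
        self.blacklisted_at = {}  # node -> time (ms) the node was blacklisted

    def record_failure(self, node, now_millis):
        count = self.failures.get(node, 0) + 1
        self.failures[node] = count
        if count >= self.max_consecutive_failures:
            self.blacklisted_at[node] = now_millis

    def record_success(self, node):
        # A successful container run breaks the streak of consecutive failures.
        self.failures[node] = 0

    def is_blacklisted(self, node, now_millis):
        since = self.blacklisted_at.get(node)
        if since is None:
            return False
        if now_millis - since >= self.removal_time_millis:
            # Removal time has elapsed: give the node another chance.
            del self.blacklisted_at[node]
            self.failures[node] = 0
            return False
        return True
```

With a threshold of 2 and a removal time of 1000 ms, a node that fails twice in a row is skipped until 1000 ms after the second failure, then becomes eligible again.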

Thanks



On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath <ram@datatorrent.com>
wrote:

Not sure if this is what Milind had in mind but we often run into
situations where the dev group working with Apex has no control over
cluster configuration -- to make any changes to the cluster they need to
go through an elaborate process that can take many days.

Meanwhile, if they notice that a particular node is consistently causing
problems for their app, having a simple way to exclude it would be very
helpful since it gives them a way to bypass communication and process
issues within their own organization.

Ram

On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <sanjay@datatorrent.com>
wrote:

> To me both use cases appear to be generic resource management use cases.
> For example, a randomly rebooting node is not good for any purpose esp.
> long running apps so it is a bit of a stretch to imagine that these nodes
> will be acceptable for some batch jobs in Yarn. So such a node should be
> marked “Bad” or Unavailable in Yarn itself.
>
> The second use case is also a typical anti-affinity use case, which ideally
> should be implemented in Yarn. Milind’s example can also apply to non-Apex
> batch jobs. In any case it looks like Yarn still doesn’t have it
> (https://issues.apache.org/jira/browse/YARN-1042), so if Apex needs it we
> will need to do it ourselves.
>
> On 11/30/16, 10:39 AM, "Munagala Ramanath" <ram@datatorrent.com> wrote:
>
>     But then, what's the solution to the 2 problem scenarios that Milind
>     describes ?
>
>     Ram
>
>     On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <sanjay@datatorrent.com> wrote:
>
>     > I think “exclude nodes” and such is really the job of the resource
>     > manager, i.e. Yarn. So I am not sure taking over some of these tasks
>     > in Apex would be very useful.
>     >
>     > I agree with Amol that apps should be node neutral. Resource
>     > management in Yarn together with fault tolerance in Apex should
>     > minimize the need for this feature, although I am sure one can find
>     > use cases.
>     >
>     >
>     > On 11/29/16, 10:41 PM, "Amol Kekre" <amol@datatorrent.com> wrote:
>     >
>     >     We do have this feature in Yarn, but that applies to all
>     >     applications. I am not sure if Yarn has anti-affinity. This
>     >     feature may be used, but in general there is danger in an
>     >     application taking over resource allocation. Another quirk is
>     >     that big data apps should ideally be node-neutral. This is a good
>     >     idea, if we are able to carve out something where the need is
>     >     app-specific.
>     >
>     >     Thks
>     >     Amol
>     >
>     >
>     >     On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <milindb@gmail.com> wrote:
>     >
>     >     > We have seen the 2 cases mentioned below, where it would have been
>     >     > nice if Apex allowed us to exclude a node from the cluster for an
>     >     > application.
>     >     >
>     >     > 1. A node in the cluster had gone bad (was randomly rebooting) and
>     >     > so an Apex app should not use it - other apps can use it as they
>     >     > were batch jobs.
>     >     > 2. A node is being used for a mission critical app (could be an
>     >     > Apex app itself), but another Apex app which is mission critical
>     >     > should not be using resources on that node.
>     >     >
>     >     > Can we have a way in which Stram and YARN can coordinate with each
>     >     > other to not use a set of nodes for the application? It can be
>     >     > done in 2 ways:
>     >     >
>     >     > 1. Have a list of "exclude" nodes with Stram: when YARN allocates
>     >     > resources on any of these, Stram rejects them and gets resources
>     >     > allocated again from YARN.
>     >     > 2. Have a list of nodes that can be used for an app. This can be
>     >     > a part of config. However, I don't think this would be the right
>     >     > way to do it, as we will need support from YARN as well. Further,
>     >     > this might be difficult to change at runtime if need be.
>     >     >
>     >     > Any thoughts?
>     >     >
>     >     >
>     >     > --
>     >     > ~Milind bee at gee mail dot com
>     >     >
>     >
>     >
>     >
>     >
>
>
>
>
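
The first option in Milind's original message (Stram keeps an "exclude" list, rejects any container placed on a listed node, and asks again) can be sketched roughly as follows. This is a toy model under stated assumptions, not Stram or YARN code: `allocate`, `offer_container`, and `max_attempts` are hypothetical names, and `offer_container` merely stands in for whatever the resource manager hands back.

```python
def allocate(requested, offer_container, excluded_nodes, max_attempts=10):
    """Accept containers only on nodes outside excluded_nodes.

    offer_container is a hypothetical stand-in for the resource manager:
    each call returns a (container_id, node) pair.
    Returns a pair of lists: (accepted, rejected).
    """
    accepted, rejected = [], []
    attempts = 0
    # Bound the number of offers so a cluster with only excluded nodes
    # cannot keep this loop running forever.
    while len(accepted) < requested and attempts < max_attempts:
        attempts += 1
        container = offer_container()
        _container_id, node = container
        if node in excluded_nodes:
            rejected.append(container)  # hand it back and ask again
        else:
            accepted.append(container)
    return accepted, rejected
```

The attempt bound reflects one practical concern with this option: if most of the cluster is excluded, reject-and-retry may cycle many times before a usable container arrives.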
