apex-dev mailing list archives

From Thomas Weise <tho...@datatorrent.com>
Subject Re: Support for Anti-Affinity in Apex
Date Tue, 19 Jan 2016 22:14:14 GMT
Gaurav,

Does a request for a specific node work on the FairScheduler now?



On Tue, Jan 19, 2016 at 2:06 PM, Gaurav Gupta <gaurav@datatorrent.com>
wrote:

> I agree with Pramod that we should go with 2b. We are already doing
> node locality, so you can use that feature.
> Regarding 3, do we need to support relaxed anti-affinity? Anti-affinity
> will mostly be used where the user wants such segregation of operators on
> different nodes for his/her app.
>
> Thanks
> - Gaurav
>
> > On Jan 19, 2016, at 1:57 PM, Pramod Immaneni <pramod@datatorrent.com>
> > wrote:
> >
> > Sorry I meant distro agnostic (without the not) in the first sentence.
> >
> > On Tue, Jan 19, 2016 at 1:57 PM, Pramod Immaneni <pramod@datatorrent.com>
> > wrote:
> >
> >> Isha, this sounds great. 2 a. sounds like a good approach that is not
> >> distro agnostic. How about also supporting a minor variation of it as an
> >> option, where it greedily gets the total number of containers, discards
> >> the ones it can't use, and repeats the process for the remaining ones
> >> until everything has been allocated? Also, does it make sense to support
> >> anti-cluster affinity?
> >>
> >> Thanks
> >>
> >> On Tue, Jan 19, 2016 at 1:21 PM, Isha Arkatkar <isha@datatorrent.com>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>>   We want to add support for anti-affinity in Apex to allow applications
> >>> to launch specific physical operators on different nodes (APEXCORE-10
> >>> <https://issues.apache.org/jira/browse/APEXCORE-10>). We would like to
> >>> request your suggestions and ideas on this.
> >>>
> >>>  The reasons for using anti-affinity between operators could be: to
> >>> ensure reliability, for performance (for example, an application may not
> >>> want two I/O-intensive operators to land on the same node), or for some
> >>> application-specific constraint (for example, two partitions cannot run
> >>> on the same node because they use the same port number). This is the
> >>> general rationale for adding anti-affinity support.
> >>>
> >>> Since YARN does not support anti-affinity yet (YARN-1042
> >>> <https://issues.apache.org/jira/browse/YARN-1042>), we need to implement
> >>> the logic in the AM. We would like your views on the following aspects of
> >>> this implementation:
> >>>
> >>> *1. How to specify anti-affinity for physical operators/partitions in
> >>> the application:*
> >>>    One way to do this is to have an attribute for setting anti-affinity
> >>> in the logical operator context. An operator can set this attribute to
> >>> a list of names of operators with which it should not be collocated.
> >>>     Consider a DAG with 3 operators:
> >>>     TestOperator o1 = dag.addOperator("O1", new TestOperator());
> >>>     TestOperator o2 = dag.addOperator("O2", new TestOperator());
> >>>     TestOperator o3 = dag.addOperator("O3", new TestOperator());
> >>>
> >>> To set anti-affinity for operator O1:
> >>>    dag.setAttribute(o1, OperatorContext.ANTI_AFFINITY, new
> >>> ArrayList<String>(Arrays.asList("O2", "O3")));
> >>>     This would mean O1 should not be allocated on nodes containing
> >>> operators O2 or O3. This applies to all allocated partitions of O1, O2,
> >>> and O3.
> >>>
> >>>   Also, if an operator's own name is part of its anti-affinity list, it
> >>> means partitions of that operator should not be allocated on the same
> >>> node. Example:
> >>>    dag.setAttribute(o2, OperatorContext.ANTI_AFFINITY, new
> >>> ArrayList<String>(Arrays.asList("O2")));
> >>>    This indicates anti-affinity between all partitions of O2, i.e. all
> >>> partitions of O2 should be launched on different nodes.
> >>>
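The list semantics described above could be checked with something like the following. This is only a minimal sketch: the class AntiAffinityRules and its methods are hypothetical names, not part of the Apex API, and it assumes the relation is symmetric between the declaring operator and each operator in its list (it does not assume the listed operators exclude each other).

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not an Apex class. */
final class AntiAffinityRules {
  // Operator name -> names of operators it must not share a node with.
  private final Map<String, Set<String>> rules = new HashMap<>();

  /** Records the list set on one operator, e.g. O1 -> [O2, O3]. */
  void declare(String operator, List<String> notWith) {
    rules.computeIfAbsent(operator, k -> new HashSet<>()).addAll(notWith);
  }

  /** True if partitions of a and b must not share a node (assumed symmetric). */
  boolean conflicts(String a, String b) {
    return rules.getOrDefault(a, Collections.<String>emptySet()).contains(b)
        || rules.getOrDefault(b, Collections.<String>emptySet()).contains(a);
  }

  /** True if a partition of candidate may join a node hosting the given operators. */
  boolean canPlace(String candidate, Collection<String> operatorsOnNode) {
    for (String hosted : operatorsOnNode) {
      if (conflicts(candidate, hosted)) {
        return false;
      }
    }
    return true;
  }
}
```

With declare("O1", [O2, O3]) and declare("O2", [O2]), a node already hosting a partition of O2 would reject both O1 and another partition of O2, while O3 could still be placed there.
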
> >>>   Based on the anti-affinity attribute specified for the logical operator,
> >>> we can add this list to each PTContainer during physical plan creation.
> >>> It will then be available to Stram for sending container requests
> >>> accordingly.
> >>>
> >>>   Please suggest if there is a better way to express this intent.
> >>>
> >>> *2. How to implement anti-affinity in the AM*
> >>>   There are 2 ways we can implement this:
> >>>   *a. Blacklisting of nodes:* We can group the physical container
> >>> requests based on anti-affinity requirements and send allocation requests
> >>> for containers in groups. After the first group is done, we blacklist its
> >>> nodes before sending the second group of container requests. This will
> >>> ensure that containers with anti-affinity requirements are allocated on
> >>> different nodes.
> >>>   *b. Node-specific container requests:* Explore and create a map of the
> >>> nodes present in the cluster and send the allocation request for each
> >>> container to a specific node, honoring anti-affinity. There are a couple
> >>> of open YARN JIRAs for node-specific container requests: YARN-1412
> >>> <https://issues.apache.org/jira/browse/YARN-1412>, YARN-2027
> >>> <https://issues.apache.org/jira/browse/YARN-2027>. So we need to check
> >>> whether this is a feasible approach.
> >>>
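For approach 2a, the grouping step could look roughly like this. It is a sketch under assumptions, not the Stram implementation: AntiAffinityGrouping is a hypothetical name, the conflict test is supplied by the caller, and the greedy first-fit strategy is just one simple way to form the groups. Containers in the same group have no mutual anti-affinity, so if the nodes used by each group are blacklisted before the next group is requested, conflicting containers cannot end up on the same node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical helper for illustration only; not an Apex class. */
final class AntiAffinityGrouping {

  /**
   * Greedy first-fit grouping: each container joins the first group in which
   * it conflicts with no existing member, or starts a new group otherwise.
   */
  static <C> List<List<C>> group(List<C> containers, BiPredicate<C, C> conflicts) {
    List<List<C>> groups = new ArrayList<>();
    for (C container : containers) {
      List<C> target = null;
      for (List<C> candidate : groups) {
        boolean compatible = true;
        for (C member : candidate) {
          if (conflicts.test(container, member)) {
            compatible = false;
            break;
          }
        }
        if (compatible) {
          target = candidate;
          break;
        }
      }
      if (target == null) {
        // No existing group can take this container; it needs its own round.
        target = new ArrayList<>();
        groups.add(target);
      }
      target.add(container);
    }
    return groups;
  }
}
```

Each returned group would correspond to one round of allocation requests. Note that blacklisting every node of an earlier round is stricter than the constraints require, so this approach may need more nodes than a node-specific placement would.
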
> >>> *3. Strict vs. relaxed anti-affinity*
> >>>  Depending on the availability of cluster resources, it may not be
> >>> possible to honor all the anti-affinity requirements specified.
> >>> *Strict anti-affinity:* The AM will keep trying indefinitely to allocate
> >>> containers as per the anti-affinity requirements. This behavior is
> >>> similar to how an application stays in the ACCEPTED state until resources
> >>> are available to launch it in the cluster.
> >>> *Relaxed anti-affinity:* The AM will drop the anti-affinity constraint
> >>> after a certain timeout.
> >>>
> >>> We need a way to set this attribute from the application (either in the
> >>> operator context or in DAGContext for an application-wide setting).
> >>>
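The strict and relaxed modes in point 3 could be modeled as a small policy object that the AM consults for each pending request. This is a sketch only: AntiAffinityPolicy, its factory methods, and the millisecond timeout are hypothetical and not existing Apex attributes.

```java
/** Hypothetical policy for illustration only; not an Apex class. */
final class AntiAffinityPolicy {
  private final boolean neverRelax;
  private final long relaxAfterMillis;

  private AntiAffinityPolicy(boolean neverRelax, long relaxAfterMillis) {
    this.neverRelax = neverRelax;
    this.relaxAfterMillis = relaxAfterMillis;
  }

  /** Strict mode: the constraint is never dropped. */
  static AntiAffinityPolicy strict() {
    return new AntiAffinityPolicy(true, 0L);
  }

  /** Relaxed mode: the constraint is dropped once a request has waited this long. */
  static AntiAffinityPolicy relaxedAfter(long timeoutMillis) {
    return new AntiAffinityPolicy(false, timeoutMillis);
  }

  /** True if the constraint still applies to a request pending for waitedMillis. */
  boolean enforce(long waitedMillis) {
    return neverRelax || waitedMillis < relaxAfterMillis;
  }
}
```

For example, AntiAffinityPolicy.relaxedAfter(30000) would keep the constraint for the first 30 seconds of waiting and then let the request go to any node, while AntiAffinityPolicy.strict() would keep the request pending for as long as it takes.
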
> >>> *4. How do we unit test this feature?*
> >>>  We could use Mockito to mock YARN behavior and test only the AM
> >>> implementation, since it may not be easy to simulate some scenarios
> >>> manually in a cluster. Please suggest if there are better ways to test
> >>> this.
> >>>
> >>> Please suggest improvements or any other ideas on all of the above.
> >>>
> >>> Thanks!
> >>> Isha
> >>>
> >>> P.S. Sorry for the long email. Please let me know if I should start
> >>> separate threads for any of the above points.
> >>>
> >>
> >>
>
>
