apex-dev mailing list archives

From Thomas Weise <tho...@datatorrent.com>
Subject Re: Support for Anti-Affinity in Apex
Date Thu, 21 Jan 2016 04:59:00 GMT
https://issues.apache.org/jira/browse/SLIDER-82


On Wed, Jan 20, 2016 at 8:56 PM, Thomas Weise <thomas@datatorrent.com>
wrote:

> The point was that containers are taken away from other apps that may have
> to discard work, etc. It's not good style to claim resources and then not
> end up using them :-)
>
> For this feature it is necessary to look at the scheduler
> capabilities/semantics and limitations. For example, don't bet exclusively
> on node requests if the goal is for it to work with FairScheduler.
>
> Also look at Slider, which just recently added support for anti-affinity
> (using node requests). When you run it on the CDH cluster, it probably
> won't work...
>
>
> On Wed, Jan 20, 2016 at 3:19 PM, Pramod Immaneni <pramod@datatorrent.com>
> wrote:
>
>> Once released, won't the containers be available again in the pool? This
>> would only be optional, not mandatory.
>>
>> Thanks
>>
>> On Tue, Jan 19, 2016 at 2:02 PM, Thomas Weise <thomas@datatorrent.com>
>> wrote:
>>
>> > > How about also supporting a minor variation of it as an option, where
>> > > it greedily gets the total number of containers, discards the ones it
>> > > can't use, and repeats the process for the remainder until everything
>> > > has been allocated?
>> >
>> >
>> > This is problematic: with resource preemption, these containers will
>> > potentially be taken away from other applications and then thrown away.
>> >
>> >
>> >
>> >
>> > > Also, does it make sense to support anti-cluster affinity?
>> > >
>> > > Thanks
>> > >
>> > > On Tue, Jan 19, 2016 at 1:21 PM, Isha Arkatkar <isha@datatorrent.com>
>> > > wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > We want to add support for anti-affinity in Apex to allow applications
>> > > > to launch specific physical operators on different nodes (APEXCORE-10
>> > > > <https://issues.apache.org/jira/browse/APEXCORE-10>). We would like to
>> > > > hear your suggestions and ideas on this!
>> > > >
>> > > > Reasons for using anti-affinity between operators include:
>> > > > reliability; performance (for example, an application may not want two
>> > > > I/O-intensive operators to land on the same node); and
>> > > > application-specific constraints (for example, two partitions cannot
>> > > > run on the same node because they use the same port number). This is
>> > > > the general rationale for adding anti-affinity support.
>> > > >
>> > > > Since YARN does not support anti-affinity yet (YARN-1042
>> > > > <https://issues.apache.org/jira/browse/YARN-1042>), we need to
>> > > > implement the logic in the AM. We would like your views on the
>> > > > following aspects of this implementation:
>> > > >
>> > > > *1. How to specify anti-affinity for physical operators/partitions in
>> > > > the application*
>> > > > One way to do this is to add an anti-affinity attribute to the logical
>> > > > operator context. An operator can then set this attribute to the list
>> > > > of operator names it should not be colocated with.
>> > > > Consider a DAG with three operators:
>> > > >     TestOperator o1 = dag.addOperator("O1", new TestOperator());
>> > > >     TestOperator o2 = dag.addOperator("O2", new TestOperator());
>> > > >     TestOperator o3 = dag.addOperator("O3", new TestOperator());
>> > > >
>> > > > To set anti-affinity for operator O1:
>> > > >     dag.setAttribute(o1, OperatorContext.ANTI_AFFINITY,
>> > > >         new ArrayList<String>(Arrays.asList("O2", "O3")));
>> > > > This means O1 should not be allocated on nodes containing operator O2
>> > > > or O3. This applies to all allocated partitions of O1, O2 and O3.
>> > > >
>> > > > Also, if an operator's own name is part of its anti-affinity list, its
>> > > > partitions should not be allocated on the same node. For example:
>> > > >     dag.setAttribute(o2, OperatorContext.ANTI_AFFINITY,
>> > > >         new ArrayList<String>(Arrays.asList("O2")));
>> > > > This indicates anti-affinity between all partitions of O2, i.e. all
>> > > > partitions of O2 should be launched on different nodes.
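The semantics of the proposed attribute can be pinned down in a few lines of Java. This is a minimal sketch, assuming the relation is symmetric (if O1 must avoid O2, then O2 must avoid O1), which the proposal implies but does not state. `AntiAffinityRules` and `mustSeparate` are hypothetical names, not part of Apex, and `OperatorContext.ANTI_AFFINITY` is itself only proposed in this thread.

```java
import java.util.*;

/**
 * Hypothetical helper (not an Apex class): collects per-operator anti-affinity
 * lists into a symmetric relation between logical operator names.
 */
final class AntiAffinityRules {
  private final Map<String, Set<String>> conflicts = new HashMap<>();

  /** Records that `operator` must not share a node with any operator in `others`. */
  void add(String operator, List<String> others) {
    for (String other : others) {
      // Store both directions, so a lookup does not depend on which operator declared the rule.
      conflicts.computeIfAbsent(operator, k -> new HashSet<>()).add(other);
      conflicts.computeIfAbsent(other, k -> new HashSet<>()).add(operator);
    }
  }

  /**
   * True if a partition of `a` and a partition of `b` must run on different nodes.
   * When a equals b, this asks whether partitions of the same operator must be spread out.
   */
  boolean mustSeparate(String a, String b) {
    return conflicts.getOrDefault(a, Collections.emptySet()).contains(b);
  }
}
```

For the two `dag.setAttribute` calls above, `mustSeparate("O3", "O1")` and `mustSeparate("O2", "O2")` are both true, while `mustSeparate("O2", "O3")` is false because neither list mentions that pair.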
>> > > >
>> > > > Based on the anti-affinity attribute specified for the logical
>> > > > operator, we can add this list to each PTContainer during physical plan
>> > > > creation. STRAM can then use it to send container requests accordingly.
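The step from the logical attribute to PTContainer could look roughly like this: for each physical container, collect the other containers it must not share a node with. This sketch assumes the operator-level lists have already been made symmetric. `ContainerSpec` is a hypothetical stand-in for PTContainer (which carries far more state), and `ContainerAntiAffinity.compute` is an illustrative name, not Apex code.

```java
import java.util.*;

/** Hypothetical stand-in for a PTContainer: an id plus the logical operators whose partitions it hosts. */
final class ContainerSpec {
  final int id;
  final Set<String> operators;

  ContainerSpec(int id, Set<String> operators) {
    this.id = id;
    this.operators = operators;
  }
}

final class ContainerAntiAffinity {
  /**
   * Maps each container id to the ids of the other containers it must not share a node with.
   * `rules` maps a logical operator name to the names it is anti-affine with (assumed symmetric).
   */
  static Map<Integer, Set<Integer>> compute(List<ContainerSpec> containers,
                                            Map<String, Set<String>> rules) {
    Map<Integer, Set<Integer>> avoid = new HashMap<>();
    for (ContainerSpec a : containers) {
      Set<Integer> ids = new TreeSet<>();
      for (ContainerSpec b : containers) {
        if (a.id == b.id) {
          continue; // a container is never compared with itself
        }
        for (String op : a.operators) {
          Set<String> banned = rules.getOrDefault(op, Collections.emptySet());
          if (!Collections.disjoint(banned, b.operators)) {
            ids.add(b.id); // b hosts an operator that a's operator must avoid
            break;
          }
        }
      }
      avoid.put(a.id, ids);
    }
    return avoid;
  }
}
```

If the physical plan ever put two mutually anti-affine partitions into the same container, no node assignment could satisfy the constraint, so plan creation would also need to keep such partitions in separate containers.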
>> > > >
>> > > >    Please suggest if there is a better way to express this intent.
>> > > >
>> > > > *2. How to implement anti-affinity in the AM*
>> > > > There are two ways we can implement this:
>> > > >   *a. Blacklisting of nodes:* We can group the physical container
>> > > >   requests based on anti-affinity requirements and send allocation
>> > > >   requests for containers group by group. After the first group has
>> > > >   been allocated, blacklist its nodes before sending the second group
>> > > >   of container requests. This ensures that containers with
>> > > >   anti-affinity requirements are allocated on different nodes.
>> > > >   *b. Node-specific container requests:* Build a map of the nodes
>> > > >   present in the cluster and request each container on a specific
>> > > >   node, honoring anti-affinity. There are a couple of open YARN JIRAs
>> > > >   for node-specific container requests: YARN-1412
>> > > >   <https://issues.apache.org/jira/browse/YARN-1412>, YARN-2027
>> > > >   <https://issues.apache.org/jira/browse/YARN-2027>. So we need to
>> > > >   check whether this approach is viable.
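Both strategies in point 2 can be sketched against plain Java collections instead of YARN. This is a minimal sketch under stated assumptions: `avoid` is a per-container conflict map (assumed symmetric), `freeSlots` is a made-up stand-in for node capacity, and `PlacementSketches`, `groups`, `blacklistFor` and `assign` are hypothetical names. In a real AM, each round of (a) would presumably follow a call to `AMRMClient.updateBlacklist`, and each node chosen in (b) would go into an `AMRMClient.ContainerRequest` naming that node with `relaxLocality` set to false; those calls are not shown.

```java
import java.util.*;

/** Hypothetical sketches of the two AM strategies; `avoid` maps a container id to ids it must not share a node with. */
final class PlacementSketches {

  // (a) Blacklisting: split requests into rounds with no conflicts inside a round.
  static List<List<Integer>> groups(List<Integer> containers, Map<Integer, Set<Integer>> avoid) {
    List<List<Integer>> rounds = new ArrayList<>();
    for (int c : containers) {
      Set<Integer> banned = avoid.getOrDefault(c, Collections.emptySet());
      List<Integer> target = null;
      for (List<Integer> round : rounds) {
        if (Collections.disjoint(round, banned)) { // greedy: first round without a conflict
          target = round;
          break;
        }
      }
      if (target == null) {
        target = new ArrayList<>();
        rounds.add(target);
      }
      target.add(c);
    }
    return rounds;
  }

  // Nodes to blacklist before requesting `round`: those already hosting a conflicting container.
  static Set<String> blacklistFor(List<Integer> round, Map<Integer, Set<Integer>> avoid,
                                  Map<Integer, String> placed) {
    Set<String> nodes = new TreeSet<>();
    for (int c : round) {
      for (int other : avoid.getOrDefault(c, Collections.emptySet())) {
        String node = placed.get(other);
        if (node != null) {
          nodes.add(node);
        }
      }
    }
    return nodes;
  }

  // (b) Node-specific requests: pick a node per container from a known node map.
  // `freeSlots` is a hypothetical count of containers each node can still take.
  static Map<Integer, String> assign(List<Integer> containers, Map<Integer, Set<Integer>> avoid,
                                     Map<String, Integer> freeSlots) {
    Map<String, Integer> slots = new LinkedHashMap<>(freeSlots); // leave the caller's map untouched
    Map<Integer, String> placed = new LinkedHashMap<>();
    for (int c : containers) {
      for (Map.Entry<String, Integer> e : slots.entrySet()) {
        if (e.getValue() == 0) {
          continue; // node is full
        }
        boolean clash = false;
        for (int other : avoid.getOrDefault(c, Collections.emptySet())) {
          if (e.getKey().equals(placed.get(other))) {
            clash = true;
            break;
          }
        }
        if (!clash) {
          placed.put(c, e.getKey());
          e.setValue(e.getValue() - 1);
          break;
        }
      }
    }
    return placed; // containers missing from the result could not be placed
  }
}
```

Because the YARN blacklist applies to the whole application rather than to a single request, `blacklistFor` takes the union over a round, which can exclude more nodes than strictly necessary. The greedy grouping is a simple graph coloring and may produce more rounds than the minimum. With two nodes and three containers that must all be separated, `assign` necessarily leaves one container unplaced; that is the situation point 3 addresses.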
>> > > >
>> > > > *3. Strict vs. relaxed anti-affinity*
>> > > > Depending on cluster resource availability, it may not be possible to
>> > > > honor all of the specified anti-affinity requirements.
>> > > > *Strict anti-affinity:* The AM will keep trying indefinitely to
>> > > > allocate containers according to the anti-affinity requirements. This
>> > > > is similar to how an application stays in the ACCEPTED state until
>> > > > resources are available to launch it in the cluster.
>> > > > *Relaxed anti-affinity:* The AM will drop the anti-affinity constraint
>> > > > after a certain timeout.
>> > > >
>> > > > We need a way for the application to set this attribute (either in the
>> > > > operator context, or in DAGContext for an application-wide setting).
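The difference between the two modes boils down to one decision per pending request. This is a minimal sketch, assuming each request records when it was first sent and a single timeout applies; `AntiAffinityMode`, `AntiAffinityPolicy` and `relaxAfter` are hypothetical names, and nothing here is an existing Apex attribute. Such a policy object could be configured per operator or once for the whole DAG, matching the two options just mentioned.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical modes; the proposal names them but does not define an attribute for them. */
enum AntiAffinityMode { STRICT, RELAXED }

/** Decides whether a pending container request must still honor anti-affinity. */
final class AntiAffinityPolicy {
  private final AntiAffinityMode mode;
  private final Duration relaxAfter; // only consulted in RELAXED mode

  AntiAffinityPolicy(AntiAffinityMode mode, Duration relaxAfter) {
    this.mode = mode;
    this.relaxAfter = relaxAfter;
  }

  boolean enforce(Instant requestedAt, Instant now) {
    if (mode == AntiAffinityMode.STRICT) {
      return true; // keep waiting, much like an application sitting in ACCEPTED
    }
    // RELAXED: honor the constraint only until the request has waited relaxAfter.
    return Duration.between(requestedAt, now).compareTo(relaxAfter) < 0;
  }
}
```

With `RELAXED` and a 60-second timeout, a request that has waited 59 seconds still honors anti-affinity, while one that has waited 60 seconds does not; `STRICT` always returns true.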
>> > > >
>> > > > *4. How do we unit test this feature?*
>> > > > We could use Mockito to mock YARN behavior and test only the AM
>> > > > implementation, since some scenarios may not be easy to reproduce
>> > > > manually on a cluster. Please suggest if there are better ways to test
>> > > > this.
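Whether Mockito or a hand-written fake is used, testing is easier if the AM's YARN calls sit behind a small interface. This sketch shows the dependency-free variant: a scripted fake that hands out nodes in a fixed order and respects a blacklist, driving a tiny placement loop. `ContainerSource`, `FakeContainerSource` and `SpreadPlacer` are hypothetical names; the same interface could equally be stubbed with Mockito's `when(...).thenReturn(...)`.

```java
import java.util.*;

/** Hypothetical seam between AM logic and YARN, so tests can swap in a fake or a Mockito mock. */
interface ContainerSource {
  /** Returns the node of one newly allocated container (never one in `blacklist`), or null if none fits. */
  String allocate(Set<String> blacklist);
}

/** Scripted fake: hands out nodes round-robin in a fixed order, skipping blacklisted ones. */
final class FakeContainerSource implements ContainerSource {
  private final List<String> nodes;
  private int next = 0;

  FakeContainerSource(List<String> nodes) {
    this.nodes = nodes;
  }

  @Override
  public String allocate(Set<String> blacklist) {
    for (int i = 0; i < nodes.size(); i++) {
      String node = nodes.get((next + i) % nodes.size());
      if (!blacklist.contains(node)) {
        next = (next + i + 1) % nodes.size();
        return node;
      }
    }
    return null; // every node is blacklisted
  }
}

/** Minimal AM-side loop under test: places all partitions of one operator on distinct nodes. */
final class SpreadPlacer {
  static List<String> place(int partitions, ContainerSource source) {
    List<String> used = new ArrayList<>();
    for (int p = 0; p < partitions; p++) {
      String node = source.allocate(new HashSet<>(used)); // blacklist nodes already used
      if (node == null) {
        break; // cluster too small: strict mode would keep waiting here
      }
      used.add(node);
    }
    return used;
  }
}
```

A test can then assert, for example, that three partitions on a three-node fake land on three distinct nodes, and that a fourth partition is left unplaced instead of being doubled up.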
>> > > >
>> > > > Please suggest improvements or any other ideas on all of the above.
>> > > >
>> > > > Thanks!
>> > > > Isha
>> > > >
>> > > > P.S. Sorry for the long email. Please let me know if I should start
>> > > > separate threads for any of the above points.
>> > > >
>> > >
>> >
>>
>
>
