apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: Thread Local and Container Local
Date Thu, 27 Aug 2015 20:47:48 GMT
Brennon,
Stream locality works well when there is a 1:1 connection in the physical
plan. For example:

Logical X[1] => Y[1] => Z[1]
could be physically done as
X[2] => Y[16] => Z[16]
or
X[16] => Y[16] => Z[16]

If Z can work on the same subset of data as Y (i.e. it does not need another
shuffle), Y and Z can leverage thread, container, or node locality, and so
forth. Node locality can be used if resources have to be distributed and you
do not want to rely on Yarn being able to give you a container with multiple
cores or a very large memory size. On a dedicated cluster a bigger container
makes more sense than a node-local stream, as there is no competition.
Putting X => Y => Z into a single container makes sense if you want to avoid
I/O cost. But then you may as well try for X[P] => Y[P] => Z[P].

The number of partitions is determined by the bottlenecked operator, based
on its resource requirements vis-a-vis the container size (which operator,
X/Y/Z, and which resource, RAM/CPU/IO). In a lot of cases these are internal
operators (not input adapters). With a dedicated cluster, this equation
changes a lot.
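
As a rough illustration of that sizing, the sketch below takes the total
demand of one operator per resource and the capacity of one container per
resource, and returns the partition count forced by the tightest resource.
The names and numbers are hypothetical, and real sizing would also have to
account for things this ignores (skew, overheads, and so on).

```java
// Hypothetical sizing helper, not part of Apex.
class PartitionEstimate {

    // Ceiling of demand / capacityPerContainer for positive values.
    static int partitionsFor(long demand, long capacityPerContainer) {
        return (int) ((demand + capacityPerContainer - 1) / capacityPerContainer);
    }

    // demand[i] and capacityPerContainer[i] refer to the same resource
    // (for example RAM in GB, CPU in cores). The bottlenecked resource is
    // the one that demands the most partitions.
    static int partitionsNeeded(long[] demand, long[] capacityPerContainer) {
        int needed = 1;
        for (int i = 0; i < demand.length; i++) {
            needed = Math.max(needed, partitionsFor(demand[i], capacityPerContainer[i]));
        }
        return needed;
    }

    public static void main(String[] args) {
        // Assumed numbers: the operator needs 64 GB RAM and 12 cores in
        // total; one container offers 8 GB and 4 cores. RAM is the bottleneck.
        System.out.println(partitionsNeeded(new long[] {64, 12}, new long[] {8, 4})); // 8
    }
}
```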

With respect to getting X => Y => Z into parallel partitions, the first
operator (input adapter) will dictate whether there will be a shuffle. If
the input data is load balanced (not key balanced) and events with the same
key need to go to the same downstream physical partition, then a shuffle is
unavoidable. If two events with the same key can be processed by different
partitions, you may be able to have the entire app in parallel partitions as
X[P] => Y[P] => Z[P], where P is the number of partitions. Each partition
can then be stream local. If not, X[P1] => Y[P2] => Z[P2] is the way out.
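
The difference between load-balanced and key-balanced input can be seen in
a small plain-Java sketch. It is not Apex code: round-robin assignment
stands in for a load-balanced input adapter, and hash(key) mod P stands in
for the shuffle. Both are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration, not part of Apex.
class ShuffleSketch {

    // Load balanced: events are spread evenly, regardless of key.
    static int roundRobin(int eventIndex, int partitions) {
        return eventIndex % partitions;
    }

    // Key balanced: every event with the same key goes to one partition.
    static int byKey(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // For each key, collects the set of partitions that receive it.
    static Map<String, Set<Integer>> partitionsPerKey(
            List<String> keys, int partitions, boolean keyed) {
        Map<String, Set<Integer>> seen = new TreeMap<>();
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            int p = keyed ? byKey(key, partitions) : roundRobin(i, partitions);
            seen.computeIfAbsent(key, k -> new TreeSet<>()).add(p);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "a", "b", "b");
        // Load balanced: each key shows up on both partitions, so an
        // operator that needs all events of a key together needs a shuffle.
        System.out.println(partitionsPerKey(keys, 2, false)); // {a=[0, 1], b=[0, 1]}
        // After a keyed shuffle each key lives on exactly one partition.
        System.out.println(partitionsPerKey(keys, 2, true));  // {a=[1], b=[0]}
    }
}
```

If the downstream operators do not care which partition sees a given key,
the first case is already fine and the whole chain can stay as
X[P] => Y[P] => Z[P].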

Thanks,
Amol


On Thu, Aug 27, 2015 at 10:59 AM, York, Brennon <Brennon.York@capitalone.com> wrote:

> 1. Does anyone have any sample code (or can point me to some) where we
> demonstrate thread local and container local operators? Answered my own
> question, check this out:
> https://github.com/apache/incubator-apex-core/blob/bdd7109519453e67789e4ec4025092a977d2b27c/engine/src/test/java/com/datatorrent/stram/stream/OiOStreamTest.java#L64
> 2. When doing thread or container local streams how does that work with
> dynamic (or differing sized) partitions between the two operators?
> Concretely, if I have a logical plan that looks like:
>
> X[1] => Y[1]
>
> And the physical plan looks like:
>
> X[2] => Y[16]
>
> How does the grouping work? Would it put 1 physical X operator and 8
> physical Y operators in one grouping and the other set of physical
> operators in another grouping or does it do something else? And, for
> edification, where does the Apex code reside that does this work?
>
> Thanks all!
