apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Timothy Farkas <...@datatorrent.com>
Subject Re: Support easy parallelization of pipelines
Date Thu, 24 Sep 2015 22:06:13 GMT
Hi Ilya

I am not sure what you mean by pass only the "first" tuple through. Can you
explain?

A good place to start looking at setting partitioning keys to control
whether the data sent to partitions is replicated is here.
https://github.com/apache/incubator-apex-core/blob/devel-3/common/src/main/java/com/datatorrent/common/partitioner/StatelessPartitioner.java

It's been some time since I've worked with this myself so it would take me
some time to create an example. If you can wait a few days I'll create one
when I get the chance, otherwise maybe someone else on the dev list can
help out.

Thanks,
Tim

On Thu, Sep 24, 2015 at 1:51 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com>
wrote:

> Timothy - is there any chance you could share a code snippet or point to
> an example showing how to do this partitioning and comment on whether
> there’s an identifier we could use to identify and only pass along the
> “first” tuple through?
>
> Thanks in advance.
> On 9/23/15, 12:08 PM, "Timothy Farkas" <tim@datatorrent.com> wrote:
>
> >That's a good point. Using the dag.addStream(“NAME”, A.output, B.input,
> >C.input); approach the B and C pipelines would be independent, so if B
> >failed C would be unaffected. But If A failed both pipelines would be
> >affected. In the case of setting a custom partitioner the same applies. A
> >failure in one of the B pipelines would leave the other pipelines
> >unaffected, but a failure in A would affect all the pipelines.
> >
> >Tim
> >
> >On Wed, Sep 23, 2015 at 7:54 AM, Ganelin, Ilya
> ><Ilya.Ganelin@capitalone.com>
> >wrote:
> >
> >> Thanks, Tim - I think I understand. You are proposing to do things at
> >>the
> >> operator level, rather than at the DAG level.
> >>
> >> B could then have 6 parallel partitions, some of which process the same
> >> data simultaneously.
> >> X1
> >> X1
> >> X2
> >> X2
> >> X3
> >> X3
> >>
> >> In this scenario, is it possible for the OPERATOR to fail terminating
> >>all
> >> the pipelines or is it only possible for an individual physical pipeline
> >> to fail - therefore not affecting the others?
> >>
> >> On 9/23/15, 11:48 AM, "Timothy Farkas" <tim@datatorrent.com> wrote:
> >>
> >> >Hi Ilya,
> >> >
> >> >There are two ways to do this. You can do normal N x N partitioning,
> >>but
> >> >write a partitioner for B which assigns the same partition keys to M
> >> >operators in a partitioning. Then M partitions will receive the same
> >>data:
> >> >An example of setting partition keys in a partitioner is here:
> >> >
> >> >
> >>
> >>
> https://github.com/apache/incubator-apex-core/blob/devel-3/common/src/mai
> >>n
> >> >/java/com/datatorrent/common/partitioner/StatelessPartitioner.java
> >> >
> >> >An easier way is this:
> >> >
> >> >dag.addStream(“NAME”, A.output, B.input, C.input);
> >> >
> >> >You have to put all the input ports you want to receive the same data
> >>in
> >> >the same stream declaration.
> >> >
> >> >Tim
> >> >
> >> >On Wed, Sep 23, 2015 at 7:39 AM, Ganelin, Ilya
> >> ><Ilya.Ganelin@capitalone.com>
> >> >wrote:
> >> >
> >> >> Ram, thank you. I think this is a good starting point, however it
> >> >>requires
> >> >> having access to the stream at creation time (as well as the operator
> >> >> being added). I¹d ideally like to create a function:
> >> >>
> >> >> static void parallelize(DAG dag);
> >> >>
> >> >> This function would take the head of the DAG, and parallelize all
> >> >> downstream operators. It looks like at the moment, there is no
> >>interface
> >> >> within DAG to provide access to its operators or streams. Does such
> >>an
> >> >> interface exist or is this something I would need to expose? Was
> >>there a
> >> >> design decision to not expose these?
> >> >>
> >> >>
> >> >> On 9/23/15, 11:26 AM, "Munagala Ramanath" <ram@datatorrent.com>
> >>wrote:
> >> >>
> >> >> >l
> >> >>
> >> >> ________________________________________________________
> >> >>
> >> >> The information contained in this e-mail is confidential and/or
> >> >> proprietary to Capital One and/or its affiliates and may only be used
> >> >> solely in performance of work or services for Capital One. The
> >> >>information
> >> >> transmitted herewith is intended only for use by the individual or
> >> >>entity
> >> >> to which it is addressed. If the reader of this message is not the
> >> >>intended
> >> >> recipient, you are hereby notified that any review, retransmission,
> >> >> dissemination, distribution, copying or other use of, or taking of
> >>any
> >> >> action in reliance upon this information is strictly prohibited. If
> >>you
> >> >> have received this communication in error, please contact the sender
> >>and
> >> >> delete the material from your computer.
> >> >>
> >> >> ________________________________________________________
> >> >>
> >> >> The information contained in this e-mail is confidential and/or
> >> >> proprietary to Capital One and/or its affiliates and may only be used
> >> >> solely in performance of work or services for Capital One. The
> >> >>information
> >> >> transmitted herewith is intended only for use by the individual or
> >> >>entity
> >> >> to which it is addressed. If the reader of this message is not the
> >> >>intended
> >> >> recipient, you are hereby notified that any review, retransmission,
> >> >> dissemination, distribution, copying or other use of, or taking of
> >>any
> >> >> action in reliance upon this information is strictly prohibited. If
> >>you
> >> >> have received this communication in error, please contact the sender
> >>and
> >> >> delete the material from your computer.
> >> >>
> >> >>
> >>
> >> ________________________________________________________
> >>
> >> The information contained in this e-mail is confidential and/or
> >> proprietary to Capital One and/or its affiliates and may only be used
> >> solely in performance of work or services for Capital One. The
> >>information
> >> transmitted herewith is intended only for use by the individual or
> >>entity
> >> to which it is addressed. If the reader of this message is not the
> >>intended
> >> recipient, you are hereby notified that any review, retransmission,
> >> dissemination, distribution, copying or other use of, or taking of any
> >> action in reliance upon this information is strictly prohibited. If you
> >> have received this communication in error, please contact the sender and
> >> delete the material from your computer.
> >>
>
> ________________________________________________________
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message