flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: [DISCUSS] Consolidate method naming between the batch and streaming API
Date Mon, 01 Jun 2015 16:01:17 GMT
+1

Good list and choices, Marton!

On Mon, Jun 1, 2015 at 5:45 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> Thanks for bringing up this point!
>
> +1 for the renaming.
> @Marton: Is this a "complete" list, i.e., did you go through both APIs or
> might there be more methods that are semantically identical but named
> differently?
>
> 2015-06-01 17:31 GMT+02:00 Gyula Fóra <gyfora@apache.org>:
>
> > +1 for the changes proposed by Marton (before the release)
> >
> > Aljoscha Krettek <aljoscha@apache.org> wrote (on Mon, Jun 1, 2015, 16:32):
> >
> > > Yes, these renamings make sense. The partitionBy() is not yet in the
> > > master for streaming, though.
> > >
> > > > On Mon, Jun 1, 2015 at 4:10 PM, Márton Balassi <balassi.marton@gmail.com> wrote:
> > > > > Looking at the DataSet and DataStream APIs we have come to the conclusion
> > > > > with Aljoscha that there are a few methods that although providing the
> > > > > same functionality are named differently. These are the following:
> > > > >
> > > > >    1. rebalance (batch) / distribute (streaming): Rebalances the data
> > > > >    sent to the downstream operators thus equally distributing it.
> > > > >    2. partitionByHash, partitionCustom (batch) / partitionBy (streaming):
> > > > >    Partitioning has just recently been exposed in the streaming API and
> > > > >    is not as refined as the batch one. The streaming partitionBy is
> > > > >    actually partitionByHash.
> > > > >    3. Union (batch) / merge, connect (streaming): The streaming merge
> > > > >    does a union of two streams with the same type. Connect is
> > > > >    conceptually different, it provides a way of sharing state between
> > > > >    two streams with potentially different types without mapping them to
> > > > >    a common type and then merging them. This saves latency and an ugly
> > > > >    mapping. The former advantage can be offset by proper operator
> > > > >    chaining, the second one would remain if we did not have connect.
> > > > >
> > > > > To consolidate the naming I would suggest the following:
> > > > >
> > > > >    1. Rename streaming distribute to rebalance.
> > > > >    2. Rename streaming partitionBy to partitionByHash and file JIRA for
> > > > >    custom partitioning support for streaming.
> > > > >    3. Rename streaming merge to union, leave streaming connect as it is.
> > >
> >
>
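For readers skimming the thread, the semantics behind the three proposed names can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Flink code: the function names mirror the proposed method names, but the list-based "channels", the crc32-based hashing, and the `connect` helper with its `on_first`/`on_second` callbacks are all hypothetical stand-ins chosen to keep the sketch self-contained.

```python
from zlib import crc32


def rebalance(records, parallelism):
    """Round-robin distribution: spread records evenly over the channels."""
    channels = [[] for _ in range(parallelism)]
    for i, record in enumerate(records):
        channels[i % parallelism].append(record)
    return channels


def partition_by_hash(records, key, parallelism):
    """Hash partitioning: records with the same key land in the same channel."""
    channels = [[] for _ in range(parallelism)]
    for record in records:
        # crc32 is used only because it is deterministic across runs
        # (an assumption of this sketch, not what Flink does internally).
        index = crc32(repr(key(record)).encode("utf-8")) % parallelism
        channels[index].append(record)
    return channels


def union(*streams):
    """Merge streams that all carry the same record type into one stream."""
    merged = []
    for stream in streams:
        merged.extend(stream)
    return merged


def connect(first, second, on_first, on_second):
    """Handle two differently typed streams with two functions sharing state.

    Neither input is mapped to a common type first. A real stream would
    interleave the inputs; this sketch simply drains `first`, then `second`.
    """
    state = {}
    output = []
    for record in first:
        output.append(on_first(record, state))
    for record in second:
        output.append(on_second(record, state))
    return output
```

The point of the sketch is the distinction drawn in item 3: `union` needs both inputs to share a type, whereas `connect` keeps the types apart and only shares state between the two handlers.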
