flink-dev mailing list archives

From Vasiliki Kalavri <vasilikikala...@gmail.com>
Subject Re: Union a data stream with a product of itself
Date Wed, 25 Nov 2015 15:13:09 GMT
Here's the issue: https://issues.apache.org/jira/browse/FLINK-3080

-V.

On 25 November 2015 at 14:38, Gyula Fóra <gyula.fora@gmail.com> wrote:

> Yes, please
>
> On Wed, 25 Nov 2015 at 14:37, Vasiliki Kalavri <vasilikikalavri@gmail.com> wrote:
>
> > So, do we all agree that the current behavior is not correct? Shall I open
> > a JIRA about this?
> >
> > On 25 November 2015 at 13:58, Gyula Fóra <gyula.fora@gmail.com> wrote:
> >
> > > Well, it kind of depends on which definition of union we are using. If this
> > > is a union in the set-theoretic sense, we can argue that the union of a stream
> > > with itself should be the same stream, because it contains exactly the same
> > > elements with the same timestamps and lineage.
> > >
> > > On the other hand, stream and stream.map(id) are not exactly the same, as
> > > they might have elements in a different order (the lineage differs).
> > >
> > > So I wouldn't say that any one self-union semantics is the only possible one.
> > >
> > > Gyula
> > >
> > > On Wed, 25 Nov 2015 at 13:47, Bruecke, Christoph
> > > <christoph.bruecke@campus.tu-berlin.de> wrote:
> > >
> > > > Hi,
> > > >
> > > > the operation “stream.union(stream.map(id))” is equivalent to
> > > > “stream.union(stream)” isn’t it? So it might also duplicate the data.
> > > >
> > > > - Christoph
> > > >
> > > >
> > > > > On 25 Nov 2015, at 11:24, Stephan Ewen <sewen@apache.org> wrote:
> > > > >
> > > > > "stream.union(stream.map(..))" should definitely be possible. Not sure why
> > > > > this is not permitted.
> > > > >
> > > > > "stream.union(stream)" would contain each element twice, so should either
> > > > > give an error or actually union (or duplicate) elements...
> > > > >
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Wed, Nov 25, 2015 at 10:42 AM, Gyula Fóra <gyfora@apache.org> wrote:
> > > > >
> > > > > >> Yes, I am not sure if this is the intentional behaviour. I think you are
> > > > > >> supposed to be able to do the things you described.
> > > > >>
> > > > > >> stream.union(stream.map(..)) and things like this are fair operations. Also
> > > > > >> maybe stream.union(stream) should just give stream instead of an error.
> > > > >>
> > > > > >> Could someone comment on this who knows the reasoning behind the current
> > > > > >> mechanics?
> > > > >>
> > > > >> Gyula
> > > > >>
> > > > > >> On Tue, 24 Nov 2015 at 16:46, Vasiliki Kalavri
> > > > > >> <vasilikikalavri@gmail.com> wrote:
> > > > >>
> > > > >>> Hi squirrels,
> > > > >>>
> > > > > >>> when porting the gelly streaming code from 0.9 to 0.10 today with Paris, we
> > > > > >>> hit an exception in union: "*A DataStream cannot be unioned with itself*".
> > > > >>>
> > > > >>> The code raising this exception looks like this:
> > > > >>> stream.union(stream.map(...)).
> > > > >>>
> > > > > >>> Taking a look into the union code, we see that it's now not allowed to
> > > > > >>> union a stream, not only with itself, but with any product of itself.
> > > > >>>
> > > > > >>> First, we are wondering, why is that? Does it make building the stream
> > > > > >>> graph easier in some way?
> > > > > >>> Second, we might want to give a better error message there, e.g. "*A
> > > > > >>> DataStream cannot be unioned with itself or a product of itself*", and
> > > > > >>> finally, we should update the docs, which currently state that unioning a
> > > > > >>> stream with itself is allowed and that "*If you union a data stream with
> > > > > >>> itself you will still only get each element once.*"
> > > > >>>
> > > > >>> Cheers,
> > > > >>> -Vasia.
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> >
>
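
The two readings of a self-union debated in this thread (each element twice
versus each element once) can be sketched in plain Java, with lists standing in
for streams. This is only an illustration under that assumption, not Flink's
DataStream API: the class UnionSemantics and the methods bagUnion and setUnion
are hypothetical names, and element ordering or timestamps of a real stream are
not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration: lists stand in for streams.
// This is NOT Flink's DataStream API.
public class UnionSemantics {

    // Bag (multiset) union: every element of both inputs is kept,
    // so a self-union contains each element twice.
    public static <T> List<T> bagUnion(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Set-style union: duplicates are dropped (first occurrence kept),
    // so a self-union gives back the original elements once.
    public static <T> List<T> setUnion(List<T> a, List<T> b) {
        LinkedHashSet<T> seen = new LinkedHashSet<>(a);
        seen.addAll(b);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Integer> stream = List.of(1, 2, 3);
        // Stand-in for stream.map(id): same elements, different lineage.
        List<Integer> mapped = stream.stream()
                .map(Function.<Integer>identity())
                .collect(Collectors.toList());

        System.out.println(bagUnion(stream, mapped)); // [1, 2, 3, 1, 2, 3]
        System.out.println(setUnion(stream, mapped)); // [1, 2, 3]
    }
}
```

Under the bag reading, stream.union(stream.map(id)) and stream.union(stream)
both duplicate the data; under the set reading, both collapse back to the
original elements.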
