flink-dev mailing list archives

From "Bruecke, Christoph" <christoph.brue...@campus.tu-berlin.de>
Subject Re: Union a data stream with a product of itself
Date Wed, 25 Nov 2015 12:47:19 GMT
Hi,

the operation “stream.union(stream.map(id))” is equivalent to “stream.union(stream)”,
isn’t it? So it might also duplicate the data.
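
A minimal sketch of that equivalence, assuming a toy model in which a bounded stream is a plain Python list and union simply keeps every element of both inputs. The names `union` and `identity` are hypothetical helpers for illustration; this is not the Flink DataStream API and says nothing about how Flink builds its stream graph.

```python
# Toy model only: a bounded "stream" is a Python list and union is
# multiset concatenation. This is NOT the Flink DataStream API.

def union(left, right):
    """Merge two bounded streams, keeping every element of both inputs."""
    return list(left) + list(right)


def identity(x):
    """Hypothetical stand-in for the 'id' map function."""
    return x


stream = [1, 2, 3]

# stream.union(stream.map(id)) in the toy model
with_identity_map = union(stream, [identity(x) for x in stream])

# stream.union(stream) in the toy model
with_itself = union(stream, stream)

# Ignoring order, both results hold each input element twice.
print(sorted(with_identity_map))
print(sorted(with_itself))
```

Under this model the two expressions cannot be told apart by their output, so whatever rule is chosen for “stream.union(stream)” (error, deduplicate, or duplicate) would arguably have to account for the identity-map case as well.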

- Christoph


> On 25 Nov 2015, at 11:24, Stephan Ewen <sewen@apache.org> wrote:
> 
> "stream.union(stream.map(..))" should definitely be possible. Not sure why
> this is not permitted.
> 
> "stream.union(stream)" would contain each element twice, so should either
> give an error or actually union (or duplicate) elements...
> 
> Stephan
> 
> 
> On Wed, Nov 25, 2015 at 10:42 AM, Gyula Fóra <gyfora@apache.org> wrote:
> 
>> Yes, I am not sure if this is the intentional behaviour. I think you are
>> supposed to be able to do the things you described.
>> 
>> stream.union(stream.map(..)) and things like this are fair operations. Also
>> maybe stream.union(stream) should just give stream instead of an error.
>> 
>> Could someone comment on this who knows the reasoning behind the current
>> mechanics?
>> 
>> Gyula
>> 
>> Vasiliki Kalavri <vasilikikalavri@gmail.com> wrote (on Tue, 24 Nov 2015,
>> 16:46):
>> 
>>> Hi squirrels,
>>> 
>>> when porting the gelly streaming code from 0.9 to 0.10 today with Paris, we
>>> hit an exception in union: "*A DataStream cannot be unioned with itself*".
>>> 
>>> The code raising this exception looks like this:
>>> stream.union(stream.map(...)).
>>> 
>>> Taking a look at the union code, we see that it now forbids unioning a
>>> stream not only with itself, but also with any product of itself.
>>> 
>>> First, we are wondering, why is that? Does it make building the stream
>>> graph easier in some way?
>>> Second, we might want to give a better error message there, e.g. "*A
>>> DataStream cannot be unioned with itself or a product of itself*", and
>>> finally, we should update the docs, which currently state that unioning a
>>> stream with itself is allowed and that "*If you union a data stream with
>>> itself you will still only get each element once.*"
>>> 
>>> Cheers,
>>> -Vasia.
>>> 
>> 
