flink-user mailing list archives

From Kostas Kloudas <k.klou...@data-artisans.com>
Subject Re: how to get rid of duplicate rows group by in DataStream
Date Mon, 22 Aug 2016 16:34:58 GMT
Hi Subash,

You should also split your elements into windows.
Without a window, Flink emits an updated count for each incoming record.
That is why you see:

(1,1)
(1,2)
(1,3)

…
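
To make the difference concrete, here is a minimal plain-Java sketch. It is only a simulation and does not use the Flink API: the class and method names are made up for illustration, the input order (1, 1, 2, 1, 2) is an assumption chosen to reproduce the output in the question, and the whole list is treated as a single window. In a real job the equivalent step would be a window applied to the keyed stream before the count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation only; none of this is Flink API.
final class WindowedCountSketch {

    // Keyed count without a window: emit the updated count after every record.
    static List<String> runningCounts(List<String> ids) {
        Map<String, Long> counts = new LinkedHashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String id : ids) {
            long updated = counts.merge(id, 1L, Long::sum);
            emitted.add("(" + id + "," + updated + ")");
        }
        return emitted;
    }

    // Keyed count with a window: collect every record of the window first,
    // then emit one final count per key. The list is assumed to be one window.
    static List<String> windowedCounts(List<String> window) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String id : window) {
            counts.merge(id, 1L, Long::sum);
        }
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            emitted.add("(" + entry.getKey() + "," + entry.getValue() + ")");
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Assumed arrival order of the ids.
        List<String> ids = Arrays.asList("1", "1", "2", "1", "2");
        System.out.println(runningCounts(ids));   // one element per record
        System.out.println(windowedCounts(ids));  // one element per key
    }
}
```

The first method produces one output per input record, which is the "duplicate rows" effect; the second produces a single row per key once the window is complete.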

Kostas

> On Aug 22, 2016, at 5:58 PM, subash basnet <yasubash@gmail.com> wrote:
> 
> Hello all, 
> 
> I grouped the input by its id to count the number of elements in each group.
> 
> DataStream<Tuple2<String, Long>> gridWithCount;
> Printing the above DataStream shows duplicate rows:
> Output: 
> (1,1)
> (1,2)
> (2,1)
> (1,3)
> (2,2).......
> 
> Whereas I want distinct rows with the final count:
> Needed Output:
> (1,3)
> (2,2)..
> 
> What would be the way to achieve this?
> 
> 
> Regards,
> Subash Basnet

