flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Count of Grouped DataSet
Date Mon, 02 May 2016 09:56:46 GMT
Hi Nirmalya,

the solution with List.size() won't use a combiner, and it materializes each
group as a list, so it won't be efficient for large data sets with large groups.
I would recommend appending a 1 to each record and using GroupedDataSet.sum()
on that field.
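
A minimal sketch of that suggestion, assuming the Scala DataSet API and the
sample data from the quoted message. The names GroupCount and countPerGroup are
only illustrative, and the map step drops the fields that are not needed for
the count:

```scala
import org.apache.flink.api.scala._

object GroupCount {

  // Returns (field2, field3, count) for each distinct (field2, field3) pair.
  def countPerGroup(env: ExecutionEnvironment): DataSet[(Int, Int, Int)] =
    env
      .fromElements((1, 1, 2, "A"), (1, 1, 2, "B"), (2, 1, 3, "B"), (3, 1, 4, "C"))
      // Keep the two grouping fields and append a constant 1 to each record.
      .map(t => (t._2, t._3, 1))
      // Group on the two retained fields (positions 0 and 1 after the map).
      .groupBy(0, 1)
      // Sum the appended 1s (position 2) to get the size of each group.
      .sum(2)

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    // print() triggers execution of the DataSet program.
    countPerGroup(env).print()
  }
}
```

For the sample data this should yield a count of 2 for the group (1,2) and a
count of 1 each for (1,3) and (1,4); the order of the printed records is not
guaranteed.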

2016-05-01 12:48 GMT+02:00 nsengupta <sengupta.nirmalya@gmail.com>:

> Hello all,
>
> This is how I have moved ahead with the implementation of finding the count
> of a GroupedDataSet:
>
> val k = envDefault
>   .fromElements((1,1,2,"A"),(1,1,2,"B"),(2,1,3,"B"),(3,1,4,"C"))
>   .groupBy(1,2)
>   .reduceGroup(nextGroup => {
>     val asList = nextGroup.toList
>     (asList.head._2,asList.head._3,asList.size)
>   })
>
> k.print()
>
> While this produces the expected output all right, I am not sure if this is the
> ideal, idiomatic way to implement what I need. Could you please confirm? If
> there is a better way, I would like to be wiser of course.
>
> -- Nirmalya
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Count-of-Grouped-DataSet-tp6592p6594.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
>
