flink-user mailing list archives

From nsengupta <sengupta.nirma...@gmail.com>
Subject Count of Grouped DataSet
Date Sat, 30 Apr 2016 15:12:05 GMT
Hello Flinksters,

What is the most idiomatic way in Flink to get the count of records grouped
by a key (the key can have multiple fields)?

I have referred to this ticket,
<https://issues.apache.org/jira/browse/FLINK-1269>, but because it is still
open, I can't make out what the final decision was.

Let's say that we have the following records (case class or tuple, whatever):

f1,  f2,  f3,  f4
1,   1,   2,   "A"
1,   1,   2,   "B"
2,   1,   3,   "A"
3,   1,   4,   "C"

I group this DataSet on a composite key of (f2,f3), and then I need the following counts per key:
([1,2], 2)
([1,3], 1)
([1,4], 1)
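To pin down the expected semantics independently of any Flink API, here is a plain-Java sketch. It uses the standard library only; it is not Flink's DataSet API and not a proposal for one. The sketch groups the four sample records on (f2, f3) and counts each group. The class name and the `Rec`/`Key` record types are hypothetical, and Java 16+ is assumed for records.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupCountSemantics {
    // Hypothetical record type mirroring the sample columns f1..f4.
    record Rec(int f1, int f2, int f3, String f4) {}

    // Composite grouping key (f2, f3); records supply equals/hashCode.
    // Comparable only so the result can be kept in a sorted TreeMap.
    record Key(int f2, int f3) implements Comparable<Key> {
        public int compareTo(Key o) {
            int c = Integer.compare(f2, o.f2);
            return c != 0 ? c : Integer.compare(f3, o.f3);
        }
    }

    // One (key, count) entry per distinct (f2, f3) value.
    static Map<Key, Long> countByKey(List<Rec> recs) {
        return recs.stream().collect(Collectors.groupingBy(
                r -> new Key(r.f2(), r.f3()),
                TreeMap::new,            // sorted map gives a stable output order
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Rec> data = List.of(
                new Rec(1, 1, 2, "A"),
                new Rec(1, 1, 2, "B"),
                new Rec(2, 1, 3, "A"),
                new Rec(3, 1, 4, "C"));
        countByKey(data).forEach((k, n) ->
                System.out.println("([" + k.f2() + "," + k.f3() + "], " + n + ")"));
    }
}
```

Running `main` prints the three (key, count) pairs listed above, in key order.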

I could follow the accepted wisdom of /mapping/ each record to its key plus
an extra '1' and then /reducing/ with a /sum/ operation, but that seems
somewhat more low-level than what one is expected to write. Spark has the
/countByKey/ operator for exactly this purpose.
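The map-then-sum pattern described above can be modelled in plain Java. This is a standard-library sketch, not Flink code. Each record is mapped to a (compositeKey, 1) pair, and pairs with equal keys are then merged by summing the 1s. In Flink's DataSet API the analogous steps would presumably be a `map` to a tuple holding the key fields plus a 1, then `groupBy` on the key fields and `sum` on the count field. That correspondence is an assumption and is not something the linked ticket settles. The class name and the `Rec` record are hypothetical; Java 16+ is assumed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapThenSum {
    // Hypothetical record type for the sample rows (f1, f2, f3, f4).
    record Rec(int f1, int f2, int f3, String f4) {}

    static Map<List<Integer>, Long> countViaOnes(List<Rec> recs) {
        return recs.stream()
                // "map" step: emit (compositeKey, 1) for every record
                .map(r -> Map.entry(List.of(r.f2(), r.f3()), 1L))
                // "reduce" step: sum the 1s of entries that share a key;
                // LinkedHashMap keeps keys in first-seen order
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        Long::sum,
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<Rec> data = List.of(
                new Rec(1, 1, 2, "A"),
                new Rec(1, 1, 2, "B"),
                new Rec(2, 1, 3, "A"),
                new Rec(3, 1, 4, "C"));
        countViaOnes(data).forEach((k, n) ->
                System.out.println("([" + k.get(0) + "," + k.get(1) + "], " + n + ")"));
    }
}
```

The explicit 1-per-record step is what makes this feel low-level: the count is encoded as data and then aggregated. A dedicated count-by-key operator would hide that step.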

Could someone please nudge me in the right direction?

-- Nirmalya

View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Count-of-Grouped-DataSet-tp6592.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.
