flink-user mailing list archives

From Thomas FOURNIER <thomasfournier...@gmail.com>
Subject Global Sort + ZipWithIndex
Date Sun, 13 Nov 2016 22:44:48 GMT
Hello,

I'm trying to assign a unique (and deterministic) ID to a globally sorted
DataSet.

Given a DataSet of String, I'm computing the frequency of each label as
follows:

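import org.apache.flink.api.common.operators.Order
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

val data = env.fromCollection(List("a","b","c","a","a","d","a","a","a","b","b","c","a","c","b","c"))

// Count occurrences per label, then range-partition and sort by count, descending
val mapping = data.map(s => (s, 1))
  .groupBy(0)
  .reduce((a, b) => (a._1, a._2 + b._2))
  .partitionByRange(1)
  .sortPartition(1, Order.DESCENDING)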


Then I want the most frequent label to get ID 0, the next most frequent
ID 1, and so on, *in decreasing order of frequency*. My idea was to use
zipWithIndex.

import org.apache.flink.api.scala.utils._ // brings zipWithIndex into scope

val result = mapping.zipWithIndex


But this does not guarantee that the global order will be preserved, right?
What can I do to get such a mapping?
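
To make the target concrete, here is a minimal sketch of the mapping I am after, written with plain Scala collections only (no Flink involved). The object and value names are hypothetical, and breaking ties between equal counts by label is an assumption on my side: without some tie-breaker the IDs of "b" and "c" (both occur 4 times) would not be deterministic.

```scala
object ExpectedMapping {
  val labels = List("a","b","c","a","a","d","a","a","a","b","b","c","a","c","b","c")

  // Count how often each label occurs.
  val counts: Map[String, Int] =
    labels.groupBy(identity).map { case (label, xs) => (label, xs.size) }

  // Sort by count descending, then by label ascending (assumed tie-breaker),
  // and attach a 0-based index as a Long, in the (index, element) shape.
  val expected: List[(Long, (String, Int))] =
    counts.toList
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case (entry, idx) => (idx.toLong, entry) }

  def main(args: Array[String]): Unit = expected.foreach(println)
}
```

For the sample data this gives (0,(a,7)), (1,(b,4)), (2,(c,4)), (3,(d,1)). The question is how to get the same result out of the distributed DataSet.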

Thanks
Regards

Thomas
