flink-dev mailing list archives

From Greg Hogan <c...@greghogan.com>
Subject Duplicate sort keys
Date Mon, 03 Oct 2016 14:04:17 GMT
Is it correct to expect that Flink should remove duplicate sort keys? I'm
working on instrumenting the FixedLengthRecordSorter (FLINK-4705), and the
following test case from TypeHintITCase:200 has an unexpected effect
because keyPositions = {0, 0} is passed to TupleComparator.

DataSet<Integer> resultDs = ds
      .groupBy(0)
      .sortGroup(0, Order.ASCENDING)
      .reduceGroup(new GroupReducer<Tuple3<Integer, Long, String>, Integer>())
      .returns(BasicTypeInfo.INT_TYPE_INFO);

The sortGroup will have no effect, since only one key is presented to the
UDF at a time. Flink also makes no guarantees as to the order in which keys
are presented to the UDF; keys are sorted per partition. I would also
expect repeated keys in groupBy to be ignored.
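
To make the expectation concrete, here is a minimal sketch of what removing
duplicate key positions could look like, assuming the first occurrence of
each position is kept and the relative order is preserved. KeyPositions and
removeDuplicates are hypothetical names used for illustration only; this is
not existing Flink code, and a real change would also have to drop the
matching sort orders.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class KeyPositions {

    // Hypothetical helper (not a Flink API): drops repeated key positions,
    // keeping the first occurrence of each and preserving their order.
    static int[] removeDuplicates(int[] keyPositions) {
        // LinkedHashSet ignores repeated entries and iterates in insertion order.
        Set<Integer> seen = new LinkedHashSet<>();
        for (int position : keyPositions) {
            seen.add(position);
        }
        int[] result = new int[seen.size()];
        int i = 0;
        for (int position : seen) {
            result[i++] = position;
        }
        return result;
    }

    public static void main(String[] args) {
        // The case from the test above: groupBy(0) followed by sortGroup(0, ...).
        System.out.println(Arrays.toString(removeDuplicates(new int[] {0, 0})));       // prints [0]
        System.out.println(Arrays.toString(removeDuplicates(new int[] {2, 0, 2, 1}))); // prints [2, 0, 1]
    }
}
```

With this applied, the comparator in the example would see keyPositions =
{0} rather than {0, 0}.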

Greg
