flink-issues mailing list archives

From "Tobias (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-925) Support KeySelector function returning Tuples
Date Fri, 27 Jun 2014 09:19:24 GMT

    [ https://issues.apache.org/jira/browse/FLINK-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045743#comment-14045743 ]

Tobias commented on FLINK-925:

DataSet implements: 
public Grouping<T> groupBy(int... fields) {
		return new Grouping<T>(this, new Keys.FieldPositionKeys<T>(fields, getType(),
This can be used to group Tuple data types, which are not Comparable themselves. Those
Tuples need to consist of non-generic, comparable types.

When I group on my comparable:
*DataSet<Tuple2<MyComparable, Integer>>.groupBy(0)*
This exception is thrown:
Exception in thread "main" java.lang.UnsupportedOperationException: Generic type comparators
are not yet implemented.
	at eu.stratosphere.api.java.typeutils.GenericTypeInfo.createComparator(GenericTypeInfo.java:66)

When I group on the Integer, a different exception is thrown:
*DataSet<Tuple2<MyComparable, Integer>>.groupBy(1)*
Exception in thread "main" eu.stratosphere.compiler.CompilerException: Error translating node
'GroupReduce "MAX(1)" : SORTED_GROUP_REDUCE [[ GlobalProperties [partitioning=RANDOM] ]] [[
LocalProperties [ordering=null, grouped=null, unique=null] ]]': Could not serialize comparator
into the configuration.

Grouping with: *class MyComparable implements Comparable<MyComparable>*
Exception in thread "main" java.lang.UnsupportedOperationException: Generic type
comparators are not yet implemented.
	at eu.stratosphere.api.java.typeutils.GenericTypeInfo.createComparator(GenericTypeInfo.java:66)

I ran those tests in order to understand the problem. As far as I understand:
-> Tuple data types can be grouped when they contain only non-generic types.
-> All other (generic) types are not groupable, whether inside a Tuple or not.
-> Tuples which contain at least one generic type are not groupable, independent of the KEY
used for grouping.
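
The three observations above can be modelled roughly as follows. This is only a minimal
sketch in plain Java, not the actual Stratosphere/Flink code: TypeDesc, BasicType,
GenericType, TupleType and canGroupBy are hypothetical names invented for illustration, and
the rule they encode is the behaviour as I understand it from the tests, not a confirmed
description of the implementation.

```java
import java.util.Arrays;
import java.util.List;

final class GroupabilitySketch {

    /** Hypothetical, simplified type descriptor (not the real type-information classes). */
    interface TypeDesc {
        boolean hasComparator();
    }

    /** Models a non-generic type such as Integer or String. */
    static final class BasicType implements TypeDesc {
        public boolean hasComparator() { return true; }
    }

    /** Models a generic type such as MyComparable, even if it implements Comparable. */
    static final class GenericType implements TypeDesc {
        public boolean hasComparator() { return false; }
    }

    /** Models a Tuple: it has a comparator only if every field has one. */
    static final class TupleType implements TypeDesc {
        private final List<TypeDesc> fields;

        TupleType(TypeDesc... fields) { this.fields = Arrays.asList(fields); }

        public boolean hasComparator() {
            for (TypeDesc field : fields) {
                if (!field.hasComparator()) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Mirrors the third observation: keyField is deliberately ignored, because
     * one generic field makes the whole tuple ungroupable in the tests above.
     */
    static boolean canGroupBy(TypeDesc type, int keyField) {
        return type.hasComparator();
    }

    public static void main(String[] args) {
        TypeDesc myComparable = new GenericType();
        TypeDesc integer = new BasicType();
        System.out.println(canGroupBy(new TupleType(integer, integer), 0));
        System.out.println(canGroupBy(new TupleType(myComparable, integer), 1));
    }
}
```

In this model, grouping Tuple2<MyComparable, Integer> on field 1 is rejected even though the
key itself is an Integer, which matches what the second test showed.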

Does it make sense to remove the Comparable restriction? Even some classes which do fulfill
that restriction are not supported, while Tuples can be grouped if they consist of the
right types.

> Support KeySelector function returning Tuples
> ---------------------------------------------
>                 Key: FLINK-925
>                 URL: https://issues.apache.org/jira/browse/FLINK-925
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 0.6-incubating
>            Reporter: Fabian Hueske
>            Assignee: Tobias
>            Priority: Minor
>              Labels: starter
> KeySelector functions are used to extract keys on which DataSets can be grouped or joined.
> Currently, the key types returned by KeySelector functions are restricted to be comparable.
> However, Flink's Tuple data types are not comparable (because this depends on the types of
> their fields), which makes grouping and joining on composite keys difficult.
> We should change the signature of the groupBy(), join(), and coGroup() methods to also
> allow non-comparable keys as return types of a KeySelector function.
> Instead, we will check at optimization time whether the returned type is comparable (which
> is true for tuples if all elements are comparable).
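
The idea in the description can be sketched in plain Java as follows. This is not the Flink
API: DeferredKeyCheckSketch, its KeySelector interface, isComparableKey, compareKeys and
groupBy are hypothetical names, a java.util.List stands in for a Tuple (like a Tuple, it is
not Comparable itself), and the check runs on key values while grouping, whereas the issue
proposes checking the key type at optimization time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DeferredKeyCheckSketch {

    /** Hypothetical key selector; K is deliberately NOT bounded by Comparable. */
    interface KeySelector<T, K> {
        K getKey(T value);
    }

    /** A key is usable if it is Comparable, or a List whose elements are all usable. */
    static boolean isComparableKey(Object key) {
        if (key instanceof List) {
            for (Object field : (List<?>) key) {
                if (!isComparableKey(field)) {
                    return false;
                }
            }
            return true;
        }
        return key instanceof Comparable;
    }

    /** Compares composite keys field by field; assumes isComparableKey held for both. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compareKeys(Object a, Object b) {
        if (a instanceof List && b instanceof List) {
            List<?> left = (List<?>) a;
            List<?> right = (List<?>) b;
            int n = Math.min(left.size(), right.size());
            for (int i = 0; i < n; i++) {
                int c = compareKeys(left.get(i), right.get(i));
                if (c != 0) {
                    return c;
                }
            }
            return Integer.compare(left.size(), right.size());
        }
        return ((Comparable) a).compareTo(b);
    }

    /** Groups values by key; the comparability check is deferred to this point. */
    static <T, K> Map<K, List<T>> groupBy(List<T> data, KeySelector<T, K> selector) {
        Comparator<K> byKey = (a, b) -> compareKeys(a, b);
        Map<K, List<T>> groups = new TreeMap<>(byKey);
        for (T value : data) {
            K key = selector.getKey(value);
            if (!isComparableKey(key)) {
                throw new UnsupportedOperationException("Key is not comparable: " + key);
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "angle", "bob");
        // Composite key: (first character, length).
        Map<List<Object>, List<String>> groups =
            groupBy(words, w -> Arrays.<Object>asList(w.charAt(0), w.length()));
        System.out.println(groups);
    }
}
```

Because the type parameter K carries no Comparable bound, a composite key compiles, and an
unsupported key is rejected only when the check runs.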

This message was sent by Atlassian JIRA