flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API
Date Wed, 15 Jun 2016 18:22:09 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332250#comment-15332250 ]

ASF GitHub Bot commented on FLINK-3650:
---------------------------------------

Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1856#discussion_r67216772
  
    --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/DataSet.scala ---
    @@ -699,6 +700,55 @@ class DataSet[T: ClassTag](set: JavaDataSet[T]) {
       }
     
       /**
    +    * Selects an element with minimum value.
    +    *
    +    * The minimum is computed over the specified fields in lexicographical order.
    +    *
    +    * Example 1: Given a data set with elements [0, 1], [1, 0], the
    +    * results will be:
    +    *
    +    * minBy(0)[0, 1]
    +    * minBy(1)[1, 0]
    +    * Example 2: Given a data set with elements [0, 0], [0, 1], the
    +    * results will be:
    +    * minBy(0, 1)[0, 0]
    +    * If multiple values with minimum value at the specified fields exist, a random one will be
    +    * picked.
    +    * Internally, this operation is implemented as a {@link ReduceFunction}.
    +    */
    +  def minBy(fields: Int*) : DataSet[T]  = {
    +    if (!getType.isTupleType) {
    +      throw new InvalidProgramException("DataSet#minBy(int...) only works on Tuple types.")
    +    }
    +
    +    reduce(new SelectByMinFunction[T](getType.asInstanceOf[TupleTypeInfoBase[T]], fields.toArray))
    +  }
    +
    +  /**
    +    * Selects an element with maximum value.
    +    *
    +    * The maximum is computed over the specified fields in lexicographical order.
    +    *
    +    * Example 1: Given a data set with elements [0, 1], [1, 0], the
    +    * results will be:
    +    *
    +    * maxBy(0)[1, 0]
    +    * maxBy(1)[0, 1]
    +    * Example 2: Given a data set with elements [0, 0], [0, 1], the
    +    * results will be:
    +    * maxBy(0, 1)[0, 1]
    +    * If multiple values with maximum value at the specified fields exist, a random one will be
    +    * picked
    +    * Internally, this operation is implemented as a {@link ReduceFunction}.
    +    *
    +    */
    +  def maxBy(fields: Int*) : DataSet[T] = {
    +    if (!getType.isTupleType) {
    +      throw new InvalidProgramException("DataSet#maxBy(int...) only works on Tuple types.")
    +    }
    +    reduce(new SelectByMaxFunction[T](getType.asInstanceOf[TupleTypeInfoBase[T]], fields.toArray))
    +  }
    --- End diff --
    
    Those are very sharp eyes :)


> Add maxBy/minBy to Scala DataSet API
> ------------------------------------
>
>                 Key: FLINK-3650
>                 URL: https://issues.apache.org/jira/browse/FLINK-3650
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Scala API
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. These methods
> are not supported by the Scala DataSet API. These methods should be added in order to have
> a consistent API.
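
For readers unfamiliar with the semantics being requested, here is a minimal
sketch in plain Scala (no Flink dependency) of selecting a tuple by
lexicographic comparison over chosen field positions, expressed as a reduce.
The object name SelectBySketch and its helpers are hypothetical and are not
part of Flink; the sketch assumes every selected field is Comparable, and it
keeps the first of several tied elements, whereas the diff above only promises
an arbitrary one.

```scala
// Hypothetical illustration only; not Flink's SelectByMinFunction/SelectByMaxFunction.
object SelectBySketch {

  // Compare two tuples on the given field positions, in the order given.
  // The first field that differs decides the result (lexicographic order).
  // Assumption: each selected field implements java.lang.Comparable.
  def compareOn(a: Product, b: Product, fields: Seq[Int]): Int =
    fields.iterator
      .map { i =>
        val left = a.productElement(i).asInstanceOf[Comparable[Any]]
        left.compareTo(b.productElement(i))
      }
      .find(_ != 0)
      .getOrElse(0)

  // Keep the smaller of each pair; on a tie, keep the earlier element.
  // Throws on an empty input, because reduce has no element to return.
  def minBy[T <: Product](data: Seq[T], fields: Int*): T =
    data.reduce((a, b) => if (compareOn(a, b, fields) <= 0) a else b)

  // Keep the larger of each pair; on a tie, keep the earlier element.
  def maxBy[T <: Product](data: Seq[T], fields: Int*): T =
    data.reduce((a, b) => if (compareOn(a, b, fields) >= 0) a else b)
}
```

With the data from the Scaladoc examples in the diff,
SelectBySketch.minBy(Seq((0, 1), (1, 0)), 0) selects (0, 1) and
SelectBySketch.maxBy(Seq((0, 0), (0, 1)), 0, 1) selects (0, 1).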



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
