flink-commits mailing list archives

From fhue...@apache.org
Subject [2/3] flink git commit: [FLINK-3649] [docs] Add documentation for DataSet minBy / maxBy.
Date Fri, 17 Jun 2016 08:36:51 GMT
[FLINK-3649] [docs] Add documentation for DataSet minBy / maxBy.

This closes #2104


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/298c0092
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/298c0092
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/298c0092

Branch: refs/heads/master
Commit: 298c009202ebf0fcf14d747c68124447b06b796d
Parents: 7cc6943
Author: Fabian Hueske <fhueske@apache.org>
Authored: Wed Jun 15 12:09:44 2016 +0200
Committer: Fabian Hueske <fhueske@apache.org>
Committed: Fri Jun 17 00:16:41 2016 +0200

----------------------------------------------------------------------
 docs/apis/batch/dataset_transformations.md | 72 ++++++++++++++++++++++++-
 docs/apis/batch/index.md                   | 43 +++++++++++++++
 2 files changed, 114 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/298c0092/docs/apis/batch/dataset_transformations.md
----------------------------------------------------------------------
diff --git a/docs/apis/batch/dataset_transformations.md b/docs/apis/batch/dataset_transformations.md
index 0de771a..8e65389 100644
--- a/docs/apis/batch/dataset_transformations.md
+++ b/docs/apis/batch/dataset_transformations.md
@@ -213,7 +213,7 @@ val naturalNumbers = intNumbers.filter { _ > 0 }
 **IMPORTANT:** The system assumes that the function does not modify the elements on which the predicate is applied. Violating this assumption
 can lead to incorrect results.
 
-### Project (Tuple DataSets only) (Java/Python API Only)
+### Projection of Tuple DataSet
 
 The Project transformation removes or moves Tuple fields of a Tuple DataSet.
 The `project(int...)` method selects Tuple fields that should be retained by their index and defines their order in the output Tuple.
@@ -884,6 +884,42 @@ In contrast to that `.aggregate(SUM, 0).aggregate(MIN, 2)` will apply an aggrega
 
 **Note:** The set of aggregation functions will be extended in the future.
 
+### MinBy / MaxBy on Grouped Tuple DataSet
+
+The MinBy (MaxBy) transformation selects a single tuple for each group of tuples. The selected tuple is the tuple whose values of one or more specified fields are minimum (maximum). The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary tuple of these tuples is returned.
+
+The following code shows how to select the tuple with the minimum values for the `Integer` and `Double` fields for each group of tuples with the same `String` value from a `DataSet<Tuple3<Integer, String, Double>>`:
+
+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">
+
+~~~java
+DataSet<Tuple3<Integer, String, Double>> input = // [...]
+DataSet<Tuple3<Integer, String, Double>> output = input
+                                   .groupBy(1)   // group DataSet on second field
+                                   .minBy(0, 2); // select tuple with minimum values for first and third field.
+~~~
+
+</div>
+<div data-lang="scala" markdown="1">
+
+~~~scala
+val input: DataSet[(Int, String, Double)] = // [...]
+val output: DataSet[(Int, String, Double)] = input
+                                   .groupBy(1)  // group DataSet on second field
+                                   .minBy(0, 2) // select tuple with minimum values for first and third field.
+~~~
+
+</div>
+<div data-lang="python" markdown="1">
+
+~~~python
+Not supported.
+~~~
+
+</div>
+</div>
+
 ### Reduce on full DataSet
 
 The Reduce transformation applies a user-defined reduce function to all elements of a DataSet.
@@ -1018,6 +1054,40 @@ Not supported.
 
 **Note:** Extending the set of supported aggregation functions is on our roadmap.
 
+### MinBy / MaxBy on full Tuple DataSet
+
+The MinBy (MaxBy) transformation selects a single tuple from a DataSet of tuples. The selected tuple is the tuple whose values of one or more specified fields are minimum (maximum). The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary tuple of these tuples is returned.
+
+The following code shows how to select the tuple with the maximum values for the `Integer` and `Double` fields from a `DataSet<Tuple3<Integer, String, Double>>`:
+
+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">
+
+~~~java
+DataSet<Tuple3<Integer, String, Double>> input = // [...]
+DataSet<Tuple3<Integer, String, Double>> output = input
+                                   .maxBy(0, 2); // select tuple with maximum values for first and third field.
+~~~
+
+</div>
+<div data-lang="scala" markdown="1">
+
+~~~scala
+val input: DataSet[(Int, String, Double)] = // [...]
+val output: DataSet[(Int, String, Double)] = input
+                                   .maxBy(0, 2) // select tuple with maximum values for first and third field.
+~~~
+
+</div>
+<div data-lang="python" markdown="1">
+
+~~~python
+Not supported.
+~~~
+
+</div>
+</div>
+
 ### Distinct
 
 The Distinct transformation computes the DataSet of the distinct elements of the source DataSet.

http://git-wip-us.apache.org/repos/asf/flink/blob/298c0092/docs/apis/batch/index.md
----------------------------------------------------------------------
diff --git a/docs/apis/batch/index.md b/docs/apis/batch/index.md
index 2f7013f..993fb72 100644
--- a/docs/apis/batch/index.md
+++ b/docs/apis/batch/index.md
@@ -463,6 +463,20 @@ DataSet<Tuple2<String, Integer>> out = in.project(2,0);
 {% endhighlight %}
       </td>
     </tr>
+    <tr>
+      <td><strong>MinBy / MaxBy</strong></td>
+      <td>
+        <p>Selects a tuple from a group of tuples whose values of one or more fields are minimum (maximum). The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary tuple of these tuples is returned. MinBy (MaxBy) may be applied on a full data set or a grouped data set.</p>
+{% highlight java %}
+DataSet<Tuple3<Integer, Double, String>> in = // [...]
+// a DataSet with a single tuple with minimum values for the Integer and String fields.
+DataSet<Tuple3<Integer, Double, String>> out = in.minBy(0, 2);
+// a DataSet with one tuple for each group with the minimum value for the Double field.
+DataSet<Tuple3<Integer, Double, String>> out2 = in.groupBy(2)
+                                                  .minBy(1);
+{% endhighlight %}
+      </td>
+    </tr>
   </tbody>
 </table>
 
@@ -728,6 +742,35 @@ val result3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3)
   </tbody>
 </table>
 
+----------
+
+The following transformations are available on data sets of Tuples:
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Transformation</th>
+      <th class="text-center">Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><strong>MinBy / MaxBy</strong></td>
+      <td>
+        <p>Selects a tuple from a group of tuples whose values of one or more fields are minimum (maximum). The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary tuple of these tuples is returned. MinBy (MaxBy) may be applied on a full data set or a grouped data set.</p>
+{% highlight scala %}
+val in: DataSet[(Int, Double, String)] = // [...]
+// a data set with a single tuple with minimum values for the Int and String fields.
+val out: DataSet[(Int, Double, String)] = in.minBy(0, 2)
+// a data set with one tuple for each group with the minimum value for the Double field.
+val out2: DataSet[(Int, Double, String)] = in.groupBy(2)
+                                             .minBy(1)
+{% endhighlight %}
+      </td>
+    </tr>
+  </tbody>
+</table>
+
 Extraction from tuples, case classes and collections via anonymous pattern matching, like the following:
 {% highlight scala %}
 val data: DataSet[(Int, String, Double)] = // [...]
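
The selection rule described in the added MinBy / MaxBy documentation can be sketched in plain Java, without a Flink dependency. This is only an illustration under stated assumptions, not Flink's implementation: the class `MinBySketch` and its methods `compareFields`, `minBy` and `groupedMinBy` are hypothetical names, rows are modeled as `Object[]` instead of Flink tuples, the specified fields are assumed to be compared in the order given, and ties are resolved by keeping the first row seen (the documentation only promises an arbitrary one).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Flink-free sketch of the minBy selection rule.
public class MinBySketch {

    // Compares two rows field by field, in the order the positions are given
    // (assumption: earlier positions take precedence over later ones).
    static int compareFields(Object[] a, Object[] b, int... fields) {
        for (int f : fields) {
            @SuppressWarnings("unchecked")
            int c = ((Comparable<Object>) a[f]).compareTo(b[f]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Selects one row with minimum values in the given fields (null for empty input).
    static Object[] minBy(List<Object[]> rows, int... fields) {
        Object[] best = null;
        for (Object[] row : rows) {
            // Strict comparison: on a tie the first row seen is kept.
            if (best == null || compareFields(row, best, fields) < 0) {
                best = row;
            }
        }
        return best;
    }

    // Groups rows on one key position and selects one minimum row per group.
    static Map<Object, Object[]> groupedMinBy(List<Object[]> rows, int key, int... fields) {
        Map<Object, Object[]> result = new LinkedHashMap<>();
        for (Object[] row : rows) {
            result.merge(row[key], row,
                (old, cur) -> compareFields(cur, old, fields) < 0 ? cur : old);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] {3, "a", 1.0},
            new Object[] {1, "a", 5.0},
            new Object[] {1, "a", 2.0},
            new Object[] {2, "b", 0.5});

        // Counterpart of input.minBy(0, 2) on the full data set.
        System.out.println(Arrays.toString(minBy(rows, 0, 2)));

        // Counterpart of input.groupBy(1).minBy(0, 2): one row per String key.
        groupedMinBy(rows, 1, 0, 2).values()
            .forEach(r -> System.out.println(Arrays.toString(r)));
    }
}
```

With the sample rows, the full-data-set call selects the row `(1, "a", 2.0)`: the first field is compared first, and the third field only breaks the tie between the two rows whose first field is 1. The grouped call returns that same row for key `"a"` and `(2, "b", 0.5)` for key `"b"`.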

