flink-commits mailing list archives

From aljos...@apache.org
Subject flink git commit: [FLINK-3591] Replace Quickstart K-Means Example by Streaming Example
Date Tue, 08 Mar 2016 16:15:50 GMT
Repository: flink
Updated Branches:
  refs/heads/master 5777bbc01 -> 35ad1972d


[FLINK-3591] Replace Quickstart K-Means Example by Streaming Example


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/35ad1972
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/35ad1972
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/35ad1972

Branch: refs/heads/master
Commit: 35ad1972d1604b4f69a9bfb12f00c280ad6262f8
Parents: 5777bbc
Author: Aljoscha Krettek <aljoscha.krettek@gmail.com>
Authored: Tue Mar 8 15:49:09 2016 +0100
Committer: Aljoscha Krettek <aljoscha.krettek@gmail.com>
Committed: Tue Mar 8 17:15:23 2016 +0100

----------------------------------------------------------------------
 .../img/quickstart-example/jobmanager-job.png   | Bin 0 -> 520093 bytes
 .../quickstart-example/jobmanager-overview.png  | Bin 0 -> 355967 bytes
 .../jobmanager_kmeans_execute.png               | Bin 126912 -> 0 bytes
 .../jobmanager_kmeans_submit.png                | Bin 39526 -> 0 bytes
 docs/page/img/quickstart-example/kmeans003.png  | Bin 27962 -> 0 bytes
 docs/page/img/quickstart-example/kmeans008.png  | Bin 39305 -> 0 bytes
 docs/page/img/quickstart-example/kmeans015.png  | Bin 41958 -> 0 bytes
 docs/page/img/quickstart-example/result003.png  | Bin 60228 -> 0 bytes
 docs/page/img/quickstart-example/result008.png  | Bin 92732 -> 0 bytes
 docs/page/img/quickstart-example/result015.png  | Bin 89724 -> 0 bytes
 docs/quickstart/run_example_quickstart.md       | 469 ++++++++++++++-----
 11 files changed, 356 insertions(+), 113 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/page/img/quickstart-example/jobmanager-job.png
----------------------------------------------------------------------
diff --git a/docs/page/img/quickstart-example/jobmanager-job.png b/docs/page/img/quickstart-example/jobmanager-job.png
new file mode 100644
index 0000000..4a675bf
Binary files /dev/null and b/docs/page/img/quickstart-example/jobmanager-job.png differ

http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/page/img/quickstart-example/jobmanager-overview.png
----------------------------------------------------------------------
diff --git a/docs/page/img/quickstart-example/jobmanager-overview.png b/docs/page/img/quickstart-example/jobmanager-overview.png
new file mode 100644
index 0000000..804861a
Binary files /dev/null and b/docs/page/img/quickstart-example/jobmanager-overview.png differ

http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/page/img/quickstart-example/jobmanager_kmeans_execute.png
----------------------------------------------------------------------
diff --git a/docs/page/img/quickstart-example/jobmanager_kmeans_execute.png b/docs/page/img/quickstart-example/jobmanager_kmeans_execute.png
deleted file mode 100644
index 00d323f..0000000
Binary files a/docs/page/img/quickstart-example/jobmanager_kmeans_execute.png and /dev/null
differ

http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/page/img/quickstart-example/jobmanager_kmeans_submit.png
----------------------------------------------------------------------
diff --git a/docs/page/img/quickstart-example/jobmanager_kmeans_submit.png b/docs/page/img/quickstart-example/jobmanager_kmeans_submit.png
deleted file mode 100644
index 974e03e..0000000
Binary files a/docs/page/img/quickstart-example/jobmanager_kmeans_submit.png and /dev/null
differ

http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/page/img/quickstart-example/kmeans003.png
----------------------------------------------------------------------
diff --git a/docs/page/img/quickstart-example/kmeans003.png b/docs/page/img/quickstart-example/kmeans003.png
deleted file mode 100644
index 32f8dbb..0000000
Binary files a/docs/page/img/quickstart-example/kmeans003.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/page/img/quickstart-example/kmeans008.png
----------------------------------------------------------------------
diff --git a/docs/page/img/quickstart-example/kmeans008.png b/docs/page/img/quickstart-example/kmeans008.png
deleted file mode 100644
index b372fd1..0000000
Binary files a/docs/page/img/quickstart-example/kmeans008.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/page/img/quickstart-example/kmeans015.png
----------------------------------------------------------------------
diff --git a/docs/page/img/quickstart-example/kmeans015.png b/docs/page/img/quickstart-example/kmeans015.png
deleted file mode 100644
index 8b6fb51..0000000
Binary files a/docs/page/img/quickstart-example/kmeans015.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/page/img/quickstart-example/result003.png
----------------------------------------------------------------------
diff --git a/docs/page/img/quickstart-example/result003.png b/docs/page/img/quickstart-example/result003.png
deleted file mode 100644
index bdcef44..0000000
Binary files a/docs/page/img/quickstart-example/result003.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/page/img/quickstart-example/result008.png
----------------------------------------------------------------------
diff --git a/docs/page/img/quickstart-example/result008.png b/docs/page/img/quickstart-example/result008.png
deleted file mode 100644
index 921c73c..0000000
Binary files a/docs/page/img/quickstart-example/result008.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/page/img/quickstart-example/result015.png
----------------------------------------------------------------------
diff --git a/docs/page/img/quickstart-example/result015.png b/docs/page/img/quickstart-example/result015.png
deleted file mode 100644
index 9dbc6c4..0000000
Binary files a/docs/page/img/quickstart-example/result015.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/flink/blob/35ad1972/docs/quickstart/run_example_quickstart.md
----------------------------------------------------------------------
diff --git a/docs/quickstart/run_example_quickstart.md b/docs/quickstart/run_example_quickstart.md
index 7e3d999..b4475b6 100644
--- a/docs/quickstart/run_example_quickstart.md
+++ b/docs/quickstart/run_example_quickstart.md
@@ -1,9 +1,9 @@
 ---
-title: "Quick Start: Run K-Means Example"
+title: "Quick Start: Monitoring the Wikipedia Edit Stream"
 # Top navigation
 top-nav-group: quickstart
 top-nav-pos: 2
-top-nav-title: Run Example
+top-nav-title: "Example: Wikipedia Edit Stream"
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -27,116 +27,359 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink.
-On the way, you will see a visualization of the program, the optimized execution plan, and track the progress of its execution.
+In this guide we will start from scratch, going from setting up a Flink project to running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to
+read this channel in Flink and count the number of bytes that each user edits within
+a given window of time. This is easy enough to implement in a few minutes using Flink, but it will
+give you a good foundation from which to start building more complex analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project structure. Please
+see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate \
+    -DarchetypeGroupId=org.apache.flink \
+    -DarchetypeArtifactId=flink-quickstart-java \
+    -DarchetypeVersion=1.0.0 \
+    -DgroupId=wiki-edits \
+    -DartifactId=wiki-edits \
+    -Dversion=0.1 \
+    -Dpackage=wikiedits \
+    -DinteractiveMode=false
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the above parameters,
+Maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+    └── main
+        ├── java
+        │   └── wikiedits
+        │       ├── Job.java
+        │       ├── SocketTextStreamWordCount.java
+        │       └── WordCount.java
+        └── resources
+            └── log4j.properties
+{% endhighlight %}
+
+In the root directory there is our `pom.xml` file, which already has the Flink dependencies added, and
+there are several example Flink programs in `src/main/java`. We can delete the example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink Wikipedia connector as a dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks like this:
+
+{% highlight xml %}
+<dependencies>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-java</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-streaming-java_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-clients_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-connector-wikiedits_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added. (This example and
+the Wikipedia connector were inspired by the *Hello Samza* example of Apache Samza.)
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project, or open a text editor and
+create the file `src/main/java/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+    public static void main(String[] args) throws Exception {
+
+    }
+}
+{% endhighlight %}
+
+Admittedly it's very bare bones now, but we will fill it in as we go. Note that I won't give
+import statements here, since IDEs can add them automatically. At the end of this section I'll show
+the complete code with import statements, in case you simply want to skip ahead and enter it in your
+editor.
+
+The first step in a Flink program is to create a `StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). It can be used to set execution
+parameters and create sources for reading from external systems. So let's go ahead and add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
+{% endhighlight %}
+
+Next, we will create a source that reads from the Wikipedia IRC log:
+
+{% highlight java %}
+DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
+{% endhighlight %}
+
+This creates a `DataStream` of `WikipediaEditEvent` elements that we can further process. For
+the purposes of this example we are interested in determining the number of added or removed
+bytes that each user causes in a certain time window, let's say five seconds. For this we first
+have to specify that we want to key the stream on the user name, that is to say that operations
+on this stream should take the key into account. In our case the summation of edited bytes in the windows
+should be per unique user. For keying a Stream we have to provide a `KeySelector`, like this:
+
+{% highlight java %}
+KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
+    .keyBy(new KeySelector<WikipediaEditEvent, String>() {
+        @Override
+        public String getKey(WikipediaEditEvent event) {
+            return event.getUser();
+        }
+    });
+{% endhighlight %}
+
+This gives us a Stream of `WikipediaEditEvent` that has a `String` key, the user name.
+We can now specify that we want to have windows imposed on this stream and compute a
+result based on elements in these windows. A window specifies a slice of a Stream
+on which to perform a computation. Windows are required when performing an aggregation
+computation on an infinite stream of elements. In our example we will say
+that we want to aggregate the sum of edited bytes every five seconds:
+
+{% highlight java %}
+DataStream<Tuple2<String, Long>> result = keyedEdits
+    .timeWindow(Time.seconds(5))
+    .fold(new Tuple2<>("", 0L), new FoldFunction<WikipediaEditEvent, Tuple2<String, Long>>() {
+        @Override
+        public Tuple2<String, Long> fold(Tuple2<String, Long> acc, WikipediaEditEvent event) {
+            acc.f0 = event.getUser();
+            acc.f1 += event.getByteDiff();
+            return acc;
+        }
+    });
+{% endhighlight %}
+
+The first call, `.timeWindow()`, specifies that we want to have tumbling (non-overlapping) windows
+of five seconds. The second call specifies a *Fold transformation* on each window slice for
+each unique key. In our case we start from an initial value of `("", 0L)` and add to it the byte
+difference of every edit in that time window for a user. The resulting Stream now contains
+a `Tuple2<String, Long>` for every user, which gets emitted every five seconds.
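+To make the tumbling-window fold concrete, here is a small plain-Java sketch of the same idea. It does not use Flink itself; the `Edit` class, the window bucketing, and the map-based accumulation below are simplified stand-ins for what the windowed fold computes, not Flink APIs:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of folding events into 5-second tumbling windows,
// summing the byte diff per (window, user). Not Flink code.
public class TumblingFoldSketch {

    // Simplified stand-in for WikipediaEditEvent.
    static final class Edit {
        final long timestampMillis;
        final String user;
        final int byteDiff;
        Edit(long timestampMillis, String user, int byteDiff) {
            this.timestampMillis = timestampMillis;
            this.user = user;
            this.byteDiff = byteDiff;
        }
    }

    // Key: "<windowStart>/<user>". Value: folded sum of byte diffs.
    static Map<String, Long> foldIntoWindows(List<Edit> edits, long windowSizeMillis) {
        Map<String, Long> result = new TreeMap<>();
        for (Edit e : edits) {
            // Tumbling windows: each event belongs to exactly one window,
            // determined by rounding its timestamp down to a window boundary.
            long windowStart = e.timestampMillis - (e.timestampMillis % windowSizeMillis);
            result.merge(windowStart + "/" + e.user, (long) e.byteDiff, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Edit> edits = Arrays.asList(
            new Edit(1_000, "alice", 100),
            new Edit(4_000, "alice", -30),  // same [0, 5000) window as the first edit
            new Edit(2_000, "bob", 42),
            new Edit(6_000, "alice", 7));   // falls into the next window [5000, 10000)
        System.out.println(foldIntoWindows(edits, 5_000));
        // prints {0/alice=70, 0/bob=42, 5000/alice=7}
    }
}
```

+The two edits by the same user inside one window collapse into a single sum, which is exactly what the per-key fold in the snippet above does for each five-second slice.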
+
+The only thing left to do is print the stream to the console and start execution:
+
+{% highlight java %}
+result.print();
+
+see.execute();
+{% endhighlight %}
+
+That last call is necessary to start the actual Flink job. All operations, such as creating
+sources, transformations and sinks, only build up a graph of internal operations. Only when
+`execute()` is called is this graph of operations submitted to a cluster or executed on your local
+machine.
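+As a loose plain-Java analogy for this deferred execution (an illustration only, not the Flink API): transformation calls merely record operations in a plan, and nothing runs until an explicit execute call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Analogy for lazy pipeline construction: map() only records an operation;
// the recorded plan runs when execute() is called.
public class LazyPipeline {
    private final List<UnaryOperator<String>> ops = new ArrayList<>();

    // Each "transformation" just appends to the plan; no work happens here.
    public LazyPipeline map(UnaryOperator<String> op) {
        ops.add(op);
        return this;
    }

    // Only here is the recorded plan actually applied to the input.
    public String execute(String input) {
        String value = input;
        for (UnaryOperator<String> op : ops) {
            value = op.apply(value);
        }
        return value;
    }

    public static void main(String[] args) {
        LazyPipeline p = new LazyPipeline()
            .map(s -> s + " world")    // nothing has run yet
            .map(String::toUpperCase); // still nothing
        System.out.println(p.execute("hello"));  // prints HELLO WORLD
    }
}
```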
+
+The complete code so far is this:
+
+{% highlight java %}
+package wikiedits;
+
+import org.apache.flink.api.common.functions.FoldFunction;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.datastream.KeyedStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.windowing.time.Time;
+import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditEvent;
+import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSource;
+
+public class WikipediaAnalysis {
+
+  public static void main(String[] args) throws Exception {
+
+    StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
+
+    DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
+
+    KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
+      .keyBy(new KeySelector<WikipediaEditEvent, String>() {
+        @Override
+        public String getKey(WikipediaEditEvent event) {
+          return event.getUser();
+        }
+      });
+
+    DataStream<Tuple2<String, Long>> result = keyedEdits
+      .timeWindow(Time.seconds(5))
+      .fold(new Tuple2<>("", 0L), new FoldFunction<WikipediaEditEvent, Tuple2<String, Long>>() {
+        @Override
+        public Tuple2<String, Long> fold(Tuple2<String, Long> acc, WikipediaEditEvent event) {
+          acc.f0 = event.getUser();
+          acc.f1 += event.getByteDiff();
+          return acc;
+        }
+      });
+
+    result.print();
+
+    see.execute();
+  }
+}
+{% endhighlight %}
+
+You can run this example in your IDE or on the command line, using Maven:
+
+{% highlight bash %}
+$ mvn clean package
+$ mvn exec:java -Dexec.mainClass=wikiedits.WikipediaAnalysis
+{% endhighlight %}
+
+The first command builds our project and the second executes our main class. The output should be
+similar to this:
+
+{% highlight bash %}
+1> (Fenix down,114)
+6> (AnomieBOT,155)
+8> (BD2412bot,-3690)
+7> (IgnorantArmies,49)
+3> (Ckh3111,69)
+5> (Slade360,0)
+7> (Narutolovehinata5,2195)
+6> (Vuyisa2001,79)
+4> (Ms Sarah Welch,269)
+4> (KasparBot,-245)
+{% endhighlight %}
+
+The number in front of each line tells you on which parallel instance of the print sink the output
+was produced.
+
+This should get you started with writing your own Flink programs. You can check out our guides
+about [basic concepts]({{ site.baseurl }}/apis/common/index.html) and the
+[DataStream API]({{ site.baseurl }}/apis/streaming/index.html) if you want to learn more. Stick
+around for the bonus exercise if you want to learn about setting up a Flink cluster on
+your own machine and writing results to [Kafka](http://kafka.apache.org).
+
+## Bonus Exercise: Running on a Cluster and Writing to Kafka
+
+Please follow our [setup quickstart](setup_quickstart.html) for setting up a Flink distribution
+on your machine and refer to the [Kafka quickstart](https://kafka.apache.org/documentation.html#quickstart)
+for setting up a Kafka installation before we proceed.
+
+As a first step, we have to add the Flink Kafka connector as a dependency so that we can
+use the Kafka sink. Add this to the `pom.xml` file in the dependencies section:
+
+{% highlight xml %}
+<dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-connector-kafka-0.8_2.10</artifactId>
+    <version>${flink.version}</version>
+</dependency>
+{% endhighlight %}
+
+Next, we need to modify our program. We'll remove the `print()` sink and instead use a
+Kafka sink. The new code looks like this:
+
+{% highlight java %}
+
+result
+    .map(new MapFunction<Tuple2<String,Long>, String>() {
+        @Override
+        public String map(Tuple2<String, Long> tuple) {
+            return tuple.toString();
+        }
+    })
+    .addSink(new FlinkKafkaProducer08<>("localhost:9092", "wiki-result", new SimpleStringSchema()));
+{% endhighlight %}
+
+Note how we first transform the Stream of `Tuple2<String, Long>` to a Stream of `String` using
+a `MapFunction`. We are doing this because it is easier to write plain strings to Kafka. Then,
+we create a Kafka sink. You might have to adapt the hostname and port to your setup. `"wiki-result"`
+is the name of the Kafka topic that we are going to create next, before running our program.
+Build the project using Maven, because we need the jar file for running on the cluster:
+
+{% highlight bash %}
+$ mvn clean package
+{% endhighlight %}
+
+The resulting jar file will be in the `target` subfolder: `target/wiki-edits-0.1.jar`. We'll use
+this later.
+
+Now we are ready to launch a Flink cluster and run the program that writes to Kafka on it. Go
+to the location where you installed Flink and start a local cluster:
+
+{% highlight bash %}
+$ cd my/flink/directory
+$ bin/start-local.sh
+{% endhighlight %}
+
+We also have to create the Kafka topic, so that our program can write to it:
+
+{% highlight bash %}
+$ cd my/kafka/directory
+$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wiki-result
+{% endhighlight %}
+
+Now we are ready to run our jar file on the local Flink cluster:
+{% highlight bash %}
+$ cd my/flink/directory
+$ bin/flink run -c wikiedits.WikipediaAnalysis path/to/wiki-edits-0.1.jar
+{% endhighlight %}
+
+The output of that command should look similar to this, if everything went according to plan:
+
+{% highlight bash %}
+03/08/2016 15:09:27 Job execution switched to status RUNNING.
+03/08/2016 15:09:27 Source: Custom Source(1/1) switched to SCHEDULED
+03/08/2016 15:09:27 Source: Custom Source(1/1) switched to DEPLOYING
+03/08/2016 15:09:27 TriggerWindow(TumblingProcessingTimeWindows(5000), FoldingStateDescriptor{name=window-contents, defaultValue=(,0), serializer=null}, ProcessingTimeTrigger(), WindowedStream.fold(WindowedStream.java:207)) -> Map -> Sink: Unnamed(1/1) switched to SCHEDULED
+03/08/2016 15:09:27 TriggerWindow(TumblingProcessingTimeWindows(5000), FoldingStateDescriptor{name=window-contents, defaultValue=(,0), serializer=null}, ProcessingTimeTrigger(), WindowedStream.fold(WindowedStream.java:207)) -> Map -> Sink: Unnamed(1/1) switched to DEPLOYING
+03/08/2016 15:09:27 TriggerWindow(TumblingProcessingTimeWindows(5000), FoldingStateDescriptor{name=window-contents, defaultValue=(,0), serializer=null}, ProcessingTimeTrigger(), WindowedStream.fold(WindowedStream.java:207)) -> Map -> Sink: Unnamed(1/1) switched to RUNNING
+03/08/2016 15:09:27 Source: Custom Source(1/1) switched to RUNNING
+{% endhighlight %}
+
+You can see how the individual operators start running. There are only two, because
+the operations after the window get folded into one operation for performance reasons. In Flink
+we call this *chaining*.
+
+You can observe the output of the program by inspecting the Kafka topic using the Kafka
+console consumer:
+
+{% highlight bash %}
+$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wiki-result
+{% endhighlight %}
+
+You can also check out the Flink dashboard which should be running at [http://localhost:8081](http://localhost:8081).
+You get an overview of your cluster resources and running jobs:
+
+<a href="{{ site.baseurl }}/page/img/quickstart-example/jobmanager-overview.png" ><img class="img-responsive" src="{{ site.baseurl }}/page/img/quickstart-example/jobmanager-overview.png" alt="JobManager Overview"/></a>
 
-## Setup Flink
-Follow the [instructions](setup_quickstart.html) to setup Flink and enter the root directory of your Flink setup.
-
-## Generate Input Data
-Flink contains a data generator for K-Means.
-
-~~~bash
-# Assuming you are in the root directory of your Flink setup
-mkdir kmeans
-cd kmeans
-# Run data generator
-java -cp ../examples/batch/KMeans.jar:../lib/flink-dist-{{ site.version }}.jar \
-  org.apache.flink.examples.java.clustering.util.KMeansDataGenerator \
-  -points 500 -k 10 -stddev 0.08 -output `pwd`
-~~~
-
-The generator has the following arguments (arguments in `[]` are optional):
-
-~~~bash
--points <num> -k <num clusters> [-output <output-path>] [-stddev <relative stddev>] [-range <centroid range>] [-seed <seed>]
-~~~
-
-The _relative standard deviation_ is an interesting tuning parameter. It determines the closeness of the points to randomly generated centers.
-
-The `kmeans/` directory should now contain two files: `centers` and `points`. The `points` file contains the points to cluster and the `centers` file contains initial cluster centers.
-
-
-## Inspect the Input Data
-Use the `plotPoints.py` tool to review the generated data points. [Download Python Script](plotPoints.py)
-
-~~~ bash
-python plotPoints.py points ./points input
-~~~ 
-
-Note: You might have to install [matplotlib](http://matplotlib.org/) (`python-matplotlib` package on Ubuntu) to use the Python script.
-
-You can review the input data stored in the `input-plot.pdf`, for example with Evince (`evince input-plot.pdf`).
-
-The following overview presents the impact of the different standard deviations on the input data.
-
-|relative stddev = 0.03|relative stddev = 0.08|relative stddev = 0.15|
-|:--------------------:|:--------------------:|:--------------------:|
-|<img src="{{ site.baseurl }}/page/img/quickstart-example/kmeans003.png" alt="example1" style="width: 275px;"/>|<img src="{{ site.baseurl }}/page/img/quickstart-example/kmeans008.png" alt="example2" style="width: 275px;"/>|<img src="{{ site.baseurl }}/page/img/quickstart-example/kmeans015.png" alt="example3" style="width: 275px;"/>|
-
-
-## Start Flink
-Start Flink and the web job submission client on your local machine.
-
-~~~ bash
-# return to the Flink root directory
-cd ..
-# start Flink
-./bin/start-local.sh
-~~~
-
-## Inspect and Run the K-Means Example Program
-The Flink web interface allows to submit Flink programs using a graphical user interface.
-
-<div class="row" style="padding-top:15px">
-	<div class="col-md-6">
-	<a data-lightbox="compiler" href="{{ site.baseurl }}/page/img/quickstart-example/jobmanager_kmeans_submit.png" data-lightbox="example-1"><img class="img-responsive" src="{{ site.baseurl }}/page/img/quickstart-example/jobmanager_kmeans_submit.png" /></a>
-	</div>
-	<div class="col-md-6">
-		1. Open web interface on <a href="http://localhost:8081">localhost:8081</a> <br>
-		2. Select the "Submit new Job" page in the menu <br>
-		3. Upload the <code>KMeans.jar</code> from <code>examples/batch</code> by clicking the "Add New" button, and then the "Upload" button. <br>
-		4. Select the <code>KMeans.jar</code> from the list of jobs <br>
-		5. Enter the arguments and options in the lower box: <br>
-		    Leave the <i>Entry Class</i> and <i>Parallelism</i> fields empty<br>
-		    Enter the following program arguments: <br>
-		    (KMeans expects the following args: <code>--points &lt;path&gt; --centroids &lt;path&gt; --output &lt;path&gt; --iterations &lt;n&gt;</code>
-			{% highlight bash %}--points /tmp/kmeans/points --centroids /tmp/kmeans/centers --output /tmp/kmeans/result --iterations 10{% endhighlight %}<br>
-		6. Press <b>Submit</b> to start the job
-	</div>
-</div>
-<hr>
-<div class="row" style="padding-top:15px">
-	<div class="col-md-6">
-	<a data-lightbox="compiler" href="{{ site.baseurl }}/page/img/quickstart-example/jobmanager_kmeans_execute.png" data-lightbox="example-1"><img class="img-responsive" src="{{ site.baseurl }}/page/img/quickstart-example/jobmanager_kmeans_execute.png" /></a>
-	</div>
-
-	<div class="col-md-6">
-		Watch the job executing.
-	</div>
-</div>
-
-
-## Shutdown Flink
-Stop Flink when you are done.
-
-~~~ bash
-# stop Flink
-./bin/stop-local.sh
-~~~
-
-## Analyze the Result
-Use the [Python Script](plotPoints.py) again to visualize the result.
-
-~~~bash
-cd kmeans
-python plotPoints.py result ./result clusters
-~~~
-
-The following three pictures show the results for the sample input above. Play around with the parameters (number of iterations, number of clusters) to see how they affect the result.
-
-
-|relative stddev = 0.03|relative stddev = 0.08|relative stddev = 0.15|
-|:--------------------:|:--------------------:|:--------------------:|
-|<img src="{{ site.baseurl }}/page/img/quickstart-example/result003.png" alt="example1" style="width: 275px;"/>|<img src="{{ site.baseurl }}/page/img/quickstart-example/result008.png" alt="example2" style="width: 275px;"/>|<img src="{{ site.baseurl }}/page/img/quickstart-example/result015.png" alt="example3" style="width: 275px;"/>|
+If you click on your running job you will get a view where you can inspect individual operations
+and, for example, see the number of processed elements:
+
+<a href="{{ site.baseurl }}/page/img/quickstart-example/jobmanager-job.png" ><img class="img-responsive" src="{{ site.baseurl }}/page/img/quickstart-example/jobmanager-job.png" alt="Example Job View"/></a>
 
+This concludes our little tour of Flink. If you have any questions, please don't hesitate to ask on our [Mailing Lists](http://flink.apache.org/community.html#mailing-lists).
\ No newline at end of file

