flink-commits mailing list archives

From u..@apache.org
Subject [2/7] [FLINK-962] Initial import of documentation from website into source code (closes #34)
Date Mon, 23 Jun 2014 12:52:21 GMT
http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/java_api_quickstart.md
----------------------------------------------------------------------
diff --git a/docs/java_api_quickstart.md b/docs/java_api_quickstart.md
new file mode 100644
index 0000000..75f4c7c
--- /dev/null
+++ b/docs/java_api_quickstart.md
@@ -0,0 +1,126 @@
+---
+title: "Quickstart: Java API"
+---
+
+<p class="lead">Start working on your Stratosphere Java program in a few simple steps.</p>
+
+<section id="requirements">
+  <div class="page-header"><h2>Requirements</h2></div>
+  <p class="lead">The only requirements are working <strong>Maven 3.0.4</strong> (or higher) and <strong>Java 6.x</strong> (or higher) installations.</p>
+</section>
+
+<section id="create_project">
+  <div class="page-header"><h2>Create Project</h2></div>
+
+  <p class="lead">Use one of the following commands to <strong>create a project</strong>:</p>
+
+  <ul class="nav nav-tabs" style="border-bottom: none;">
+      <li class="active"><a href="#quickstart-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li>
+      <li><a href="#maven-archetype" data-toggle="tab">Use <strong>Maven archetypes</strong></a></li>
+  </ul>
+  <div class="tab-content">
+      <div class="tab-pane active" id="quickstart-script">
+{% highlight bash %}
+$ curl https://raw.githubusercontent.com/stratosphere/stratosphere-quickstart/master/quickstart.sh | bash
+{% endhighlight %}
+      </div>
+      <div class="tab-pane" id="maven-archetype">
+{% highlight bash %}
+$ mvn archetype:generate                             \
+    -DarchetypeGroupId=eu.stratosphere               \
+    -DarchetypeArtifactId=quickstart-java            \
+    -DarchetypeVersion={{site.current_stable}}
+{% endhighlight %}
+      This allows you to <strong>name your newly created project</strong>. It will interactively ask you for the groupId, artifactId, and package name.
+      </div>
+  </div>
+</section>
+
+<section id="inspect_project">
+  <div class="page-header"><h2>Inspect Project</h2></div>
+  <p class="lead">There will be a <strong>new directory in your working directory</strong>. If you've used the <em>curl</em> approach, the directory is called <code>quickstart</code>. Otherwise, it has the name of your artifactId.</p>
+  <p class="lead">The sample project is a <strong>Maven project</strong>, which contains two classes. <em>Job</em> is a basic skeleton program and <em>WordCountJob</em> a working example. Please note that the <em>main</em> method of both classes allow you to start Stratosphere in a development/testing mode.</p>
+  <p class="lead">We recommend to <strong>import this project into your IDE</strong> to develop and test it. If you use Eclipse, the <a href="http://www.eclipse.org/m2e/">m2e plugin</a> allows to <a href="http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html#fig-creating-import">import Maven projects</a>. Some Eclipse bundles include that plugin by default, other require you to install it manually. The IntelliJ IDE also supports Maven projects out of the box.</p>
+</section>
+
+<section id="build_project">
+<div class="alert alert-danger">A note to Mac OS X users: The default JVM heapsize for Java is too small for Stratosphere. You have to manually increase it. Choose "Run Configurations" -> Arguments and write into the "VM Arguments" box: "-Xmx800m" in Eclipse.</div>
+  <div class="page-header"><h2>Build Project</h2></div>
+  <p class="lead">If you want to <strong>build your project</strong>, go to your project directory and issue the <code>mvn clean package</code> command. You will <strong>find a jar</strong> that runs on every Stratosphere cluster in <code>target/stratosphere-project-0.1-SNAPSHOT.jar</code>.</p>
+</section>
+
+<section id="next_steps">
+  <div class="page-header"><h2>Next Steps</h2></div>
+  <p class="lead"><strong>Write your application!</strong></p>
+  <p>The quickstart project contains a WordCount implementation, the "Hello World" of Big Data processing systems. The goal of WordCount is to determine the frequencies of words in a text, e.g., how often the terms "the" or "house" occur in all Wikipedia texts.</p>
+ <br>
+<b>Sample Input:</b> <br>
+{% highlight bash %}
+big data is big
+{% endhighlight %}
+<b>Sample Output:</b> <br>
+{% highlight bash %}
+big 2
+data 1
+is 1
+{% endhighlight %}
+
+<p>The following code shows the WordCount implementation from the Quickstart, which processes some text lines with two operators (FlatMap and Reduce) and prints the resulting words and counts to standard output.</p>
+
+{% highlight java %}
+// imports shown for completeness; package names follow the Stratosphere Java API
+import eu.stratosphere.api.java.DataSet;
+import eu.stratosphere.api.java.ExecutionEnvironment;
+import eu.stratosphere.api.java.aggregation.Aggregations;
+import eu.stratosphere.api.java.tuple.Tuple2;
+
+public class WordCount {
+  
+  public static void main(String[] args) throws Exception {
+    
+    // set up the execution environment
+    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+    
+    // get input data
+    DataSet<String> text = env.fromElements(
+        "To be, or not to be,--that is the question:--",
+        "Whether 'tis nobler in the mind to suffer",
+        "The slings and arrows of outrageous fortune",
+        "Or to take arms against a sea of troubles,"
+        );
+    
+    DataSet<Tuple2<String, Integer>> counts = 
+        // split up the lines in pairs (2-tuples) containing: (word,1)
+        text.flatMap(new LineSplitter())
+        // group by the tuple field "0" and sum up tuple field "1"
+        .groupBy(0)
+        .aggregate(Aggregations.SUM, 1);
+
+    // emit result
+    counts.print();
+    
+    // execute program
+    env.execute("WordCount Example");
+  }
+}
+{% endhighlight %}
+
+<p>The operations are defined by specialized classes, here the LineSplitter class.</p>
+
+{% highlight java %}
+// imports shown for completeness; package names follow the Stratosphere Java API
+import eu.stratosphere.api.java.functions.FlatMapFunction;
+import eu.stratosphere.api.java.tuple.Tuple2;
+import eu.stratosphere.util.Collector;
+
+public class LineSplitter extends FlatMapFunction<String, Tuple2<String, Integer>> {
+
+  @Override
+  public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
+    // normalize and split the line into words
+    String[] tokens = value.toLowerCase().split("\\W+");
+    
+    // emit the pairs
+    for (String token : tokens) {
+      if (token.length() > 0) {
+        out.collect(new Tuple2<String, Integer>(token, 1));
+      }
+    }
+  }
+}
+{% endhighlight %}
+
+<p><a href="https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/wordcount/WordCount.java">Check GitHub</a> for the full example code.</p>
+
+<p class="lead">For a complete overview over our Java API, have a look at the <a href="{{ site.baseurl }}/docs/{{site.current_stable_documentation}}/programming_guides/java.html">Stratosphere Documentation</a> and <a href="{{ site.baseurl }}/docs/{{site.current_stable_documentation}}/programming_guides/examples_java.html">further example programs</a>. If you have any trouble, ask on our <a href="https://groups.google.com/forum/#!forum/stratosphere-dev">Mailing list</a>. We are happy to provide help.</p>
+</section>

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/local_execution.md
----------------------------------------------------------------------
diff --git a/docs/local_execution.md b/docs/local_execution.md
new file mode 100644
index 0000000..cd60f62
--- /dev/null
+++ b/docs/local_execution.md
@@ -0,0 +1,106 @@
+---
+title:  "Local Execution"
+---
+
+# Local Execution/Debugging
+
+Stratosphere can run on a single machine, even in a single Java Virtual Machine. This allows users to test and debug Stratosphere programs locally. This section gives an overview of the local execution mechanisms.
+
+**NOTE:** Please also refer to the [debugging section]({{site.baseurl}}/docs/0.5/programming_guides/java.html#debugging) in the Java API documentation for a guide to testing and local debugging utilities in the Java API.
+
+The local environments and executors allow you to run Stratosphere programs in a local Java Virtual Machine, or within the JVM of an existing program. Most examples can be launched locally by simply hitting the "Run" button of your IDE.
+
+If you are running Stratosphere programs locally, you can also debug your program like any other Java program. You can either use `System.out.println()` to write out some internal variables or you can use the debugger. It is possible to set breakpoints within `map()`, `reduce()` and all the other methods.
+
+The `JobExecutionResult` object, which is returned after the execution has finished, contains the program runtime and the accumulator results.
+
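+For example, with the Scala `LocalExecutor` this could look as follows. This is a sketch only: it assumes that `execute()` returns the `JobExecutionResult` and that the result exposes the runtime via `getNetRuntime()` and the accumulators via `getAccumulatorResult()`; the accumulator name is made up.
+
+```scala
+val result = LocalExecutor.execute(plan)
+// runtime of the program, in milliseconds (assumed accessor)
+println("net runtime: " + result.getNetRuntime)
+// value of a (hypothetical) accumulator registered under "line-counter"
+println("lines processed: " + result.getAccumulatorResult("line-counter"))
+```
+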
+*Note:* The local execution environments do not start any web frontend to monitor the execution.
+
+
+# Maven Dependency
+
+If you are developing your program in a Maven project, you have to add the `stratosphere-clients` module using this dependency:
+
+```xml
+<dependency>
+  <groupId>eu.stratosphere</groupId>
+  <artifactId>stratosphere-clients</artifactId>
+  <version>{{site.current_stable}}</version>
+</dependency>
+```
+
+# Local Environment
+
+The `LocalEnvironment` is a handle to local execution for Stratosphere programs. Use it to run a program within a local JVM - standalone or embedded in other programs.
+
+The local environment is instantiated via the method `ExecutionEnvironment.createLocalEnvironment()`. By default, it will use as many local threads for execution as your machine has CPU cores (hardware contexts). You can alternatively specify the desired parallelism. The local environment can be configured to log to the console using `enableLogging()`/`disableLogging()`.
+
+In most cases, calling `ExecutionEnvironment.getExecutionEnvironment()` is an even better way to go. That method returns a `LocalEnvironment` when the program is started locally (outside the command line interface), and it returns a pre-configured environment for cluster execution when the program is invoked by the [command line interface]({{ site.baseurl }}/docs/0.5/program_execution/cli_client.html).
+
+```java
+public static void main(String[] args) throws Exception {
+    ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
+
+    DataSet<String> data = env.readTextFile("file:///path/to/file");
+
+    data
+        .filter(new FilterFunction<String>() {
+            public boolean filter(String value) {
+                return value.startsWith("http://");
+            }
+        })
+        .writeAsText("file:///path/to/result");
+
+    env.execute();
+}
+```
+
+
+# Local Executor
+
+The *LocalExecutor* is similar to the local environment, but it takes a *Plan* object, which describes the program as a single executable unit. The *LocalExecutor* is typically used with the Scala API. 
+
+The following code shows how you would use the `LocalExecutor` with the WordCount example for Scala programs:
+
+```scala
+def main(args: Array[String]) {
+    val input = TextFile("hdfs://path/to/file")
+
+    val words = input flatMap { _.toLowerCase().split("""\W+""") filter { _ != "" } }
+    val counts = words groupBy { x => x } count()
+
+    val output = counts.write("hdfs://path/to/result", CsvOutputFormat())
+
+    val plan = new ScalaPlan(Seq(output), "Word Count")
+    LocalExecutor.executePlan(plan)
+}
+```
+
+
+# LocalDistributedExecutor
+
+Stratosphere also offers a `LocalDistributedExecutor` which starts multiple TaskManagers within one JVM. The standard `LocalExecutor` starts one JobManager and one TaskManager in one JVM.
+With the `LocalDistributedExecutor` you can define the number of TaskManagers to start. This is useful for debugging network-related code, and it is more of a developer tool than a user tool.
+
+```java
+public static void main(String[] args) throws Exception {
+    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+    DataSet<String> data = env.readTextFile("hdfs://path/to/file");
+
+    data
+        .filter(new FilterFunction<String>() {
+            public boolean filter(String value) {
+                return value.startsWith("http://");
+            }
+        })
+        .writeAsText("hdfs://path/to/result");
+
+    Plan p = env.createProgramPlan();
+    LocalDistributedExecutor lde = new LocalDistributedExecutor();
+    lde.startNephele(2); // start two TaskManagers
+    lde.run(p);
+}
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/local_setup.md
----------------------------------------------------------------------
diff --git a/docs/local_setup.md b/docs/local_setup.md
new file mode 100644
index 0000000..b49118a
--- /dev/null
+++ b/docs/local_setup.md
@@ -0,0 +1,108 @@
+---
+title:  "Local Setup"
+---
+
+This documentation provides instructions on how to run Stratosphere locally on a single machine.
+
+# Download
+
+Go to the [downloads page]({{site.baseurl}}/downloads/) and get the ready-to-run package. If you want to interact with Hadoop (e.g. HDFS or HBase), make sure to pick the Stratosphere package **matching your Hadoop version**. When in doubt, or if you plan to just work with the local file system, pick the package for Hadoop 1.2.x.
+
+# Requirements
+
+Stratosphere runs on **Linux**, **Mac OS X** and **Windows**. The only requirement for a local setup is **Java 1.6.x** or higher. The following manual assumes a *UNIX-like environment*; for Windows, see [Stratosphere on Windows](#windows).
+
+You can check the correct installation of Java by issuing the following command:
+
+```bash
+java -version
+```
+
+The command should output something comparable to the following:
+
+```bash
+java version "1.6.0_22"
+Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
+Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
+```
+
+# Configuration
+
+**For local mode Stratosphere is ready to go out of the box and you don't need to change the default configuration.**
+
+The out of the box configuration will use your default Java installation. You can manually set the environment variable `JAVA_HOME` or the configuration key `env.java.home` in `conf/stratosphere-conf.yaml` if you want to manually override the Java runtime to use. Consult the [configuration page]({{site.baseurl}}/docs/0.4/setup/config.html) for further details about configuring Stratosphere.
+
+# Starting Stratosphere
+
+**You are now ready to start Stratosphere.** Unpack the downloaded archive and change to the newly created `stratosphere` directory. There you can start Stratosphere in local mode:
+
+```bash
+$ tar xzf stratosphere-*.tgz
+$ cd stratosphere
+$ bin/start-local.sh
+Starting job manager
+```
+
+You can check that the system is running by checking the log files in the `log` directory:
+
+```bash
+$ tail log/stratosphere-*-jobmanager-*.log
+INFO ... - Initializing memory manager with 409 megabytes of memory
+INFO ... - Trying to load eu.stratosphere.nephele.jobmanager.scheduler.local.LocalScheduler as scheduler
+INFO ... - Setting up web info server, using web-root directory ...
+INFO ... - Web info server will display information about nephele job-manager on localhost, port 8081.
+INFO ... - Starting web info server for JobManager on port 8081
+```
+
+The JobManager will also start a web frontend on port 8081, which you can check with your browser at `http://localhost:8081`.
+
+# Stratosphere on Windows
+
+If you want to run Stratosphere on Windows you need to download, unpack and configure the Stratosphere archive as mentioned above. After that you can either use the **Windows Batch** file (`.bat`) or use **Cygwin** to run the Stratosphere JobManager.
+
+To start Stratosphere in local mode from the *Windows Batch*, open the command window, navigate to the `bin/` directory of Stratosphere and run `start-local.bat`.
+
+```bash
+$ cd stratosphere
+$ cd bin
+$ start-local.bat
+Starting Stratosphere job manager. Webinterface by default on http://localhost:8081/.
+Do not close this batch window. Stop job manager by pressing Ctrl+C.
+```
+
+After that, you need to open a second terminal to run jobs using `stratosphere.bat`.
+
+With *Cygwin* you need to start the Cygwin Terminal, navigate to your Stratosphere directory and run the `start-local.sh` script:
+
+```bash
+$ cd stratosphere
+$ bin/start-local.sh
+Starting Nephele job manager
+```
+
+If you are installing Stratosphere from the git repository and you are using the Windows git shell, Cygwin can produce a failure similar to this one:
+
+```bash
+c:/stratosphere/bin/start-local.sh: line 30: $'\r': command not found
+```
+
+This error occurs because git automatically transforms UNIX line endings to Windows-style line endings when running on Windows. The problem is that Cygwin can only deal with UNIX-style line endings. The solution is to adjust the Cygwin settings to deal with the correct line endings by following these three steps:
+
+1. Start a Cygwin shell.
+
+2. Determine your home directory by entering
+
+```bash
+cd; pwd
+```
+
+It will return a path under the Cygwin root path.
+
+3.  Using NotePad, WordPad or a different text editor, open the file `.bash_profile` in the home directory and append the following (if the file does not exist, you have to create it):
+
+```bash
+export SHELLOPTS
+set -o igncr
+```
+
+Save the file and open a new bash shell.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/quickstart/plotPoints.py
----------------------------------------------------------------------
diff --git a/docs/quickstart/plotPoints.py b/docs/quickstart/plotPoints.py
new file mode 100755
index 0000000..fa04d31
--- /dev/null
+++ b/docs/quickstart/plotPoints.py
@@ -0,0 +1,82 @@
+#!/usr/bin/python
+import sys
+import matplotlib.pyplot as plt
+import os
+
+if len(sys.argv) < 4 or not sys.argv[1] in ['points', 'result']:
+  print "Usage: plotPoints.py (points|result) <src-file> <pdf-file-prefix>"
+  sys.exit(1)
+
+mode = sys.argv[1]
+inFile = sys.argv[2]
+outFilePx = sys.argv[3]
+
+outFile = os.path.join(".", outFilePx+"-plot.pdf")
+
+########### READ DATA
+
+cs = []
+xs = []
+ys = []
+
+minX = None
+maxX = None
+minY = None
+maxY = None
+
+if mode == 'points':
+
+  with open(inFile, 'rb') as file:
+    for line in file:
+      # parse data
+      csvData = line.strip().split(' ')
+
+      x = float(csvData[0])
+      y = float(csvData[1])
+
+      # "is None" checks: "not minX" would wrongly treat a 0.0 coordinate as unset
+      if minX is None or minX > x:
+        minX = x
+      if maxX is None or maxX < x:
+        maxX = x
+      if minY is None or minY > y:
+        minY = y
+      if maxY is None or maxY < y:
+        maxY = y
+
+      xs.append(x)
+      ys.append(y)
+
+    # plot data
+    plt.clf()
+    plt.scatter(xs, ys, s=25, c="#999999", edgecolors='None', alpha=1.0)
+    plt.ylim([minY,maxY])
+    plt.xlim([minX,maxX])
+
+elif mode == 'result':
+
+  with open(inFile, 'rb') as file:
+    for line in file:
+      # parse data
+      csvData = line.strip().split(' ')
+
+      c = int(csvData[0])
+      x = float(csvData[1])
+      y = float(csvData[2])
+
+      # track the bounds here as well, so the ylim/xlim calls below get real values
+      if minX is None or minX > x:
+        minX = x
+      if maxX is None or maxX < x:
+        maxX = x
+      if minY is None or minY > y:
+        minY = y
+      if maxY is None or maxY < y:
+        maxY = y
+
+      cs.append(c)
+      xs.append(x)
+      ys.append(y)
+
+    # plot data
+    plt.clf()
+    plt.scatter(xs, ys, s=25, c=cs, edgecolors='None', alpha=1.0)
+    plt.ylim([minY,maxY])
+    plt.xlim([minX,maxX])
+
+
+plt.savefig(outFile, dpi=600)
+print "\nPlotted file: %s" % outFile
+
+sys.exit(0)
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/run_example_quickstart.md
----------------------------------------------------------------------
diff --git a/docs/run_example_quickstart.md b/docs/run_example_quickstart.md
new file mode 100644
index 0000000..600e3fd
--- /dev/null
+++ b/docs/run_example_quickstart.md
@@ -0,0 +1,154 @@
+---
+title: "Quick Start: Run K-Means Example"
+---
+
+
+<p class="lead">
+	This guide will demonstrate Stratosphere's features by example. You will see how you can leverage Stratosphere's iteration feature to find clusters in a dataset using <a href="http://en.wikipedia.org/wiki/K-means_clustering">K-Means clustering</a>.
+	Along the way, you will see the compiler, the status interface, and the result of the algorithm.
+</p>
+
+
+<section id="data">
+  <div class="page-header">
+  	<h2>Generate Input Data</h2>
+  </div>
+  <p>Stratosphere contains a data generator for K-Means.</p>
+  {% highlight bash %}
+# Download Stratosphere
+wget {{ site.current_stable_dl }}
+tar xzf stratosphere-*.tgz 
+cd stratosphere-*
+mkdir kmeans
+cd kmeans
+# run data generator
+java -cp  ../examples/stratosphere-java-examples-{{ site.current_stable }}-KMeans.jar eu.stratosphere.example.java.clustering.util.KMeansDataGenerator 500 10 0.08
+cp /tmp/points .
+cp /tmp/centers .
+  {% endhighlight %}
+The generator has the following arguments:
+{% highlight bash %}
+KMeansDataGenerator <numberOfDataPoints> <numberOfClusterCenters> [<relative stddev>] [<centroid range>] [<seed>]
+{% endhighlight %}
+The <i>relative standard deviation</i> is an interesting tuning parameter: it determines the closeness of the points to the centers.
+<p>The <code>kmeans/</code> directory should now contain two files: <code>centers</code> and <code>points</code>.</p>
+
+
+<h2>Review Input Data</h2>
+Use the <code>plotPoints.py</code> tool to review the result of the data generator. <a href="{{site.baseurl}}/quickstart/example-data/plotPoints.py">Download Python Script</a>
+{% highlight bash %}
+python2.7 plotPoints.py points points input
+{% endhighlight %}
+
+
+Note: You might have to install <a href="http://matplotlib.org/">matplotlib</a> (<code>python-matplotlib</code> package on Ubuntu) to use the Python script.
+
+
+The following overview presents the impact of the different standard deviations on the input data.
+<div class="row" style="padding-top:15px">
+	<div class="col-md-4">
+		<div class="text-center" style="font-weight:bold;">relative stddev = 0.03</div>
+		<a data-lightbox="inputs" href="{{site.baseurl}}/img/quickstart-example/kmeans003.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/kmeans003.png" /></a>
+	</div>
+	<div class="col-md-4">
+		<div class="text-center" style="font-weight:bold;padding-bottom:2px">relative stddev = 0.08</div>
+		<a data-lightbox="inputs" href="{{site.baseurl}}/img/quickstart-example/kmeans008.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/kmeans008.png" /></a>
+	</div>
+	<div class="col-md-4">
+		<div class="text-center" style="font-weight:bold;">relative stddev = 0.15</div>
+		<a data-lightbox="inputs" href="{{site.baseurl}}/img/quickstart-example/kmeans015.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/kmeans015.png" /></a>
+	</div>
+</div>
+</section>
+
+<section id="run">
+ <div class="page-header">
+  	<h2>Run Clustering</h2>
+  </div>
+We now use the generated input data to run the clustering with a Stratosphere job.
+{% highlight bash %}
+# go to the Stratosphere-root directory
+cd stratosphere
+# start Stratosphere (use ./bin/start-cluster.sh if you're on a cluster)
+./bin/start-local.sh
+# Start Stratosphere web client
+./bin/start-webclient.sh
+{% endhighlight %}
+
+<h2>Review Stratosphere Compiler</h2>
+
+The Stratosphere webclient allows you to submit Stratosphere programs using a graphical user interface.
+
+<div class="row" style="padding-top:15px">
+	<div class="col-md-6">
+		<a data-lightbox="compiler" href="{{site.baseurl}}/img/quickstart-example/run-webclient.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/run-webclient.png" /></a>
+	</div>
+	<div class="col-md-6">
+		1. <a href="http://localhost:8080/launch.html">Open webclient on localhost:8080</a> <br>
+		2. Upload the 
+{% highlight bash %}
+examples/stratosphere-java-examples-{{ site.current_stable }}-KMeansIterative.jar
+{% endhighlight %} file.<br>
+		3. Select it in the left box to see how the operators in the plan are connected to each other. <br>
+		4. Enter the arguments in the lower left box:
+{% highlight bash %}
+file://<pathToGenerated>points file://<pathToGenerated>centers file://<pathToGenerated>result 10
+{% endhighlight %}
+For example:
+{% highlight bash %}
+file:///tmp/stratosphere/kmeans/points file:///tmp/stratosphere/kmeans/centers file:///tmp/stratosphere/kmeans/result 20
+{% endhighlight %}
+	</div>
+</div>
+<hr>
+<div class="row" style="padding-top:15px">
+	<div class="col-md-6">
+		<a data-lightbox="compiler" href="{{site.baseurl}}/img/quickstart-example/compiler-webclient-new.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/compiler-webclient-new.png" /></a>
+	</div>
+
+	<div class="col-md-6">
+		1. Press the <b>RunJob</b> button to see the optimizer plan. <br>
+		2. Inspect the operators and see the properties (input sizes, cost estimation) determined by the optimizer.
+	</div>
+</div>
+<hr>
+<div class="row" style="padding-top:15px">
+	<div class="col-md-6">
+		<a data-lightbox="compiler" href="{{site.baseurl}}/img/quickstart-example/jobmanager-running-new.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/jobmanager-running-new.png" /></a>
+	</div>
+	<div class="col-md-6">
+		1. Press the <b>Continue</b> button to start executing the job. <br>
+		2. <a href="http://localhost:8081">Open Stratosphere's monitoring interface</a> to see the job's progress.<br>
+		3. Once the job has finished, you can analyze the runtime of the individual operators.
+	</div>
+</div>
+</section>
+
+<section id="result">
+ <div class="page-header">
+  	<h2>Analyze the Result</h2>
+  </div>
+Use the <a href="{{site.baseurl}}/quickstart/example-data/plotPoints.py">Python Script</a> again to visualize the result:
+
+{% highlight bash %}
+python2.7 plotPoints.py result result result-pdf
+{% endhighlight %}
+
+The following three pictures show the results for the sample input above. Play around with the parameters (number of iterations, number of clusters) to see how they affect the result.
+
+<div class="row" style="padding-top:15px">
+	<div class="col-md-4">
+		<div class="text-center" style="font-weight:bold;">relative stddev = 0.03</div>
+		<a data-lightbox="results" href="{{site.baseurl}}/img/quickstart-example/result003.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/result003.png" /></a>
+	</div>
+	<div class="col-md-4">
+		<div class="text-center" style="font-weight:bold;">relative stddev = 0.08</div>
+		<a data-lightbox="results" href="{{site.baseurl}}/img/quickstart-example/result008.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/result008.png" /></a>
+	</div>
+	<div class="col-md-4">
+		<div class="text-center" style="font-weight:bold;">relative stddev = 0.15</div>
+		<a data-lightbox="results" href="{{site.baseurl}}/img/quickstart-example/result015.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/result015.png" /></a>
+	</div>
+</div>
+
+</section>

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/scala_api_examples.md
----------------------------------------------------------------------
diff --git a/docs/scala_api_examples.md b/docs/scala_api_examples.md
new file mode 100644
index 0000000..ac930b3
--- /dev/null
+++ b/docs/scala_api_examples.md
@@ -0,0 +1,195 @@
+---
+title:  "Scala API Examples"
+---
+
+The following example programs showcase different applications of Stratosphere from simple word counting to graph algorithms.
+The code samples illustrate the use of **[Stratosphere's Scala API]({{site.baseurl}}/docs/{{site.current_stable}}/programming_guides/scala.html)**. 
+
+The full source code of the following and more examples can be found in the **[stratosphere-scala-examples](https://github.com/stratosphere/stratosphere/tree/release-{{site.current_stable}}/stratosphere-examples/stratosphere-scala-examples)** module.
+
+# Word Count
+
+WordCount is the "Hello World" of Big Data processing systems. It computes the frequency of words in a text collection. The algorithm works in two steps: First, the texts are splits the text to individual words. Second, the words are grouped and counted.
+
+```scala
+// read input data
+val input = TextFile(textInput)
+
+// tokenize words
+val words = input.flatMap { _.split(" ") map { (_, 1) } }
+
+// count by word
+val counts = words.groupBy { case (word, _) => word }
+  .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }
+
+val output = counts.write(wordsOutput, CsvOutputFormat())
+```
+
+The [WordCount example](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/wordcount/WordCount.scala) implements the above described algorithm with input parameters: `<degree of parallelism>, <text input path>, <output path>`. As test data, any text file will do.
+
+# Page Rank
+
+The PageRank algorithm computes the "importance" of pages in a graph defined by links, which point from one page to another. It is an iterative graph algorithm, which means that it repeatedly applies the same computation. In each iteration, each page distributes its current rank over all its neighbors, and computes its new rank as a taxed sum of the ranks it received from its neighbors. The PageRank algorithm was popularized by the Google search engine, which uses the importance of webpages to rank the results of search queries.
+
+In this simple example, PageRank is implemented with a [bulk iteration]({{site.baseurl}}/docs/{{site.current_stable}}/programming_guides/java.html#iterations) and a fixed number of iterations.
+
+```scala
+// case classes so we have named fields
+case class PageWithRank(pageId: Long, rank: Double)
+case class Edge(from: Long, to: Long, transitionProbability: Double)
+
+// constants for the page rank formula
+val dampening = 0.85
+val randomJump = (1.0 - dampening) / NUM_VERTICES
+val initialRank = 1.0 / NUM_VERTICES
+  
+// read inputs
+val pages = DataSource(verticesPath, CsvInputFormat[Long]())
+val edges = DataSource(edgesPath, CsvInputFormat[Edge]())
+
+// assign initial rank
+val pagesWithRank = pages map { p => PageWithRank(p, initialRank) }
+
+// the iterative computation
+def computeRank(ranks: DataSet[PageWithRank]) = {
+
+    // send rank to neighbors
+    val ranksForNeighbors = ranks join edges
+        where { _.pageId } isEqualTo { _.from }
+        map { (p, e) => (e.to, p.rank * e.transitionProbability) }
+    
+    // gather ranks per vertex and apply page rank formula
+    ranksForNeighbors .groupBy { case (node, rank) => node }
+                      .reduce { (a, b) => (a._1, a._2 + b._2) }
+                      .map {case (node, rank) => PageWithRank(node, rank * dampening + randomJump) }
+}
+
+// invoke iteratively
+val finalRanks = pagesWithRank.iterate(numIterations, computeRank)
+val output = finalRanks.write(outputPath, CsvOutputFormat())
+```
+
+
+
+The [PageRank program](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/graph/PageRank.scala) implements the above example.
+It requires the following parameters to run: `<pages input path>, <link input path>, <output path>, <num pages>, <num iterations>`.
+
+Input files are plain text files and must be formatted as follows:
+- Pages represented as a (long) ID separated by new-line characters.
+    * For example `"1\n2\n12\n42\n63\n"` gives five pages with IDs 1, 2, 12, 42, and 63.
+- Links are represented as pairs of page IDs which are separated by space characters. Links are separated by new-line characters:
+    * For example `"1 2\n2 12\n1 12\n42 63\n"` gives four (directed) links (1)->(2), (2)->(12), (1)->(12), and (42)->(63).
+
+For this simple implementation it is required that each page has at least one incoming and one outgoing link (a page can point to itself).
+
+# Connected Components
+
+The Connected Components algorithm identifies parts of a larger graph which are connected by assigning all vertices in the same connected part the same component ID. Similar to PageRank, Connected Components is an iterative algorithm. In each step, each vertex propagates its current component ID to all its neighbors. A vertex accepts the component ID from a neighbor if it is smaller than its own component ID.
+
+This implementation uses a [delta iteration]({{site.baseurl}}/docs/{{site.current_stable}}/programming_guides/java.html#iterations): Vertices that have not changed their component ID do not participate in the next step. This yields much better performance, because the later iterations typically deal only with a few outlier vertices.
+
+```scala
+// define case classes
+case class VertexWithComponent(vertex: Long, componentId: Long)
+case class Edge(from: Long, to: Long)
+
+// get input data
+val vertices = DataSource(verticesPath, CsvInputFormat[Long]())
+val directedEdges = DataSource(edgesPath, CsvInputFormat[Edge]())
+
+// assign each vertex its own ID as component ID
+val initialComponents = vertices map { v => VertexWithComponent(v, v) }
+val undirectedEdges = directedEdges flatMap { e => Seq(e, Edge(e.to, e.from)) }
+
+def propagateComponent(s: DataSet[VertexWithComponent], ws: DataSet[VertexWithComponent]) = {
+  val allNeighbors = ws join undirectedEdges
+        where { _.vertex } isEqualTo { _.from }
+        map { (v, e) => VertexWithComponent(e.to, v.componentId) }
+
+  val minNeighbors = allNeighbors groupBy { _.vertex } reduceGroup { cs => cs minBy { _.componentId } }
+
+  // updated solution elements == new workset
+  val s1 = s join minNeighbors
+        where { _.vertex } isEqualTo { _.vertex }
+        flatMap { (curr, candidate) =>
+          if (candidate.componentId < curr.componentId) Some(candidate) else None
+        }
+
+  (s1, s1)
+}
+
+val components = initialComponents.iterateWithDelta(initialComponents, { _.vertex }, propagateComponent,
+                    maxIterations)
+val output = components.write(componentsOutput, CsvOutputFormat())
+```
+
+The [ConnectedComponents program](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/graph/ConnectedComponents.scala) implements the above example. It requires the following parameters to run: `<vertex input path>, <edge input path>, <output path> <max num iterations>`.
+
+Input files are plain text files and must be formatted as follows:
+- Vertices represented as IDs and separated by new-line characters.
+    * For example `"1\n2\n12\n42\n63\n"` gives five vertices with IDs 1, 2, 12, 42, and 63.
+- Edges are represented as pairs of vertex IDs which are separated by space characters. Edges are separated by new-line characters:
+    * For example `"1 2\n2 12\n1 12\n42 63\n"` gives four (undirected) links (1)-(2), (2)-(12), (1)-(12), and (42)-(63).
+
+# Relational Query
+
+The Relational Query example assumes two tables, one with `orders` and the other with `lineitems` as specified by the [TPC-H decision support benchmark](http://www.tpc.org/tpch/). TPC-H is a standard benchmark in the database industry. See below for instructions on how to generate the input data.
+
+The example implements the following SQL query.
+
+```sql
+SELECT l_orderkey, o_shippriority, sum(l_extendedprice) AS revenue
+    FROM orders, lineitem
+WHERE l_orderkey = o_orderkey
+    AND o_orderstatus = 'F'
+    AND YEAR(o_orderdate) > 1993
+    AND o_orderpriority LIKE '5%'
+```
+
+The Stratosphere Scala program that implements the above query looks as follows.
+
+```scala
+// --- define some custom classes to address fields by name ---
+case class Order(orderId: Int, status: Char, date: String, orderPriority: String, shipPriority: Int)
+case class LineItem(orderId: Int, extendedPrice: Double)
+case class PrioritizedOrder(orderId: Int, shipPriority: Int, revenue: Double)
+
+val orders = DataSource(ordersInputPath, DelimitedInputFormat(parseOrder))
+val lineItems = DataSource(lineItemsInput, DelimitedInputFormat(parseLineItem))
+
+val filteredOrders = orders filter { o => o.status == 'F' && o.date.substring(0, 4).toInt > 1993 && o.orderPriority.startsWith("5") }
+
+val prioritizedItems = filteredOrders join lineItems
+    where { _.orderId } isEqualTo { _.orderId } // join on the orderIds
+    map { (o, li) => PrioritizedOrder(o.orderId, o.shipPriority, li.extendedPrice) }
+
+val prioritizedOrders = prioritizedItems
+    groupBy { pi => (pi.orderId, pi.shipPriority) } 
+    reduce { (po1, po2) => po1.copy(revenue = po1.revenue + po2.revenue) }
+
+val output = prioritizedOrders.write(ordersOutput, CsvOutputFormat(formatOutput))
+```
+
+The [Relational Query program](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/relational/RelationalQuery.scala) implements the above query. It requires the following parameters to run: `<orders input path>, <lineitem input path>, <output path>, <degree of parallelism>`.
+
+The orders and lineitem files can be generated using the [TPC-H benchmark](http://www.tpc.org/tpch/) suite's data generator tool (DBGEN). 
+Take the following steps to generate arbitrary large input files for the provided Stratosphere programs:
+
+1.  Download and unpack DBGEN
+2.  Make a copy of *makefile.suite* called *Makefile* and perform the following changes:
+
+```bash
+DATABASE = DB2
+MACHINE  = LINUX
+WORKLOAD = TPCH
+CC       = gcc
+```
+
+3.  Build DBGEN using *make*
+4.  Generate lineitem and orders relations using dbgen. A scale factor
+    (-s) of 1 results in a generated data set of about 1 GB in size.
+
+```bash
+./dbgen -T o -s 1
+```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/scala_api_guide.md
----------------------------------------------------------------------
diff --git a/docs/scala_api_guide.md b/docs/scala_api_guide.md
new file mode 100644
index 0000000..4b43938
--- /dev/null
+++ b/docs/scala_api_guide.md
@@ -0,0 +1,1008 @@
+---
+title: "Scala API Programming Guide"
+---
+
+
+Scala Programming Guide
+=======================
+
+This guide explains how to develop Stratosphere programs with the Scala
+programming interface. It assumes you are familiar with the general concepts of
+Stratosphere's [Programming Model](pmodel.html "Programming Model"). We
+recommend learning about the basic concepts first, before continuing with the
+[Java](java.html "Java Programming Guide") or this Scala programming guide.
+
+Here we will look at the general structure of a Scala job. You will learn how to
+write data sources, data sinks, and operators to create data flows that can be
+executed using the Stratosphere system.
+
+Writing Scala jobs requires an understanding of Scala; there is excellent
+documentation available [here](http://scala-lang.org/documentation/). Most
+of the examples can be understood by someone with a good understanding
+of programming in general, though.
+
+<section id="intro-example">
+Word Count Example
+------------------
+
+To start, let's look at a Word Count job implemented in Scala. This program is
+very simple but it will give you a basic idea of what a Scala job looks like.
+
+```scala
+import eu.stratosphere.client.LocalExecutor
+
+import eu.stratosphere.api.scala._
+import eu.stratosphere.api.scala.operators._
+
+object WordCount {
+  def main(args: Array[String]) {
+    val input = TextFile(textInput)
+
+    val words = input.flatMap { _.split(" ") map { (_, 1) } }
+
+    val counts = words.groupBy { case (word, _) => word }
+      .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }
+
+    val output = counts.write(wordsOutput, CsvOutputFormat())
+    val plan = new ScalaPlan(Seq(output))
+
+    LocalExecutor.execute(plan)
+  }
+}
+``` 
+
+As with any Stratosphere job, a Scala job consists of one or several data
+sources, one or several data sinks, and operators in between these that transform
+data. Together these parts are referred to as the data flow graph. It dictates
+the way data is passed when a job is executed.
+
+When using Scala in Stratosphere an important concept to grasp is that of the
+`DataSet`. `DataSet` is an abstract concept that represents actual data sets at
+runtime and which has operations that transform data to create a new transformed
+data set. In this example the `TextFile("/some/input")` call creates a
+`DataSet[String]` that represents the lines of text from the input. The
+`flatMap` operation that looks like a regular Scala flatMap is in fact an
+operation on `DataSet` that passes (at runtime) the data items through the
+provided anonymous function to transform them. The result of the `flatMap`
+operation is a new `DataSet` that represents the transformed data. On this, other
+operations can be performed. Other such operations are `groupBy` and `reduce`, but
+we will go into details of those later in this guide.
+
+The `write` operation of `DataSet` is used to create a data sink. You provide it
+with a path to which the data is to be written and an output format. This is
+enough for now but we will discuss data formats (for sources and sinks) later.
+
+To execute a data flow graph, one or several sinks have to be wrapped in a `Plan`
+which can then be executed on a cluster using `RemoteExecutor`. Here, the
+`LocalExecutor` is used to run the flow on the local computer. This is useful
+for debugging your job before running it on an actual cluster.
+
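+As a sketch, shipping the plan from the Word Count example above to a running cluster could look like the following. This is illustrative only: it assumes a `RemoteExecutor` constructor that takes the JobManager host, its RPC port, and the path to a jar containing your job, as well as an `executePlan` method analogous to that of the `LocalExecutor`; the host, port, and jar name here are made up.
+
+```scala
+import eu.stratosphere.client.RemoteExecutor
+
+// hypothetical host, port, and jar path
+val executor = new RemoteExecutor("jobmanager-host", 6123, "target/my-job.jar")
+executor.executePlan(plan)
+```
+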
+<section id="intro-example">
+Project Setup
+-------------
+
+We will only cover Maven here, but the concepts should work equivalently with
+other build systems such as Gradle or sbt. When developing a Scala job,
+all that is needed as a dependency is `stratosphere-scala` (and `stratosphere-clients`, if
+you want to execute your jobs). So all that needs to be done is to add the
+following lines to your POM.
+
+
+```xml
+<dependencies>
+  <dependency>
+    <groupId>eu.stratosphere</groupId>
+    <artifactId>stratosphere-scala</artifactId>
+    <version>{{site.current_stable}}</version>
+  </dependency>
+  <dependency>
+    <groupId>eu.stratosphere</groupId>
+    <artifactId>stratosphere-clients</artifactId>
+    <version>{{site.current_stable}}</version>
+  </dependency>
+</dependencies>
+```
+
+To quickly get started you can use the Stratosphere Scala quickstart available
+[here]({{site.baseurl}}/quickstart/scala.html). This will give you a
+complete Maven project with some working example code that you can use to explore
+the system or as a basis for your own projects.
+
+These imports are normally enough for any project:
+
+```scala
+import eu.stratosphere.api.scala._
+import eu.stratosphere.api.scala.operators._
+
+import eu.stratosphere.client.LocalExecutor
+import eu.stratosphere.client.RemoteExecutor
+```
+
+The first two imports contain things like `DataSet`, `Plan`, data sources, data
+sinks, and the operations. The last two imports are required if you want to run
+a data flow on your local machine or on a cluster, respectively.
+
+<section id="dataset">
+The DataSet Abstraction
+-----------------------
+
+As already alluded to in the introductory example, you write Scala jobs by using
+operations on a `DataSet` to create new transformed `DataSet`s. This concept is
+the core of the Stratosphere Scala API, so it merits some more explanation. A
+`DataSet` can look and behave like a regular Scala collection in your code, but
+it does not contain any actual data; it only represents data. For example, when
+you use `TextFile()` you get back a `DataSource[String]` that represents each
+line of text in the input as a `String`. No data is actually loaded or available
+at this point. The set is only used to apply further operations which themselves
+are not executed until the data flow is executed. An operation on `DataSet`
+creates a new `DataSet` that represents the transformation and has a pointer to
+the `DataSet` that represents the data to be transformed. In this way a tree of
+data sets is created that contains both the specification of the flow of data
+and all the transformations. This graph can be wrapped in a `Plan` and
+executed.
+
+Working with the system is like working with lazy collections, where execution
+is postponed until the user submits the job.
+
+`DataSet` has a generic parameter; this is the type of each data item or record
+that would be processed by further transformations. This is similar to how
+`List[A]` in Scala would behave. For example in:
+
+```scala
+val input: DataSet[(String, Int)] = ...
+val mapped = input map { a => (a._1, a._2 + 1)}
+```
+
+The anonymous function would receive tuples of type `(String, Int)` in `a`.
+
+<section id="datatypes">
+Data Types
+----------
+
+There are some restrictions regarding the data types that can be used in Scala
+jobs (basically the generic parameter of `DataSet`). The usable types are
+the primitive Scala types, case classes (which includes tuples), and custom
+data types.
+
+Custom data types must implement the interface
+[Value](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-core/src/main/java/eu/stratosphere/types/Value.java).
+For custom data types that should also be used as a grouping key or join key
+the [Key](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-core/src/main/java/eu/stratosphere/types/Key.java)
+interface must be implemented.
+
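+As a rough sketch, a custom data type could look like the following. It assumes the `Value` interface follows the usual `write`/`read` serialization pattern over `java.io.DataOutput`/`DataInput`; the `Point` class itself is made up for illustration:
+
+```scala
+import java.io.{DataInput, DataOutput}
+import eu.stratosphere.types.Value
+
+class Point(var x: Double, var y: Double) extends Value {
+  // nullary constructor, needed for deserialization
+  def this() = this(0, 0)
+
+  override def write(out: DataOutput) {
+    out.writeDouble(x)
+    out.writeDouble(y)
+  }
+
+  override def read(in: DataInput) {
+    x = in.readDouble()
+    y = in.readDouble()
+  }
+}
+```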
+
+
+<section id="data-sources">
+Creating Data Sources
+---------------------
+
+To get an initial `DataSet` on which to perform operations to build a data flow
+graph the following construct is used:
+
+```scala
+val input = DataSource("<file-path>", <input-format>)
+```
+
+The value `input` is now a `DataSet` with the generic type depending on the
+input format.
+
+The file path can be one of either `file:///some/file` to access files on the
+local machine or `hdfs://some/path` to read files from HDFS. The input
+format can be one of our builtin formats or a custom input format. The builtin
+formats are:
+
+* [TextInputFormat](#text-input-format)
+* [CsvInputFormat](#csv-input-format)
+* [DelimitedInputFormat](#delimited-input-format)
+* [BinaryInputFormat](#binary-input-format)
+* [BinarySerializedInputFormat](#binary-serialized-input-format)
+* [FixedLengthInputFormat](#fixed-length-input-format)
+
+We will now have a look at each of them and show how they are employed and in
+which situations.
+
+<section id="text-input-format">
+#### TextInputFormat
+
+This input format simply reads a text file line by line and creates a `String`
+for each line. It is used as:
+
+```scala
+TextInputFormat()
+```
+
+As you have already seen in the Word Count Example there is a shortcut for this.
+Instead of using a `DataSource` with `TextInputFormat` you can simply write:
+
+```scala
+val input = TextFile("<file-path>")
+```
+
+The `input` would then be a `DataSet[String]`.
+
+<section id="csv-input-format">
+#### CsvInputFormat
+
+This input format is mainly used to read CSV files, as the name suggests. Input
+files must be text files. You can specify the `String` that should be used
+as the separator between individual records (this would often be newline) and
+also the separator between fields of a record (this would often be a comma).
+The `CsvInputFormat` will automatically read the records and create
+Scala tuples or custom case class objects for you. The format can be used
+in one of the following ways:
+
+```scala
+CsvInputFormat[Out]()
+CsvInputFormat[Out](recordDelim: String)
+CsvInputFormat[Out](recordDelim: String, fieldDelim: Char)
+
+CsvInputFormat[Out](fieldIndices: Seq[Int])
+CsvInputFormat[Out](fieldIndices: Seq[Int], recordDelim: String)
+CsvInputFormat[Out](fieldIndices: Seq[Int], recordDelim: String, fieldDelim: Char)
+```
+
+The default record delimiter is a newline, the default field delimiter is a
+comma. The type parameter `Out` must be a case class type, which also includes
+tuple types since they are internally case classes.
+
+Normally, all the fields of a record are read. If you want to explicitly
+specify which fields of the record should be read, you can use one of the
+three variants with a `fieldIndices` parameter. Here you give a list
+of the fields that should be read. Field indices start from zero.
+
+An example usage could look as follows:
+
+```scala
+val input = DataSource("file:///some/file", CsvInputFormat[(Int, Int, String)](Seq(1, 17, 42), "\n", ','))
+```
+
+Here only the specified fields would be read and 3-tuples created for you.
+The type of input would be `DataSet[(Int, Int, String)]`.
+
+<section id="delimited-input-format">
+#### DelimitedInputFormat
+
+This input format is meant for textual records that are separated by
+some delimiter. The delimiter could be a newline, for example. It is used like
+this:
+
+```scala
+DelimitedInputFormat[Out](parseFunction: String => Out, delim: String = "\n")
+```
+
+The input files will be split on the supplied delimiter (or the default newline)
+and the supplied parse function must parse the textual representation in the
+`String` and return an object. The type of this object will then also be the
+type of the `DataSet` created by the `DataSource` operation.
+
+The parse function can also be an anonymous function, just as with
+`BinaryInputFormat` below, so you could have:
+
+```scala
+val input = DataSource("file:///some/file", DelimitedInputFormat( { line =>
+  line match {
+    case EdgeInputPattern(from, to) => Path(from.toInt, to.toInt, 1)
+  }
+}))
+```
+
+In this example `EdgeInputPattern` is some regular expression used for parsing
+a line of text and `Path` is a custom case class that is used to represent
+the data. The type of input would in this case be `DataSet[Path]`.
+
+<section id="binary-input-format">
+#### BinaryInputFormat
+
+This input format is best used when you have a custom binary format that
+you store the data in. It is created using one of the following:
+
+```scala
+BinaryInputFormat[Out](readFunction: DataInput => Out)
+BinaryInputFormat[Out](readFunction: DataInput => Out, blocksize: Long)
+```
+
+So you have to provide a function that gets a
+[java.io.DataInput](http://docs.oracle.com/javase/7/docs/api/java/io/DataInput.html)
+and returns the object that
+contains the data. The type of this object will then also be the type of the
+`DataSet` created by the `DataSource` operation.
+
+The provided function can also be an anonymous function, so you could
+have something like this:
+
+```scala
+val input = DataSource("file:///some/file", BinaryInputFormat( { input =>
+  val one = input.readInt
+  val two = input.readDouble
+  (one, two)  
+}))
+```
+
+Here `input` would be of type `DataSet[(Int, Double)]`.
+
+<section id="binary-serialized-input-format">
+#### BinarySerializedInputFormat
+
+This input format is only meant to be used in conjunction with
+`BinarySerializedOutputFormat`. You can use these to write elements to files using a
+Stratosphere-internal format that can efficiently be read again. You should only
+use this when output is only meant to be consumed by other Stratosphere jobs.
+The format can be used in one of two ways:
+
+```scala
+BinarySerializedInputFormat[Out]()
+BinarySerializedInputFormat[Out](blocksize: Long)
+```
+
+So if input files contain elements of type `(String, Int)` (a tuple type) you
+could use:
+
+```scala
+val input = DataSource("file:///some/file", BinarySerializedInputFormat[(String, Int)]())
+```
+
+<section id="fixed-length-input-format">
+#### FixedLengthInputFormat
+
+This input format is for cases where you want to read binary blocks
+of a fixed size. The size of a block must be specified and you must
+provide code that reads elements from a byte array.
+
+The format is used like this:
+
+```scala
+FixedLengthInputFormat[Out](readFunction: (Array[Byte], Int) => Out, recordLength: Int)
+```
+
+The specified function gets an array and a position at which it must start
+reading the array and returns the element read from the binary data.
+
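+An illustrative example, assuming each record consists of two 4-byte big-endian Ints (hence a record length of 8 bytes):
+
+```scala
+import java.nio.ByteBuffer
+
+val input = DataSource("file:///some/file", FixedLengthInputFormat( { (bytes, pos) =>
+  // wrap the 8 bytes starting at the given position and read the two Ints
+  val buf = ByteBuffer.wrap(bytes, pos, 8)
+  (buf.getInt, buf.getInt)
+}, 8))
+```
+
+Here `input` would be of type `DataSet[(Int, Int)]`.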
+
+<section id="operations">
+Operations on DataSet
+---------------------
+
+As explained in [Programming Model](pmodel.html#operators),
+a Stratosphere job is a graph of operators that process data coming from
+sources and is finally written to sinks. When you use the Scala front end,
+these operators, as well as the graph, are created behind the scenes. For example,
+when you write code like this:
+
+```scala
+val input = TextFile("file:///some/file")
+
+val words = input.map { x => (x, 1) }
+
+val output = words.write("file:///some/output", CsvOutputFormat())
+
+val plan = new ScalaPlan(Seq(output))
+```
+
+What you get is a graph that has a data source, a map operator (that contains
+the code written inside the anonymous function block), and a data sink. You 
+do not have to know about this to be able to use the Scala front end but
+it helps to remember, that when you are using Scala you are building
+a data flow graph that processes data only when executed.
+
+There are operations on `DataSet` that correspond to all the types of operators
+that the Stratosphere system supports. We will shortly go through all of them with
+some examples.
+
+<section id="operator-templates">
+#### Basic Operator Templates
+
+Most of the operations have three similar versions and we will
+explain them here for all of the operators together. The three versions are `map`,
+`flatMap`, and `filter`. All of them accept an anonymous function that
+defines what the operation does but the semantics are different.
+
+The `map` version is a simple one to one mapping. Take a look at the following
+code:
+
+```scala
+val input: DataSet[(String, Int)]
+
+val mapped = input.map { x => (x._1, x._2 + 3) }
+```
+
+This defines a map operator that operates on tuples of String and Int and just
+adds three to the Int (the second field of the tuple). So, if the input set had
+the tuples (a, 1), (b, 2), and (c, 3) the result after the operator would be
+(a, 4), (b, 5), and (c, 6).
+
+The `flatMap` version works a bit differently,
+here you return something iterable from the anonymous function. The iterable
+could be a list or an array. The elements in this iterable are unnested.
+So for every element in the input data you get a list of elements. The
+concatenation of those is the result of the operator. If you had
+the following code:
+
+```scala
+val input: DataSet[(String, Int)]
+
+val mapped = input.flatMap { x => List( (x._1, x._2), (x._1, x._2 + 1) ) }
+```
+
+and as input the tuples (a, 1) and (b, 1) you would get (a, 1), (a, 2), (b, 1),
+and (b, 2) as result. It is one flat list, and not the individual lists returned
+from the anonymous function.
+
+The third template is `filter`. Here you give an anonymous function that
+returns a Boolean. The elements for which this Boolean is true are part of the
+result of the operation, the others are culled. An example for a filter is this
+code:
+
+
+```scala
+val input: DataSet[(String, Int)]
+
+val mapped = input.filter { x => x._2 >= 3 }
+```
+
+<section id="key-selectors">
+#### Field/Key Selectors
+
+For some operations (group, join, and cogroup) it is necessary to specify which
+parts of a data type are to be considered the key. This key is used for grouping
+elements together for reduce and for joining in case of a join or cogroup.
+In Scala the key is specified using a special anonymous function called
+a key selector. The key selector has as input an element of the type of
+the `DataSet` and must return a single value or a tuple of values that should
+be considered the key. This will become clear with some examples. (Note that
+we use the reduce operation here as an example; we will have a look at
+that further down.)
+
+```scala
+val input: DataSet[(String, Int)]
+val reduced = input groupBy { x => (x._1) } reduce { ... }
+val reduced2 = input groupBy { case (w, c) => w } reduce { ... }
+
+case class Test(a: String, b: Int, c: Int)
+val input2: DataSet[Test]
+val reduced3 = input2 groupBy { x => (x.a, x.b) } reduce { ... }
+val reduced4 = input2 groupBy { case Test(x,y,z) => (x,y) } reduce { ... }
+```
+
+The anonymous function block passed to `groupBy` is the key selector. The first
+two examples both specify the `String` field of the tuple as key. In the second
+set of examples we see a custom case class and here we select the first two
+fields as a compound key.
+
+It is worth noting that the key selector function is not actually executed
+at runtime; it is parsed at job creation time, when the key information is
+extracted and stored for efficient computation at runtime.
+
+#### Map Operation
+
+Map is an operation that gets one element at a time and can output one or
+several elements. The operations that result in a `MapOperator` in the graph are exactly
+those mentioned in the previous section. For completeness' sake we will mention
+their signatures here (in this and the following such lists `In` is the
+type of the input data set, `DataSet[In]`):
+
+```scala
+def map[Out](fun: In => Out): DataSet[Out]
+def flatMap[Out](fun: In => Iterator[Out]): DataSet[Out]
+def filter(fun: In => Boolean): DataSet[In]
+```
+
+#### Reduce Operation
+
+As explained [here](pmodel.html#operators), Reduce is an operation that looks
+at groups of elements at a time and can, for one group, output one or several
+elements. To specify how elements should be grouped you need to give
+a key selection function, as explained [above](#key-selectors).
+
+The basic template of the reduce operation is:
+
+```scala
+input groupBy { <key selector> } reduce { <reduce function> }
+```
+
+The signature of the reduce function depends on the variety of reduce operation
+selected. There are currently three different versions:
+
+```scala
+def reduce(fun: (In, In) => In): DataSet[In]
+
+def reduceGroup[Out](fun: Iterator[In] => Out): DataSet[Out]
+def combinableReduceGroup(fun: Iterator[In] => In): DataSet[In]
+```
+
+The `reduce` variant is like a `reduceLeft` on a Scala collection, with
+the limitation that the output data type must be the same as the input data
+type. You specify how two elements should be combined; this is then used
+to reduce the elements in one group (of the same key) down to one element.
+This can be used to implement aggregation operators, for example:
+
+```scala
+val words: DataSet[(String, Int)]
+val counts = words.groupBy { case (word, count) => word }
+  .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }
+```
+
+This would add up the Int fields of those tuples that have the same String
+in the first field, as is required in Word Count, for example.
+
+The `reduceGroup` variant can be used when more control is required. Here
+your reduce function gets an `Iterator` that can be used to iterate over
+all the elements in a group. With this type of reduce operation the
+output data type can be different from the input data type. An example
+of this kind of operation is this:
+
+```scala
+val words: DataSet[(String, Int)]
+val minCounts = words.groupBy { case (word, count) => word }
+  .reduceGroup { words => words.minBy { _._2 } }
+```
+
+Here we use the `minBy` function of Scala collections to determine the
+element with the minimum count in a group.
+
+The `combinableReduceGroup` variant works like `reduceGroup`, with the
+difference that the reduce operation is combinable. This is an optimization
+one can use; please have a look at [Programming Model](pmodel.html "Programming Model")
+for the details.
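+
+As a hedged sketch of how this could look for the word count example above
+(materializing the `Iterator` with `toList` is just one way to consume it):
+
+```scala
+val words: DataSet[(String, Int)]
+val counts = words.groupBy { case (word, _) => word }
+  .combinableReduceGroup { ws =>
+    // Combine a whole group (or a pre-combined partial group) into one element.
+    val list = ws.toList
+    (list.head._1, list.map(_._2).sum)
+  }
+```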
+
+#### Join Operation
+
+The join operation is similar to a database equi-join. It is a two-input
+operation where you have to specify a key selector for each of the inputs;
+the anonymous function is then called for every pair of matching
+elements from the two input sides.
+
+The basic template is:
+
+```scala
+input1 join input2 where { <key selector 1> } isEqualTo { <key selector 2> } map { <join function> }
+```
+
+or, because lines will get too long fast:
+
+```scala
+input1.join(input2)
+  .where { <key selector 1> }
+  .isEqualTo { <key selector 2> }
+  .map { <join function> }
+```
+
+(Scala can sometimes be quite finicky about where you can omit dots and
+parentheses, so it's best to use dots in multi-line code like this.)
+
+As mentioned [here](#operator-templates), there are three versions of
+this operator, so you can use one of these in the last position:
+
+```scala
+def map[Out](fun: (LeftIn, RightIn) => Out): DataSet[Out]
+def flatMap[Out](fun: (LeftIn, RightIn) => Iterator[Out]): DataSet[Out]
+def filter(fun: (LeftIn, RightIn) => Boolean): DataSet[(LeftIn, RightIn)]
+```
+
+One example where this can be used is database-style joining with projection:
+
+```scala
+input1.join(input2)
+  .where { case (a, b, c) => (a, b) }
+  .isEqualTo { case (a, b, c, d) => (c, d) }
+  .map { (left, right) => (left._3, right._1) }
+```
+
+Here the join key for the left input is a compound of the first two tuple fields
+while the key for the second input is a compound of the last two fields. We then
+pick one field each from both sides as the result of the operation.
+
+#### CoGroup Operation
+
+The cogroup operation is a cross between join and reduce. It has two inputs,
+and you have to specify a key selector for each of them. This is where the
+similarities with join stop. Instead of having one invocation of your user
+code per pair of matching elements, all elements from the left and from the
+right are grouped together for one single invocation. In your function you
+get an `Iterator` for the elements from the left input and another `Iterator`
+for the elements from the right input.
+
+The basic template is:
+
+```scala
+input1 cogroup input2 where { <key selector 1> } isEqualTo { <key selector 2> } map { <cogroup function> }
+```
+
+or, because lines will get too long fast:
+
+```scala
+input1.cogroup(input2)
+  .where { <key selector 1> }
+  .isEqualTo { <key selector 2> }
+  .map { <cogroup function> }
+```
+
+There are two variants you can use, with the semantics explained
+[here](#operator-templates):
+
+```scala
+def map[Out](fun: (Iterator[LeftIn], Iterator[RightIn]) => Out): DataSet[Out]
+def flatMap[Out](fun: (Iterator[LeftIn], Iterator[RightIn]) => Iterator[Out]): DataSet[Out]
+```
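+
+As an illustrative sketch (the data sets here are assumed, not from the
+original text), this counts how many elements each side contributes per key:
+
+```scala
+val left: DataSet[(String, Int)]
+val right: DataSet[(String, Int)]
+
+val sizes = left.cogroup(right)
+  .where { case (word, _) => word }
+  .isEqualTo { case (word, _) => word }
+  .map { (ls, rs) => (ls.length, rs.length) }
+```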
+
+#### Cross Operation
+
+The cross operation is used to form the Cartesian product of the elements
+from two inputs. The basic template is:
+
+```scala
+input1 cross input2 map { <cross function> }
+```
+
+Again, there are three variants, with the semantics explained
+[here](#operator-templates):
+
+```scala
+def map[Out](fun: (LeftIn, RightIn) => Out): DataSet[Out]
+def flatMap[Out](fun: (LeftIn, RightIn) => Iterator[Out]): DataSet[Out]
+def filter(fun: (LeftIn, RightIn) => Boolean): DataSet[(LeftIn, RightIn)]
+```
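+
+A brief sketch (with assumed integer data sets) that builds all pairs of
+elements together with their absolute difference:
+
+```scala
+val xs: DataSet[Int]
+val ys: DataSet[Int]
+
+val pairs = xs cross ys map { (x, y) => (x, y, math.abs(x - y)) }
+```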
+
+#### Union
+
+When you want to have the combination of several data sets as the input of
+an operation you can use a union to combine them. It is used like this:
+
+```scala
+val input1: DataSet[String]
+val input2: DataSet[String]
+val unioned = input1.union(input2)
+```
+
+The signature of union is:
+
+```scala
+def union(secondInput: DataSet[A]): DataSet[A]
+```
+
+Where `A` is the generic type of the `DataSet` on which you execute the `union`.
+
+<section id="iterations">
+Iterations
+----------
+
+Iterations allow you to implement *loops* in Stratosphere programs.
+[This page](iterations.html) gives a
+general introduction to iterations. This section here provides quick examples
+of how to use the concepts using the Scala API.
+The iteration operators encapsulate a part of the program and execute it
+repeatedly, feeding back the result of one iteration (the partial solution) into
+the next iteration. Stratosphere has two different types of iterations,
+*Bulk Iteration* and *Delta Iteration*.
+
+For both types of iterations you provide the iteration body as a function
+that has data sets as input and returns a new data set. The difference is
+that bulk iterations map one data set to one new data set, while
+delta iterations map two data sets to two new data sets.
+
+#### Bulk Iteration
+
+The signature of the bulk iterate method is this:
+
+```scala
+def iterate(n: Int, stepFunction: DataSet[A] => DataSet[A]): DataSet[A]
+```
+
+where `A` is the type of the `DataSet` on which `iterate` is called. The number
+of steps is given in `n`. This is how you use it in practice:
+
+```scala
+val dataPoints = DataSource(dataPointInput, DelimitedInputFormat(parseInput))
+val clusterPoints = DataSource(clusterInput, DelimitedInputFormat(parseInput))
+
+def kMeansStep(centers: DataSet[(Int, Point)]) = {
+
+  val distances = dataPoints cross centers map computeDistance
+  val nearestCenters = distances.groupBy { case (pid, _) => pid }
+    .reduceGroup { ds => ds.minBy(_._2.distance) } map asPointSum.tupled
+  val newCenters = nearestCenters.groupBy { case (cid, _) => cid }
+    .reduceGroup(sumPointSums) map { case (cid, pSum) => cid -> pSum.toPoint() }
+
+  newCenters
+}
+
+val finalCenters = clusterPoints.iterate(numIterations, kMeansStep)
+
+val output = finalCenters.write(clusterOutput, DelimitedOutputFormat(formatOutput.tupled))
+```
+
+Note that we use some functions here that we don't show. If you want, you
+can check out the complete code in our KMeans example.
+
+#### Delta Iteration
+
+The signature of the delta iterate method is this:
+
+```scala
+def iterateWithDelta(
+    workset: DataSet[W],
+    solutionSetKey: A => K,
+    stepFunction: (DataSet[A], DataSet[W]) => (DataSet[A], DataSet[W]),
+    maxIterations: Int): DataSet[A]
+```
+
+where `A` is the type of the `DataSet` on which `iterateWithDelta` is called,
+`W` is the type of the `DataSet` that represents the workset, and `K` is the
+key type. The maximum number of iterations must always be given.
+
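+A schematic sketch of how the pieces fit together (the data sets and the
+step function body here are placeholders, not a runnable job):
+
+```scala
+val solution: DataSet[(Int, Int)]  // e.g. (vertexId, componentId)
+val workset: DataSet[(Int, Int)]
+
+def step(s: DataSet[(Int, Int)], ws: DataSet[(Int, Int)]) = {
+  // Compute the solution-set delta and the new workset here.
+  (s, ws)
+}
+
+val result = solution.iterateWithDelta(workset, { case (id, _) => id }, step, 10)
+```
+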
+For information on how delta iterations in general work on our system, please
+refer to [iterations](iterations.html). A working example job is
+available here:
+[Scala Connected Components Example](examples_scala.html#connected_components) 
+
+
+<section id="data-sinks">
+Creating Data Sinks
+-------------------
+
+The creation of data sinks is analogous to the creation of data sources. `DataSet`
+has a `write` method that is used to create a sink that writes the output
+of the operation to a file in the local file system or HDFS. The general pattern
+is this:
+
+```scala
+val sink = out.write("<file-path>", <output-format>)
+```
+
+Where `out` is some `DataSet`. Just as for data sources, the file path can be
+one of either `file:///some/file` to access files on the local machine or
+`hdfs://some/path` to write files to HDFS. The output format can be one of our
+builtin formats or a custom output format. The builtin formats are:
+
+* [DelimitedOutputFormat](#delimited-output-format)
+* [CsvOutputFormat](#csv-output-format)
+* [RawOutputFormat](#raw-output-format)
+* [BinaryOutputFormat](#binary-output-format)
+* [BinarySerializedOutputFormat](#binary-serialized-output-format)
+
+We will now have a look at each of them and show how they are employed and in
+which situations.
+
+<section id="delimited-output-format">
+#### DelimitedOutputFormat
+
+This output format is meant for writing textual records that are separated by
+some delimiter. The delimiter could be a newline, for example. It is used like
+this:
+
+```scala
+DelimitedOutputFormat[In](formatFunction: In => String, delim: String = "\n")
+```
+
+For every element in the `DataSet` the formatting function is called and
+the result of that is appended to the output file. In between the elements
+the `delim` string is inserted.
+
+An example would be:
+
+```scala
+val out: DataSet[(String, Int)]
+val sink = out.write("file:///some/file", DelimitedOutputFormat( { elem =>
+  "%s|%d".format(elem._1, elem._2)
+}))
+```
+
+Here we use Scala String formatting to write the two fields of the tuple
+separated by a pipe character. The default newline delimiter will be inserted
+between the elements in the output files.
+
+<section id="csv-output-format">
+#### CsvOutputFormat
+
+This output format can be used to automatically write fields of tuple
+elements or case classes to CSV files. You can specify what separator should
+be used between fields of an element and also the separator between elements.
+
+```scala
+CsvOutputFormat[In]()
+CsvOutputFormat[In](recordDelim: String)
+CsvOutputFormat[In](recordDelim: String, fieldDelim: Char)
+```
+
+The default record delimiter is a newline; the default field delimiter is a
+comma.
+
+An example usage could look as follows:
+
+```scala
+val out: DataSet[(String, Int)]
+val sink = out.write("file:///some/file", CsvOutputFormat())
+```
+
+Notice how we don't need to specify the generic type here; it is inferred.
+
+<section id="raw-output-format">
+#### RawOutputFormat
+
+This output format can be used when you want to have complete control over
+what gets written. You get an
+[OutputStream](http://docs.oracle.com/javase/7/docs/api/java/io/OutputStream.html)
+and can write the elements of the `DataSet` exactly as you see fit.
+
+A `RawOutputFormat` is created like this:
+
+```scala
+RawOutputFormat[In](writeFunction: (In, OutputStream) => Unit)
+```
+
+The function you pass in gets one element from the `DataSet` and must
+write it to the given `OutputStream`. An example would be the following:
+
+```scala
+val out: DataSet[(String, Int)]
+val sink = out.write("file:///some/file", RawOutputFormat( { (elem, output) =>
+  // One possible implementation: write the fields as UTF-8 text.
+  output.write("%s,%d\n".format(elem._1, elem._2).getBytes("UTF-8"))
+}))
+```
+
+<section id="binary-output-format">
+#### BinaryOutputFormat
+
+This format is very similar to the `RawOutputFormat`. The difference is that
+instead of an [OutputStream](http://docs.oracle.com/javase/7/docs/api/java/io/OutputStream.html)
+you get a [DataOutput](http://docs.oracle.com/javase/7/docs/api/java/io/DataOutput.html)
+to which you can write binary data. You can also specify the block size for
+the binary output file. When you don't specify a block size, a default
+is used.
+
+A `BinaryOutputFormat` is created like this:
+
+```scala
+BinaryOutputFormat[In](writeFunction: (In, DataOutput) => Unit)
+BinaryOutputFormat[In](writeFunction: (In, DataOutput) => Unit, blockSize: Long)
+```
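+
+For example, a hedged sketch that writes each tuple's fields using standard
+`java.io.DataOutput` methods:
+
+```scala
+val out: DataSet[(String, Int)]
+val sink = out.write("file:///some/file", BinaryOutputFormat( { (elem, dataOut) =>
+  // writeUTF and writeInt are standard java.io.DataOutput methods.
+  dataOut.writeUTF(elem._1)
+  dataOut.writeInt(elem._2)
+}))
+```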
+
+<section id="binary-serialized-output-format">
+#### BinarySerializedOutputFormat
+
+This output format is only meant to be used in conjunction with
+`BinarySerializedInputFormat`. You can use these to write elements to files using a
+Stratosphere-internal format that can efficiently be read again. You should only
+use this when the output is meant to be consumed by other Stratosphere jobs.
+The output format can be used in one of two ways:
+
+```scala
+BinarySerializedOutputFormat[In]()
+BinarySerializedOutputFormat[In](blocksize: Long)
+```
+
+So to write elements of some `DataSet[A]` to a binary file you could use:
+
+```scala
+val out: DataSet[(String, Int)]
+val sink = out.write("file:///some/file", BinarySerializedOutputFormat())
+```
+
+As you can see the type of the elements need not be specified, it is inferred
+by Scala.
+
+<section id="execution">
+Executing Jobs
+--------------
+
+To execute a data flow graph the sinks need to be wrapped in a
+[ScalaPlan](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-scala/src/main/scala/eu/stratosphere/api/scala/ScalaPlan.scala)
+object like this:
+
+```scala
+val out: DataSet[(String, Int)]
+val sink = out.write("file:///some/file", CsvOutputFormat())
+
+val plan = new ScalaPlan(Seq(sink))
+```
+
+You can put several sinks into the `Seq` that is passed to the constructor.
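+
+For example, with two hypothetical data sets and sinks:
+
+```scala
+val out1: DataSet[(String, Int)]
+val out2: DataSet[(String, Int)]
+
+val sink1 = out1.write("file:///some/file", CsvOutputFormat())
+val sink2 = out2.write("file:///other/file", CsvOutputFormat())
+
+val plan = new ScalaPlan(Seq(sink1, sink2))
+```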
+
+There are two ways one can execute a data flow plan: local execution and
+remote/cluster execution. When using local execution the plan is executed on
+the local computer. This is handy while developing jobs because you can
+easily debug your code and iterate quickly. When a job is ready to be
+used on bigger data sets it can be executed on a cluster. We will
+now give an example for each of the two execution modes.
+
+First up is local execution:
+
+```scala
+import eu.stratosphere.client.LocalExecutor
+
+...
+
+val plan: ScalaPlan = ...
+LocalExecutor.execute(plan)
+```
+
+This is all there is to it.
+
+Remote (or cluster) execution is a bit more complicated because you have
+to package your code in a jar file so that it can be distributed on the cluster.
+Have a look at the [Scala quickstart](/quickstart/scala.html) to see how you
+can set up a Maven project that does the packaging. Remote execution is done
+using the [RemoteExecutor](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-clients/src/main/java/eu/stratosphere/client/RemoteExecutor.java), like this:
+
+```scala
+import eu.stratosphere.client.RemoteExecutor
+
+...
+
+val plan: ScalaPlan = ...
+val ex = new RemoteExecutor("<job manager ip address>", <job manager port>, "your.jar")
+ex.executePlan(plan)
+```
+
+The IP address and the port of the Stratosphere job manager depend on your
+setup. Have a look at the [cluster quickstart](/quickstart/setup.html) for a
+quick guide on how to set up a cluster. The default cluster port is 6123, so
+if you run a job manager on your local computer you can pass "localhost" and
+this port as the first two parameters to the `RemoteExecutor` constructor.
+
+<section id="rich-functions">
+Rich Functions
+--------------
+
+Sometimes having a single function that is passed to an operation is not enough.
+Using Rich Functions it is possible to have state inside your operations and
+have code executed before the first element is processed and after the last
+element is processed. For example, instead of a simple function as in this
+example:
+
+```scala
+val mapped = input map { x => x + 1 }
+```
+
+you can have a rich function like this:
+
+```scala
+val mapped = input map (new MapFunction[(String, Int), (String, Int)] {
+  val someState: SomeType = ...
+  override def open(config: Configuration) = {
+    // one-time initialization code
+  }
+
+  override def close() = {
+    // one-time clean-up code
+  }
+
+  override def apply(in: (String, Int)) = {
+    // do complex stuff
+    val result = ...
+    result
+  }
+})
+```
+
+You could also create a custom class that derives from `MapFunction`
+instead of the anonymous class we used here.
+
+There are rich functions for all the various operator types. The basic
+template is the same, though. The common interface that they implement
+is [Function](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-core/src/main/java/eu/stratosphere/api/common/functions/Function.java). The `open` and `close` methods can be overridden to run set-up
+and tear-down code. The other methods can be used in a rich function to
+work with the runtime context which gives information about the context
+of the operator. Your operation code must now reside in an `apply` method
+that has the same signature as the anonymous function you would normally
+supply.
+
+The rich functions reside in the package `eu.stratosphere.api.scala.functions`.
+This is a list of all the rich functions that can be used instead of
+simple functions in the respective operations:
+
+```scala
+abstract class MapFunction[In, Out] 
+abstract class FlatMapFunction[In, Out] 
+abstract class FilterFunction[In, Out] 
+
+abstract class ReduceFunction[In]
+abstract class GroupReduceFunction[In, Out]
+abstract class CombinableGroupReduceFunction[In, Out]
+
+abstract class JoinFunction[LeftIn, RightIn, Out]
+abstract class FlatJoinFunction[LeftIn, RightIn, Out]
+
+abstract class CoGroupFunction[LeftIn, RightIn, Out]
+abstract class FlatCoGroupFunction[LeftIn, RightIn, Out]
+
+abstract class CrossFunction[LeftIn, RightIn, Out]
+abstract class FlatCrossFunction[LeftIn, RightIn, Out]
+```
+
+Note that for all the rich functions, you need to specify the generic type of
+the input (or inputs) and the output type.
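+
+For instance, building on the rule above that `apply` mirrors the anonymous
+function's signature, a rich reduce for the word count example could be
+sketched like this (the data set is assumed):
+
+```scala
+val words: DataSet[(String, Int)]
+val counts = words.groupBy { case (word, _) => word }
+  .reduce(new ReduceFunction[(String, Int)] {
+    // Same signature as the (In, In) => In anonymous function.
+    override def apply(a: (String, Int), b: (String, Int)) = (a._1, a._2 + b._2)
+  })
+```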

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/scala_api_quickstart.md
----------------------------------------------------------------------
diff --git a/docs/scala_api_quickstart.md b/docs/scala_api_quickstart.md
new file mode 100644
index 0000000..e15eed0
--- /dev/null
+++ b/docs/scala_api_quickstart.md
@@ -0,0 +1,71 @@
+---
+title: "Quick Start: Scala API"
+---
+
+<p class="lead">Start working on your Stratosphere Scala program in a few simple steps.</p>
+
+<section id="requirements">
+  <div class="page-header"><h2>Requirements</h2></div>
+  <p class="lead">The only requirements are working <strong>Maven 3.0.4</strong> (or higher) and <strong>Java 6.x</strong> (or higher) installations.</p>
+</section>
+
+<section id="create_project">
+  <div class="page-header"><h2>Create Project</h2></div>
+  <p class="lead">Use one of the following commands to <strong>create a project</strong>:</p>
+
+  <ul class="nav nav-tabs" style="border-bottom: none;">
+      <li class="active"><a href="#quickstart-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li>
+      <li><a href="#maven-archetype" data-toggle="tab">Use <strong>Maven archetypes</strong></a></li>
+  </ul>
+  <div class="tab-content">
+      <div class="tab-pane active" id="quickstart-script">
+{% highlight bash %}
+$ curl https://raw.githubusercontent.com/stratosphere/stratosphere-quickstart/master/quickstart-scala.sh | bash
+{% endhighlight %}
+      </div>
+      <div class="tab-pane" id="maven-archetype">
+{% highlight bash %}
+$ mvn archetype:generate                             \
+    -DarchetypeGroupId=eu.stratosphere               \
+    -DarchetypeArtifactId=quickstart-scala           \
+    -DarchetypeVersion={{site.current_stable}}                  
+{% endhighlight %}
+      This allows you to <strong>name your newly created project</strong>. It will interactively ask you for the groupId, artifactId, and package name.
+      </div>
+  </div>
+</section>
+
+<section id="inspect_project">
+  <div class="page-header"><h2>Inspect Project</h2></div>
+  <p class="lead">There will be a <strong>new directory in your working directory</strong>. If you've used the <em>curl</em> approach, the directory is called <code>quickstart</code>. Otherwise, it has the name of your artifactId.</p>
+  <p class="lead">The sample project is a <strong>Maven project</strong>, which contains a sample scala <em>Job</em> that implements Word Count. Please note that the <em>RunJobLocal</em> and <em>RunJobRemote</em> objects allow you to start Stratosphere in a development/testing mode.</p>
+  <p class="lead">We recommend to <strong>import this project into your IDE</strong>. For Eclipse, you need the following plugins, which you can install from the provided Eclipse Update Sites:
+    <ul>
+      <li class="lead"><strong>Eclipse 4.x</strong>:
+        <ul>
+          <li><strong>Scala IDE</strong> <small>(http://download.scala-ide.org/sdk/e38/scala210/stable/site)</small></li>
+          <li><strong>m2eclipse-scala</strong> <small>(http://alchim31.free.fr/m2e-scala/update-site)</small></li>
+          <li><strong>Build Helper Maven Plugin</strong> <small>(https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.15.0/N/0.15.0.201206251206/)</small></li>
+        </ul>
+      </li>
+      <li class="lead"><strong>Eclipse 3.7</strong>:
+        <ul>
+          <li><strong>Scala IDE</strong> <small>(http://download.scala-ide.org/sdk/e37/scala210/stable/site)</small></li>
+          <li><strong>m2eclipse-scala</strong> <small>(http://alchim31.free.fr/m2e-scala/update-site)</small></li>
+          <li><strong>Build Helper Maven Plugin</strong> <small>(https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.14.0/N/0.14.0.201109282148/)</small></li>
+        </ul>
+      </li>
+    </ul>
+  </p>
+  <p class="lead">The IntelliJ IDE also supports Maven and offers a plugin for Scala development.</p>
+</section>
+
+<section id="build_project">
+  <div class="page-header"><h2>Build Project</h2></div>
+  <p class="lead">If you want to <strong>build your project</strong>, go to your project directory and issue the <code>mvn clean package</code> command. You will <strong>find a jar</strong> that runs on every Stratosphere cluster in <code>target/stratosphere-project-0.1-SNAPSHOT.jar</code>.</p>
+</section>
+
+<section id="next_steps">
+  <div class="page-header"><h2>Next Steps</h2></div>
+  <p class="lead"><strong>Write your application!</strong> If you have any trouble, ask on our <a href="https://github.com/stratosphere/stratosphere/issues">GitHub page</a> (open an issue) or on our <a href="https://groups.google.com/forum/#!forum/stratosphere-dev">Mailing list</a>. We are happy to provide help.</p>
+</section>

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/setup_quickstart.md
----------------------------------------------------------------------
diff --git a/docs/setup_quickstart.md b/docs/setup_quickstart.md
new file mode 100644
index 0000000..debe21c
--- /dev/null
+++ b/docs/setup_quickstart.md
@@ -0,0 +1,132 @@
+---
+title: "Quickstart: Setup"
+---
+
+<p class="lead">Get Stratosphere up and running in a few simple steps.</p>
+
+<section id="requirements">
+  <div class="page-header"><h2>Requirements</h2></div>
+  <p class="lead">Stratosphere runs on all <em>UNIX-like</em> environments: <strong>Linux</strong>, <strong>Mac OS X</strong>, <strong>Cygwin</strong>. The only requirement is to have a working <strong>Java 6.x</strong> (or higher) installation.</p>
+</section>
+
+<section id="download">
+  <div class="page-header"><h2>Download</h2></div>
+  <p class="lead">Download the ready to run binary package. Choose the Stratosphere distribution that <strong>matches your Hadoop version</strong>. If you are unsure which version to choose or you just want to run locally, pick the package for Hadoop 1.2.</p>
+  <p>
+  	<ul class="nav nav-tabs">
+  		<li class="active"><a href="#bin-hadoop1" data-toggle="tab">Hadoop 1.2</a></li>
+      <li><a href="#bin-hadoop2" data-toggle="tab">Hadoop 2 (YARN)</a></li>
+		</ul>
+		<div class="tab-content text-center">
+			<div class="tab-pane active" id="bin-hadoop1">
+				<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-1',this.href]);" href="{{site.current_stable_dl}}"><i class="icon-download"> </i> Download Stratosphere for Hadoop 1.2</a>
+	    </div>
+			<div class="tab-pane" id="bin-hadoop2">
+	      <a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-2',this.href]);" href="{{site.current_stable_dl_yarn}}"><i class="icon-download"> </i> Download Stratosphere for Hadoop 2 (YARN)</a>
+	    </div>
+	  </div>
+	</p>
+</section>
+
+<section id="start">
+  <div class="page-header"><h2>Start</h2></div> 
+  <p class="lead">You are almost done.</p>
+  <ol>
+  	<li class="lead"><strong>Go to the download directory</strong>,</li>
+  	<li class="lead"><strong>Unpack the downloaded archive</strong>, and</li>
+  	<li class="lead"><strong>Start Stratosphere</strong>.</li>
+  </ol>
+
+{% highlight bash %}
+$ cd ~/Downloads              # Go to download directory
+$ tar xzf stratosphere-*.tgz  # Unpack the downloaded archive
+$ cd stratosphere
+$ bin/start-local.sh          # Start Stratosphere
+{% endhighlight %}
+
+  <p class="lead">Check the <strong>JobManager's web frontend</strong> at <a href="http://localhost:8081">http://localhost:8081</a> and make sure everything is up and running.</p>
+</section>
+
+<section id="example">
+  <div class="page-header"><h2>Run Example</h2></div>
+  <p class="lead">Run the <strong>Word Count example</strong> to see Stratosphere at work.</p>
+
+  <ol>
+  	<li class="lead"><strong>Download test data:</strong>
+{% highlight bash %}
+$ wget -O hamlet.txt http://www.gutenberg.org/cache/epub/1787/pg1787.txt
+{% endhighlight %}
+		  You now have a text file called <em>hamlet.txt</em> in your working directory.
+		</li>
+  	<li class="lead"><strong>Start the example program</strong>:
+{% highlight bash %}
+$ bin/stratosphere run \
+    --jarfile ./examples/stratosphere-java-examples-{{site.current_stable}}-WordCount.jar \
+    --arguments file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt
+{% endhighlight %}
+      You will find a file called <strong>wordcount-result.txt</strong> in your current directory.
+  	</li>
+  </ol>
+
+</section>
+
+<section id="cluster">
+  <div class="page-header"><h2>Cluster Setup</h2></div>
+  <p class="lead"><strong>Running Stratosphere on a cluster</strong> is as easy as running it locally. Having <strong>passwordless SSH</strong> and <strong>the same directory structure</strong> on all your cluster nodes lets you use our scripts to control everything.</p>
+  <ol>
+  	<li class="lead">Copy the unpacked <strong>stratosphere</strong> directory from the downloaded archive to the same file system path on each node of your setup.</li>
+  	<li class="lead">Choose a <strong>master node</strong> (JobManager) and set the <code>jobmanager.rpc.address</code> key in <code>conf/stratosphere-conf.yaml</code> to its IP or hostname. Make sure that all nodes in your cluster have the same <code>jobmanager.rpc.address</code> configured.</li>
+  	<li class="lead">Add the IPs or hostnames (one per line) of all <strong>worker nodes</strong> (TaskManager) to the slaves file <code>conf/slaves</code>.</li>
+  </ol>
+  <p class="lead">You can now <strong>start the cluster</strong> at your master node with <code>bin/start-cluster.sh</code>.</p>
+  <p class="lead">
+    The following <strong>example</strong> illustrates the setup with three nodes (with IP addresses from <em>10.0.0.1</em> to <em>10.0.0.3</em> and hostnames <em>master</em>, <em>worker1</em>, <em>worker2</em>) and shows the contents of the configuration files, which need to be accessible at the same path on all machines:
+  </p>
+  <div class="row">
+    <div class="col-md-6 text-center">
+      <img src="{{ site.baseurl }}/img/quickstart_cluster.png" style="width: 85%">
+    </div>
+    <div class="col-md-6">
+      <div class="row">
+        <p class="lead text-center">
+        /path/to/<strong>stratosphere/conf/<br>stratosphere-conf.yaml</strong>
+<pre>
+jobmanager.rpc.address: 10.0.0.1
+</pre>
+        </p>
+      </div>
+      <div class="row" style="margin-top: 1em;">
+        <p class="lead text-center">
+        /path/to/<strong>stratosphere/<br>conf/slaves</strong>
+<pre>
+10.0.0.2
+10.0.0.3
+</pre>
+        </p>
+      </div>
+    </div>
+  </div>
+</section>
+
+<section id="yarn">
+  <div class="page-header"><h2>Stratosphere on YARN</h2></div>
+  <p class="lead">You can easily deploy Stratosphere on your existing <strong>YARN cluster</strong>. 
+    <ol>
+    <li class="lead">Download the <strong>Stratosphere YARN package</strong> with the YARN client:
+      <div class="text-center" style="padding: 1em;">
+      <a style="padding-left:10px" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-yarn',this.href]);" class="btn btn-info btn-lg" href="{{site.current_stable_uberjar}}"><i class="icon-download"> </i> Stratosphere {{ site.current_stable }} for YARN</a>
+      </div>
+    </li>
+    <li class="lead">Make sure your <strong>HADOOP_HOME</strong> (or <em>YARN_CONF_DIR</em> or <em>HADOOP_CONF_DIR</em>) <strong>environment variable</strong> is set to read your YARN and HDFS configuration.</li>
+    <li class="lead">Run the <strong>YARN client</strong> with:
+      <div class="text-center" style="padding:1em;">
+        <code>./bin/yarn-session.sh</code>
+      </div>
+      
+      You can run the client with options <code>-n 10 -tm 8192</code> to allocate 10 TaskManagers with 8GB of memory each.</li>
+  </ol>
+  </p>
+</section>
+
+<hr />
+<p class="lead">For <strong>more detailed instructions</strong>, check out the <a href="{{site.baseurl}}/docs/{{site.current_stable_documentation}}">Documentation</a>.</p>
\ No newline at end of file

