flink-commits mailing list archives

From u..@apache.org
Subject [7/7] git commit: [FLINK-962] Initial import of documentation from website into source code (closes #34)
Date Mon, 23 Jun 2014 12:52:26 GMT
[FLINK-962] Initial import of documentation from website into source code (closes #34)


Project: http://git-wip-us.apache.org/repos/asf/incubator-flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-flink/commit/40b94f73
Tree: http://git-wip-us.apache.org/repos/asf/incubator-flink/tree/40b94f73
Diff: http://git-wip-us.apache.org/repos/asf/incubator-flink/diff/40b94f73

Branch: refs/heads/master
Commit: 40b94f73300788e191d32c5918e647bff748e151
Parents: b4b633e
Author: uce <u.celebi@fu-berlin.de>
Authored: Sat Jun 21 15:14:05 2014 +0200
Committer: uce <u.celebi@fu-berlin.de>
Committed: Mon Jun 23 14:51:07 2014 +0200

----------------------------------------------------------------------
 docs/README.md                                  |   60 +
 docs/_config.yml                                |   33 +
 docs/_layouts/docs.html                         |   92 ++
 docs/_plugins/tocify.rb                         |   10 +
 docs/build_docs.sh                              |   58 +
 docs/cli.md                                     |  129 ++
 docs/cluster_execution.md                       |  125 ++
 docs/cluster_setup.md                           |  363 +++++
 docs/config.md                                  |  171 ++
 docs/css/syntax.css                             |   60 +
 docs/faq.md                                     |  285 ++++
 docs/hadoop_compatability.md                    |    5 +
 docs/img/cogroup.svg                            |  856 ++++++++++
 docs/img/cross.svg                              |  893 +++++++++++
 docs/img/dataflow.svg                           |  979 ++++++++++++
 docs/img/datatypes.svg                          |  143 ++
 docs/img/iterations_delta_iterate_operator.png  |  Bin 0 -> 113607 bytes
 ...terations_delta_iterate_operator_example.png |  Bin 0 -> 335057 bytes
 docs/img/iterations_iterate_operator.png        |  Bin 0 -> 63465 bytes
 .../img/iterations_iterate_operator_example.png |  Bin 0 -> 102925 bytes
 docs/img/iterations_supersteps.png              |  Bin 0 -> 54098 bytes
 docs/img/japi_example_overview.png              |  Bin 0 -> 45406 bytes
 docs/img/join.svg                               |  615 ++++++++
 docs/img/map.svg                                |  295 ++++
 docs/img/operator.svg                           |  241 +++
 docs/img/recorddm.svg                           |  263 ++++
 docs/img/reduce.svg                             |  425 +++++
 docs/img/spargel_example.png                    |  Bin 0 -> 199032 bytes
 docs/img/spargel_example_input.png              |  Bin 0 -> 113478 bytes
 docs/index.md                                   |   11 +
 docs/iterations.md                              |  188 +++
 docs/java_api_examples.md                       |  304 ++++
 docs/java_api_guide.md                          | 1476 ++++++++++++++++++
 docs/java_api_quickstart.md                     |  126 ++
 docs/local_execution.md                         |  106 ++
 docs/local_setup.md                             |  108 ++
 docs/quickstart/plotPoints.py                   |   82 +
 docs/run_example_quickstart.md                  |  154 ++
 docs/scala_api_examples.md                      |  195 +++
 docs/scala_api_guide.md                         | 1008 ++++++++++++
 docs/scala_api_quickstart.md                    |   71 +
 docs/setup_quickstart.md                        |  132 ++
 docs/spargel_guide.md                           |  112 ++
 docs/web_client.md                              |   53 +
 docs/yarn_setup.md                              |  188 +++
 pom.xml                                         |    2 +
 46 files changed, 10417 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/README.md
----------------------------------------------------------------------
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..4ecb30e
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,60 @@
+This README gives an overview of how to build and contribute to the
+documentation of Apache Flink.
+
+The documentation is included with the source of Apache Flink in order to ensure
+that you always have docs corresponding to your checked out version. The online
+documentation at http://flink.incubator.apache.org/ is also generated from the
+files found here.
+
+# Requirements
+
+We use Markdown to write the documentation and Jekyll to translate it to static
+HTML. You can install all needed software via:
+
+    gem install jekyll
+    gem install redcarpet
+    sudo easy_install Pygments
+
+Redcarpet is needed for Markdown processing and the Python-based Pygments is
+used for syntax highlighting.
+
+# Build
+
+The `docs/build_docs.sh` script calls Jekyll and generates the documentation to
+`docs/target`. You can then point your browser to `docs/target/index.html` and
+start reading.
+
+If you call the script with the preview flag `build_docs.sh -p`, Jekyll will
+start a web server at `localhost:4000` and continuously generate the docs.
+This is useful to preview changes locally.
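+
+For example, a typical local workflow looks like this (run from within the
+`docs` directory):
+
+```
+./build_docs.sh        # one-off build into docs/target
+./build_docs.sh -p     # build and serve a live preview at http://localhost:4000
+```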
+
+# Contribute
+
+The documentation pages are written in
+[Markdown](http://daringfireball.net/projects/markdown/syntax). It is possible
+to use the [GitHub flavored syntax](http://github.github.com/github-flavored-markdown)
+and intermix plain html.
+
+In addition to Markdown, every page contains a front matter, which specifies the
+title of the page. This title is used as the top-level heading for the page.
+
+    ---
+    title: "Title of the Page"
+    ---
+
+Furthermore, you can access variables found in `docs/_config.yml` as follows:
+
+    {{ site.FLINK_VERSION }}
+
+This will be replaced with the value of the variable when generating the docs.
+
+All documents are structured with headings. From these headings, a page outline
+is automatically generated for each page.
+
+```
+# Level-1 Heading
+## Level-2 Heading
+### Level-3 heading
+#### Level-4 heading
+##### Level-5 heading
+```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/_config.yml
----------------------------------------------------------------------
diff --git a/docs/_config.yml b/docs/_config.yml
new file mode 100644
index 0000000..2d7c1ce
--- /dev/null
+++ b/docs/_config.yml
@@ -0,0 +1,33 @@
+#------------------------------------------------------------------------------
+# VARIABLES
+#------------------------------------------------------------------------------
+# Variables specified in this file can be used in the documentation via:
+#     {{ site.CONFIG_KEY }}
+#------------------------------------------------------------------------------
+
+FLINK_VERSION: 0.6-SNAPSHOT
+FLINK_VERSION_SHORT: 0.6
+FLINK_ISSUES_URL: https://issues.apache.org/jira/browse/FLINK
+FLINK_GITHUB_URL:  https://github.com/apache/incubator-flink
+
+#------------------------------------------------------------------------------
+# BUILD CONFIG
+#------------------------------------------------------------------------------
+# These variables configure the jekyll build (./build_docs.sh). You don't need
+# to change anything here.
+#------------------------------------------------------------------------------
+
+defaults:
+  -
+    scope:
+      path: ""
+    values:
+      layout: docs
+
+highlighter: pygments
+markdown: redcarpet
+redcarpet:
+  # https://help.github.com/articles/github-flavored-markdown
+  extensions: ["no_intra_emphasis", "fenced_code_blocks", "autolink",
+               "tables", "with_toc_data", "strikethrough", "superscript",
+               "lax_spacing"]
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/_layouts/docs.html
----------------------------------------------------------------------
diff --git a/docs/_layouts/docs.html b/docs/_layouts/docs.html
new file mode 100644
index 0000000..4b99d4a
--- /dev/null
+++ b/docs/_layouts/docs.html
@@ -0,0 +1,92 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <title>Apache Flink {{ site.FLINK_VERSION }} Documentation: {{ page.title }}</title>
+
+    <link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css">
+    <link rel="stylesheet" href="css/syntax.css">
+
+    <!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
+    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
+    <!--[if lt IE 9]>
+      <script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
+      <script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+  <body>
+    <div class="container">
+        <div class="row">
+            <h1>Apache Flink {{ site.FLINK_VERSION }} Documentation</h1>
+        </div>
+        <div class="row">
+            <div class="col-md-3">
+                <ul>
+                    <li>Quickstart
+                        <ul>
+                            <li><a href="setup_quickstart.html">Install</a></li>
+                            <li><a href="run_example_quickstart.html">Run Example</a></li>
+                            <li><a href="java_api_quickstart.html">Java API</a></li>
+                            <li><a href="scala_api_quickstart.html">Scala API</a></li>
+                            <li><a href="faq.html">FAQ</a></li>
+                        </ul>
+                    </li>
+
+                    <li>Setup &amp; Configuration
+                        <ul>
+                            <li><a href="local_setup.html">Local Setup</a></li>
+                            <li><a href="cluster_setup.html">Cluster Setup</a></li>
+                            <li><a href="yarn_setup.html">YARN Setup</a></li>
+                            <li><a href="config.html">Configuration</a></li>
+                        </ul>
+                    </li>
+
+                    <li>Programming Guides
+                        <ul>
+                            <li><a href="java_api_guide.html">Java API</a></li>
+                            <li><a href="scala_api_guide.html">Scala API</a></li>
+                            <li><a href="hadoop_compatability.html">Hadoop Compatibility</a></li>
+                            <li><a href="iterations.html">Iterations</a></li>
+                            <li><a href="spargel_guide.html">Spargel Graph API</a></li>
+                        </ul>
+                    </li>
+
+                    <li>Examples
+                        <ul>
+                            <li><a href="java_api_examples.html">Java API</a></li>
+                            <li><a href="scala_api_examples.html">Scala API</a></li>
+                        </ul>
+                    </li>
+
+                    <li>Execution
+                        <ul>
+                            <li><a href="local_execution.html">Local/Debugging</a></li>
+                            <li><a href="cluster_execution.html">Cluster</a></li>
+                            <li><a href="cli.html">Command-Line Interface</a></li>
+                            <li><a href="web_client.html">Web Interface</a></li>
+                        </ul>
+                    </li>
+
+                    <li>Internals
+                        <ul>
+                            <li>To be written</li>
+                        </ul>
+                    </li>
+                </ul>
+            </div>
+            <div class="col-md-9">
+                <h1>{{ page.title }}</h1>
+
+                {{ page.content | tocify }}
+
+                {{ content }}
+            </div>
+        </div>
+    </div>
+
+    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
+    <script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js"></script>
+  </body>
+</html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/_plugins/tocify.rb
----------------------------------------------------------------------
diff --git a/docs/_plugins/tocify.rb b/docs/_plugins/tocify.rb
new file mode 100644
index 0000000..7df0c3d
--- /dev/null
+++ b/docs/_plugins/tocify.rb
@@ -0,0 +1,10 @@
+module Jekyll
+  module Tocify
+    def tocify(input)
+      converter = Redcarpet::Markdown.new(Redcarpet::Render::HTML_TOC)
+      converter.render(input)
+    end
+  end
+end
+
+Liquid::Template.register_filter(Jekyll::Tocify)

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/build_docs.sh
----------------------------------------------------------------------
diff --git a/docs/build_docs.sh b/docs/build_docs.sh
new file mode 100755
index 0000000..7ae3343
--- /dev/null
+++ b/docs/build_docs.sh
@@ -0,0 +1,58 @@
+#!/bin/bash
+########################################################################################################################
+# Copyright (C) 2010-2014 by the Stratosphere project (http://stratosphere.eu)
+#
+# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#	  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+# an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations under the License.
+########################################################################################################################
+
+HAS_JEKYLL=true
+
+command -v jekyll > /dev/null
+if [ $? -ne 0 ]; then
+	echo -n "ERROR: Could not find jekyll. "
+	echo "Please install with 'gem install jekyll' (see http://jekyllrb.com)."
+
+	HAS_JEKYLL=false
+fi
+
+command -v redcarpet > /dev/null
+if [ $? -ne 0 ]; then
+	echo -n "WARN: Could not find redcarpet. "
+	echo -n "Please install with 'sudo gem install redcarpet' (see https://github.com/vmg/redcarpet). "
+	echo "Redcarpet is needed for Markdown parsing and table of contents generation."
+fi
+
+command -v pygmentize > /dev/null
+if [ $? -ne 0 ]; then
+	echo -n "WARN: Could not find pygments. "
+	echo -n "Please install with 'sudo easy_install Pygments' (requires Python; see http://pygments.org). "
+	echo "Pygments is needed for syntax highlighting of the code examples."
+fi
+
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+DOCS_SRC=${DIR}
+DOCS_DST=${DOCS_SRC}/target
+
+# default jekyll command is to just build site
+JEKYLL_CMD="build"
+
+# if -p flag is provided, serve site on localhost
+while getopts ":p" opt; do
+	case $opt in
+		p)
+		JEKYLL_CMD="serve --watch"
+		;;
+	esac
+done
+
+if $HAS_JEKYLL; then
+	jekyll ${JEKYLL_CMD} --source ${DOCS_SRC} --destination ${DOCS_DST}
+fi
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/cli.md
----------------------------------------------------------------------
diff --git a/docs/cli.md b/docs/cli.md
new file mode 100644
index 0000000..0e778da
--- /dev/null
+++ b/docs/cli.md
@@ -0,0 +1,129 @@
+---
+title:  "Command-Line Interface"
+---
+
+Stratosphere provides a command-line interface to run programs that are packaged
+as JAR files, and control their execution.  The command line interface is part
+of any Stratosphere setup, available in local single node setups and in
+distributed setups. It is located under `<stratosphere-home>/bin/stratosphere`
+and connects by default to the running Stratosphere master (JobManager) that was
+started from the same installation directory.
+
+A prerequisite to using the command line interface is that the Stratosphere
+master (JobManager) has been started (via `<stratosphere-home>/bin/start-local.sh`
+or `<stratosphere-home>/bin/start-cluster.sh`).
+
+The command line can be used to
+
+- submit jobs for execution,
+- cancel a running job,
+- provide information about a job, and
+- list running and waiting jobs.
+
+# Examples
+
+-   Run example program with no arguments.
+
+        ./bin/stratosphere run ./examples/stratosphere-java-examples-{{ site.FLINK_VERSION }}-WordCount.jar
+
+-   Run example program with arguments for input and result files
+
+        ./bin/stratosphere run ./examples/stratosphere-java-examples-{{ site.FLINK_VERSION }}-WordCount.jar \
+                               file:///home/user/hamlet.txt file:///home/user/wordcount_out
+
+-   Run example program with parallelism 16 and arguments for input and result files
+
+        ./bin/stratosphere run -p 16 ./examples/stratosphere-java-examples-{{ site.FLINK_VERSION }}-WordCount.jar \
+                                file:///home/user/hamlet.txt file:///home/user/wordcount_out
+
+-   Run example program on a specific JobManager:
+
+        ./bin/stratosphere run -m myJMHost:6123 \
+                               ./examples/stratosphere-java-examples-{{ site.FLINK_VERSION }}-WordCount.jar \
+                               file:///home/user/hamlet.txt file:///home/user/wordcount_out
+
+
+-   Display the expected arguments for the WordCount example program:
+
+        ./bin/stratosphere info -d ./examples/stratosphere-java-examples-{{ site.FLINK_VERSION }}-WordCount.jar
+
+-   Display the optimized execution plan for the WordCount example program as JSON:
+
+        ./bin/stratosphere info -e \
+                                ./examples/stratosphere-java-examples-{{ site.FLINK_VERSION }}-WordCount.jar \
+                                file:///home/user/hamlet.txt file:///home/user/wordcount_out
+
+-   List scheduled and running jobs (including their JobIDs):
+
+        ./bin/stratosphere list -s -r
+
+-   Cancel a job:
+
+        ./bin/stratosphere cancel -i <jobID>
+
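+-   A typical sequence of actions, putting the above together (the JobID
+    placeholder is taken from the output of the `list` action):
+
+        ./bin/stratosphere run ./examples/stratosphere-java-examples-{{ site.FLINK_VERSION }}-WordCount.jar \
+                               file:///home/user/hamlet.txt file:///home/user/wordcount_out
+        ./bin/stratosphere list -s -r
+        ./bin/stratosphere cancel -i <jobID>
+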
+# Usage
+
+The command line syntax is as follows:
+
+```
+./stratosphere <ACTION> [OPTIONS] [ARGUMENTS]
+
+General options:
+     -h,--help      Show the help for the CLI Frontend, or a specific action.
+     -v,--verbose   Print more detailed error messages.
+
+
+Action "run" - compiles and submits a Stratosphere program that is given in the form of a JAR file.
+
+  "run" options:
+
+     -p,--parallelism <parallelism> The degree of parallelism for the execution. This value is used unless the program overrides the degree of parallelism on the execution environment or program plan. If this option is not set, then the execution will use the default parallelism specified in the stratosphere-conf.yaml file.
+
+     -c,--class <classname>         The class with the entry point (main method, or getPlan() method). Needs only be specified if the JAR file has no manifest pointing to that class. See program packaging instructions for details.
+
+     -m,--jobmanager <host:port>    Option to submit the program to a different Stratosphere master (JobManager).
+
+  "run" arguments:
+
+     - The first argument is the path to the JAR file of the program.
+     - All successive arguments are passed to the program's main method (or getPlan() method).
+
+
+Action "info" - displays information about a Stratosphere program.
+
+  "info" action arguments:
+     -d,--description               Show description of the program, if the main class implements the 'ProgramDescription' interface.
+
+     -e,--executionplan             Show the execution data flow plan of the program, in JSON representation.
+
+     -p,--parallelism <parallelism> The degree of parallelism for the execution, see above. The parallelism is relevant for the execution plan. The option is only evaluated if used together with the -e option.
+
+     -c,--class <classname>         The class with the entry point (main method, or getPlan() method). Needs only be specified if the JAR file has no manifest pointing to that class. See program packaging instructions for details.
+
+     -m,--jobmanager <host:port>    Option to connect to a different Stratosphere master (JobManager). Connecting to a master is relevant to compile the execution plan. The option is only evaluated if used together with the -e option.
+
+  "info" arguments:
+
+     - The first argument is the path to the JAR file of the program.
+     - All successive arguments are passed to the program's main method (or getPlan() method).
+
+
+Action "list" lists submitted Stratosphere programs.
+
+  "list" action arguments:
+
+     -r,--running                   Show running programs and their JobIDs
+
+     -s,--scheduled                 Show scheduled programs and their JobIDs
+
+     -m,--jobmanager <host:port>    Option to connect to a different Stratosphere master (JobManager).
+
+
+Action "cancel" cancels a submitted Stratosphere program.
+
+  "cancel" action arguments:
+
+     -i,--jobid <jobID>             JobID of program to cancel
+     
+     -m,--jobmanager <host:port>    Option to connect to a different Stratosphere master (JobManager).
+```

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/cluster_execution.md
----------------------------------------------------------------------
diff --git a/docs/cluster_execution.md b/docs/cluster_execution.md
new file mode 100644
index 0000000..a41bc0f
--- /dev/null
+++ b/docs/cluster_execution.md
@@ -0,0 +1,125 @@
+---
+title:  "Cluster Execution"
+---
+
+Stratosphere programs can run distributed on clusters of many machines. There
+are two ways to send a program to a cluster for execution:
+
+# Command Line Interface
+
+The command line interface lets you submit packaged programs (JARs) to a cluster
+(or single machine setup).
+
+Please refer to the [Command Line Interface](cli.html) documentation for
+details.
+
+# Remote Environment
+
+The remote environment lets you execute Stratosphere Java programs on a cluster
+directly. The remote environment points to the cluster on which you want to
+execute the program.
+
+## Maven Dependency
+
+If you are developing your program as a Maven project, you have to add the
+`stratosphere-clients` module using this dependency:
+
+```xml
+<dependency>
+  <groupId>eu.stratosphere</groupId>
+  <artifactId>stratosphere-clients</artifactId>
+  <version>{{ site.FLINK_VERSION }}</version>
+</dependency>
+```
+
+## Example
+
+The following illustrates the use of the `RemoteEnvironment`:
+
+```java
+public static void main(String[] args) throws Exception {
+    ExecutionEnvironment env = ExecutionEnvironment
+        .createRemoteEnvironment("strato-master", 7661, "/home/user/udfs.jar");
+
+    DataSet<String> data = env.readTextFile("hdfs://path/to/file");
+
+    data
+        .filter(new FilterFunction<String>() {
+            public boolean filter(String value) {
+                return value.startsWith("http://");
+            }
+        })
+        .writeAsText("hdfs://path/to/result");
+
+    env.execute();
+}
+```
+
+Note that the program contains custom UDFs and hence requires a JAR file with
+the classes of the code attached. The constructor of the remote environment
+takes the path(s) to the JAR file(s).
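+
+If you build the program as a Maven project (see the dependency above), the JAR
+is typically produced by Maven's package phase. A short sketch (the artifact
+name is a placeholder for whatever your `pom.xml` defines):
+
+```bash
+mvn clean package
+# the packaged program, e.g. target/my-job-1.0.jar, is what you pass as the
+# JAR path to createRemoteEnvironment(...)
+```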
+
+# Remote Executor
+
+Similar to the RemoteEnvironment, the RemoteExecutor lets you execute
+Stratosphere programs on a cluster directly. The remote executor accepts a
+*Plan* object, which describes the program as a single executable unit.
+
+## Maven Dependency
+
+If you are developing your program in a Maven project, you have to add the
+`stratosphere-clients` module using this dependency:
+
+```xml
+<dependency>
+  <groupId>eu.stratosphere</groupId>
+  <artifactId>stratosphere-clients</artifactId>
+  <version>{{ site.FLINK_VERSION }}</version>
+</dependency>
+```
+
+## Example
+
+The following illustrates the use of the `RemoteExecutor` with the Scala API:
+
+```scala
+def main(args: Array[String]) {
+    val input = TextFile("hdfs://path/to/file")
+
+    val words = input flatMap { _.toLowerCase().split("""\W+""") filter { _ != "" } }
+    val counts = words groupBy { x => x } count()
+
+    val output = counts.write("hdfs://path/to/result", CsvOutputFormat())
+
+    val plan = new ScalaPlan(Seq(output), "Word Count")
+    val executor = new RemoteExecutor("strato-master", 7881, "/path/to/jarfile.jar")
+    executor.executePlan(plan)
+}
+```
+
+The following illustrates the use of the `RemoteExecutor` with the Java API (as
+an alternative to the RemoteEnvironment):
+
+```java
+public static void main(String[] args) throws Exception {
+    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+    DataSet<String> data = env.readTextFile("hdfs://path/to/file");
+
+    data
+        .filter(new FilterFunction<String>() {
+            public boolean filter(String value) {
+                return value.startsWith("http://");
+            }
+        })
+        .writeAsText("hdfs://path/to/result");
+
+    Plan p = env.createProgramPlan();
+    RemoteExecutor e = new RemoteExecutor("strato-master", 7881, "/path/to/jarfile.jar");
+    e.executePlan(p);
+}
+```
+
+Note that the program contains custom UDFs and hence requires a JAR file with
+the classes of the code attached. The constructor of the remote executor takes
+the path(s) to the JAR file(s).

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/cluster_setup.md
----------------------------------------------------------------------
diff --git a/docs/cluster_setup.md b/docs/cluster_setup.md
new file mode 100644
index 0000000..3692882
--- /dev/null
+++ b/docs/cluster_setup.md
@@ -0,0 +1,363 @@
+---
+title:  "Cluster Setup"
+---
+
+This documentation is intended to provide instructions on how to run
+Stratosphere in a fully distributed fashion on a static (but possibly
+heterogeneous) cluster.
+
+This involves two steps. First, installing and configuring Stratosphere and
+second installing and configuring the [Hadoop Distributed
+Filesystem](http://hadoop.apache.org/) (HDFS).
+
+# Preparing the Cluster
+
+## Software Requirements
+
+Stratosphere runs on all *UNIX-like environments*, e.g. **Linux**, **Mac OS X**,
+and **Cygwin** (for Windows) and expects the cluster to consist of **one master
+node** and **one or more worker nodes**. Before you start to setup the system,
+make sure you have the following software installed **on each node**:
+
+- **Java 1.6.x** or higher,
+- **ssh** (sshd must be running to use the Stratosphere scripts that manage
+  remote components)
+
+If your cluster does not fulfill these software requirements you will need to
+install/upgrade it.
+
+For example, on Ubuntu Linux, type in the following commands to install Java and
+ssh:
+
+```
+sudo apt-get install ssh 
+sudo apt-get install openjdk-7-jre
+```
+
+You can check the correct installation of Java by issuing the following command:
+
+```
+java -version
+```
+
+The command should output something comparable to the following on every node of
+your cluster (depending on your Java version, there may be small differences):
+
+```
+java version "1.6.0_22"
+Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
+Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
+```
+
+To make sure the ssh daemon is running properly, you can use the command
+
+```
+ps aux | grep sshd
+```
+
+Something comparable to the following line should appear in the output
+of the command on every host of your cluster:
+
+```
+root       894  0.0  0.0  49260   320 ?        Ss   Jan09   0:13 /usr/sbin/sshd
+```
+
+## Configuring Remote Access with ssh
+
+In order to start/stop the remote processes, the master node requires access via
+ssh to the worker nodes. It is most convenient to use ssh's public key
+authentication for this. To setup public key authentication, log on to the
+master as the user who will later execute all the Stratosphere components. **The
+same user (i.e. a user with the same user name) must also exist on all worker
+nodes**. For the remainder of these instructions we will refer to this user as
+*stratosphere*. Using the super user *root* is highly discouraged for security
+reasons.
+
+Once you logged in to the master node as the desired user, you must generate a
+new public/private key pair. The following command will create a new
+public/private key pair into the *.ssh* directory inside the home directory of
+the user *stratosphere*. See the ssh-keygen man page for more details. Note that
+the private key is not protected by a passphrase.
+
+```
+ssh-keygen -b 2048 -P '' -f ~/.ssh/id_rsa
+```
+
+Next, copy/append the content of the file *.ssh/id_rsa.pub* to your
+authorized_keys file. The content of the authorized_keys file defines which
+public keys are considered trustworthy during the public key authentication
+process. On most systems the appropriate command is
+
+```
+cat .ssh/id_rsa.pub >> .ssh/authorized_keys
+```
+
+On some Linux systems, the authorized keys file may also be expected by the ssh
+daemon under *.ssh/authorized_keys2*. In either case, you should make sure the
+file only contains those public keys which you consider trustworthy for each
+node of your cluster.
+
+Finally, the authorized keys file must be copied to every worker node of your
+cluster. You can do this by repeatedly typing in
+
+```
+scp .ssh/authorized_keys <worker>:~/.ssh/
+```
+
+and replacing *\<worker\>* with the host name of the respective worker node.
+After having finished the copy process, you should be able to log on to each
+worker node from your master node via ssh without a password.
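+
+A quick way to check this is to run a trivial command on every worker from the
+master (the host names below are placeholders):
+
+```
+for worker in worker1 worker2; do
+    ssh "$worker" hostname   # should print the worker's host name without asking for a password
+done
+```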
+
+## Setting JAVA_HOME on each Node
+
+Stratosphere requires the `JAVA_HOME` environment variable to be set on the
+master and all worker nodes and point to the directory of your Java
+installation.
+
+You can set this variable in `conf/stratosphere-conf.yaml` via the
+`env.java.home` key.
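+
+For example (the path is a placeholder for your actual Java installation):
+
+```
+echo "env.java.home: /usr/lib/jvm/java-7-openjdk-amd64" >> conf/stratosphere-conf.yaml
+```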
+
+Alternatively, add the following line to your shell profile. If you use the
+*bash* shell (probably the most common shell), the shell profile is located in
+*\~/.bashrc*:
+
+```
+export JAVA_HOME=/path/to/java_home/
+```
+
+If your ssh daemon supports user environments, you can also add `JAVA_HOME` to
+*~/.ssh/environment*. As super user *root* you can enable ssh user
+environments with the following commands:
+
+```
+echo "PermitUserEnvironment yes" >> /etc/ssh/sshd_config
+/etc/init.d/ssh restart
+```
+
+# Hadoop Distributed Filesystem (HDFS) Setup
+
+The Stratosphere system currently uses the Hadoop Distributed Filesystem (HDFS)
+to read and write data in a distributed fashion.
+
+Make sure to have a running HDFS installation. The following instructions are
+just a general overview of some required settings. Please consult one of the
+many installation guides available online for more detailed instructions.
+
+**Note that the following instructions are based on Hadoop 1.2 and might differ
+for Hadoop 2.**
+
+## Downloading, Installing, and Configuring HDFS
+
+Similar to the Stratosphere system, HDFS runs in a distributed fashion. HDFS
+consists of a **NameNode** which manages the distributed file system's meta
+data. The actual data is stored by one or more **DataNodes**. For the remainder
+of these instructions we assume that the HDFS NameNode component runs on the master
+node while all the worker nodes run an HDFS DataNode.
+
+To start, log on to your master node and download Hadoop (which includes  HDFS)
+from the Apache [Hadoop Releases](http://hadoop.apache.org/releases.html) page.
+
+Next, extract the Hadoop archive.
+
+After having extracted the Hadoop archive, change into the Hadoop directory and
+edit the Hadoop environment configuration file:
+
+```
+cd hadoop-*
+vi conf/hadoop-env.sh
+```
+
+Uncomment and modify the following line in the file according to the path of
+your Java installation.
+
+```
+export JAVA_HOME=/path/to/java_home/
+```
+
+Save the changes and open the HDFS configuration file *conf/hdfs-site.xml*. HDFS
+offers multiple configuration parameters which affect the behavior of the
+distributed file system in various ways. The following excerpt shows a minimal
+configuration which is required to make HDFS work. More information on how to
+configure HDFS can be found in the [HDFS User
+Guide](http://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html).
+
+```xml
+<configuration>
+  <property>
+    <name>fs.default.name</name>
+    <value>hdfs://MASTER:50040/</value>
+  </property>
+  <property>
+    <name>dfs.data.dir</name>
+    <value>DATAPATH</value>
+  </property>
+</configuration>
+```
+
+Replace *MASTER* with the IP/host name of your master node which runs the
+*NameNode*. *DATAPATH* must be replaced with the path to the directory in which the
+actual HDFS data shall be stored on each worker node. Make sure that the
+*stratosphere* user has sufficient permissions to read and write in that
+directory.
+
+After having saved the HDFS configuration file, open the file *conf/slaves* and
+enter the IP/host name of those worker nodes which shall act as *DataNode*s.
+Each entry must be separated by a line break.
+
+```
+<worker 1>
+<worker 2>
+.
+.
+.
+<worker n>
+```
+
+Initialize the HDFS by typing in the following command. Note that the
+command will **delete all data** which has been previously stored in the
+HDFS. However, since we have just installed a fresh HDFS, it should be
+safe to answer the confirmation with *yes*.
+
+```
+bin/hadoop namenode -format
+```
+
+Finally, we need to make sure that the Hadoop directory is available to
+all worker nodes which are intended to act as DataNodes and that all nodes
+**find the directory under the same path**. We recommend using a shared network
+directory (e.g. an NFS share) for that. Alternatively, one can copy the
+directory to all nodes (with the disadvantage that all configuration and
+code updates need to be synced to all nodes).
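+
+If you copy the directory instead, a simple loop from the master does the job
+(worker host names and the Hadoop directory name are placeholders; `scp -r`
+works as well if rsync is not available):
+
+```
+for worker in worker1 worker2; do
+    rsync -a hadoop-1.2.1/ "$worker:hadoop-1.2.1/"
+done
+```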
+
+## Starting HDFS
+
+To start HDFS, log on to the master and type in the following
+commands:
+
+```
+cd hadoop-*
+bin/start-dfs.sh
+```
+
+If your HDFS setup is correct, you should be able to open the HDFS
+status website at *http://MASTER:50070*. In a matter of seconds,
+all DataNodes should appear as live nodes. For troubleshooting we would
+like to point you to the [Hadoop Quick
+Start](http://wiki.apache.org/hadoop/QuickStart)
+guide.
+
+# Stratosphere Setup
+
+Go to the [downloads page]({{site.baseurl}}/downloads/) and get the ready-to-run
+package. Make sure to pick the Stratosphere package **matching your Hadoop
+version**.
+
+After downloading the latest release, copy the archive to your master node and
+extract it:
+
+```
+tar xzf stratosphere-*.tgz
+cd stratosphere-*
+```
+
+## Configuring the Cluster
+
+After having extracted the system files, you need to configure Stratosphere for
+the cluster by editing *conf/stratosphere-conf.yaml*.
+
+Set the `jobmanager.rpc.address` key to point to your master node. Furthermore,
+define the maximum amount of main memory the JVM is allowed to allocate on each
+node by setting the `jobmanager.heap.mb` and `taskmanager.heap.mb` keys.
+
+The value is given in MB. If some worker nodes have more main memory that you
+want to allocate to the Stratosphere system, you can overwrite the default value
+by setting the environment variable `STRATOSPHERE_TM_HEAP` on the respective
+node.
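+
+For example, to give a single worker with more main memory a larger TaskManager
+heap than the value configured in `conf/stratosphere-conf.yaml` (assuming the
+variable takes the value in MB as well):
+
+```
+export STRATOSPHERE_TM_HEAP=4096
+```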
+
+Finally, you must provide a list of all nodes in your cluster which shall be used
+as worker nodes. To do so, similar to the HDFS configuration, edit the file
+*conf/slaves* and enter the IP/host name of each worker node. Each worker node
+will later run a TaskManager.
+
+Each entry must be separated by a new line, as in the following example:
+
+```
+192.168.0.100
+192.168.0.101
+.
+.
+.
+192.168.0.150
+```
+
+The Stratosphere directory must be available on every worker under the same
+path. Similarly to HDFS, you can use a shared NFS directory, or copy the
+entire Stratosphere directory to every worker node.
+
+## Configuring the Network Buffers
+
+Network buffers are a critical resource for the communication layers. They are
+used to buffer records before transmission over a network, and to buffer
+incoming data before dissecting it into records and handing them to the
+application. A sufficient number of network buffers are critical to achieve a
+good throughput.
+
+In general, configure the task manager to have enough buffers that each logical
+network connection you expect to be open at the same time has a dedicated
+buffer. A logical network connection exists for each point-to-point exchange of
+data over the network, which typically happens at repartitioning- or
+broadcasting steps. In those, each parallel task inside the TaskManager has to
+be able to talk to all other parallel tasks. Hence, the required number of
+buffers on a task manager is *total-degree-of-parallelism* (number of targets)
+\* *intra-node-parallelism* (number of sources in one task manager) \* *n*.
+Here, *n* is a constant that defines how many repartitioning-/broadcasting steps
+you expect to be active at the same time.
+
+Since the *intra-node-parallelism* is typically the number of cores, and more
+than 4 repartitioning or broadcasting channels are rarely active in parallel, it
+frequently boils down to *\#cores\^2\^* \* *\#machines* \* 4. To support for
+example a cluster of 20 8-core machines, you should use roughly 5000 network
+buffers for optimal throughput.
+
+Each network buffer is by default 64 KiBytes large. In the above example, the
+system would allocate roughly 300 MiBytes for network buffers.
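+
+A short back-of-the-envelope check for the example above (20 machines with 8
+cores each, 4 concurrent repartitioning/broadcasting steps, 64 KiB buffers):
+
+```
+echo $(( 8 * 8 * 20 * 4 ))             # 5120 buffers, i.e. roughly 5000
+echo $(( 8 * 8 * 20 * 4 * 64 / 1024 )) # 320 MiB of buffer memory, i.e. roughly 300 MiBytes
+```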
+
+The number and size of network buffers can be configured with the following
+parameters:
+
+- `taskmanager.network.numberOfBuffers`, and
+- `taskmanager.network.bufferSizeInBytes`.
+
+## Configuring Temporary I/O Directories
+
+Although Stratosphere aims to process as much data in main memory as possible,
+it is not uncommon that  more data needs to be processed than memory is
+available. Stratosphere's runtime is designed to  write temporary data to disk
+to handle these situations.
+
+The `taskmanager.tmp.dirs` parameter specifies a list of directories into which
+Stratosphere writes temporary files. The paths of the directories need to be
+separated by ':' (colon character).  Stratosphere will concurrently write (or
+read) one temporary file to (from) each configured directory.  This way,
+temporary I/O can be evenly distributed over multiple independent I/O devices
+such as hard disks to improve performance.  To leverage fast I/O devices (e.g.,
+SSD, RAID, NAS), it is possible to specify a directory multiple times.
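+
+For example, to spread temporary I/O over two disks and give extra weight to a
+fast SSD (paths are placeholders):
+
+```
+echo "taskmanager.tmp.dirs: /data1/tmp:/data2/tmp:/mnt/ssd/tmp:/mnt/ssd/tmp" >> conf/stratosphere-conf.yaml
+```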
+
+If the `taskmanager.tmp.dirs` parameter is not explicitly specified,
+Stratosphere writes temporary data to the temporary  directory of the operating
+system, such as */tmp* in Linux systems.
+
+Please see the [configuration page](config.html) for details and additional
+configuration options.
+
+## Starting Stratosphere
+
+The following script starts a JobManager on the local node and connects via
+SSH to all worker nodes listed in the *slaves* file to start the
+TaskManager on each node. Now your Stratosphere system is up and
+running. The JobManager running on the local node will now accept jobs
+at the configured RPC port.
+
+Assuming that you are on the master node and inside the Stratosphere directory:
+
+```
+bin/start-cluster.sh
+```
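+
+A quick way to verify that the master came up is to check the JobManager's web
+frontend from the master node; it listens on port 8081 by default (see the
+[configuration page](config.html)):
+
+```
+curl -s http://localhost:8081 > /dev/null && echo "JobManager web frontend is up"
+```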

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/config.md
----------------------------------------------------------------------
diff --git a/docs/config.md b/docs/config.md
new file mode 100644
index 0000000..c11cc18
--- /dev/null
+++ b/docs/config.md
@@ -0,0 +1,171 @@
+---
+title:  "Configuration"
+---
+
+# Overview
+
+This page provides an overview of possible settings for Stratosphere. All
+configuration is done in `conf/stratosphere-conf.yaml`, which is expected to be
+a flat collection of [YAML key value pairs](http://www.yaml.org/spec/1.2/spec.html)
+with format `key: value`.
+
+The system and run scripts parse the config at startup and override the
+respective default values with the given values for every key that has been set.
+This page contains a reference for all configuration keys used in the system.
+
+# Common Options
+
+- `env.java.home`: The path to the Java installation to use (DEFAULT: system's
+default Java installation).
+- `jobmanager.rpc.address`: The IP address of the JobManager (DEFAULT:
+localhost).
+- `jobmanager.rpc.port`: The port number of the JobManager (DEFAULT: 6123).
+- `jobmanager.heap.mb`: JVM heap size (in megabytes) for the JobManager
+(DEFAULT: 256).
+- `taskmanager.heap.mb`: JVM heap size (in megabytes) for the TaskManager. In
+contrast to Hadoop, Stratosphere runs operators and functions inside the
+TaskManager (including sorting/hashing/caching), so this value should be as
+large as possible (DEFAULT: 512).
+- `taskmanager.tmp.dirs`: The directory for temporary files, or a list of
+directories separated by the systems directory delimiter (for example ':'
+(colon) on Linux/Unix). If multiple directories are specified then the temporary
+files will be distributed across the directories in a round robin fashion. The
+I/O manager component will spawn one reading and one writing thread per
+directory. A directory may be listed multiple times to have the I/O manager use
+multiple threads for it (for example if it is physically stored on a very fast
+disc or RAID) (DEFAULT: The system's tmp dir).
+- `parallelization.degree.default`: The default degree of parallelism to use for
+programs that have no degree of parallelism specified. A value of -1 indicates
+no limit, in which case the degree of parallelism is set to the number of available
+instances at the time of compilation (DEFAULT: -1).
+- `parallelization.intra-node.default`: The number of parallel instances of an
+operation that are assigned to each TaskManager. A value of -1 indicates no
+limit (DEFAULT: -1).
+- `taskmanager.network.numberOfBuffers`: The number of buffers available to the
+network stack. This number determines how many streaming data exchange channels
+a TaskManager can have at the same time and how well buffered the channels are.
+If a job is rejected or you get a warning that the system does not have enough buffers
+available, increase this value (DEFAULT: 2048).
+- `taskmanager.memory.size`: The amount of memory (in megabytes) that the task
+manager reserves for sorting, hash tables, and caching of intermediate results.
+If unspecified (-1), the memory manager will take a fixed ratio of the heap
+memory available to the JVM after the allocation of the network buffers (0.8)
+(DEFAULT: -1).
+- `jobmanager.profiling.enable`: Flag to enable job manager's profiling
+component. This collects network/cpu utilization statistics, which are displayed
+as charts in the SWT visualization GUI (DEFAULT: false).
+
+# HDFS
+
+These parameters configure the default HDFS used by Stratosphere. If you don't
+specify an HDFS configuration, you will have to specify the full path to your
+HDFS files like `hdfs://address:port/path/to/files`, and files will be written
+with default HDFS parameters (block size, replication factor).
+
+- `fs.hdfs.hadoopconf`: The absolute path to the Hadoop configuration directory.
+The system will look for the "core-site.xml" and "hdfs-site.xml" files in that
+directory (DEFAULT: null).
+- `fs.hdfs.hdfsdefault`: The absolute path of Hadoop's own configuration file
+"hdfs-default.xml" (DEFAULT: null).
+- `fs.hdfs.hdfssite`: The absolute path of Hadoop's own configuration file
+"hdfs-site.xml" (DEFAULT: null).
+
+# JobManager &amp; TaskManager
+
+The following parameters configure Stratosphere's JobManager, TaskManager, and
+runtime channel management.
+
+- `jobmanager.rpc.address`: The hostname or IP address of the JobManager
+(DEFAULT: localhost).
+- `jobmanager.rpc.port`: The port of the JobManager (DEFAULT: 6123).
+- `jobmanager.rpc.numhandler`: The number of RPC threads for the JobManager.
+Increase this value for large setups in which many TaskManagers communicate with the
+JobManager simultaneously (DEFAULT: 8).
+- `jobmanager.profiling.enable`: Flag to enable the profiling component. This
+collects network/cpu utilization statistics, which are displayed as charts in
+the SWT visualization GUI. The profiling may add a small overhead on the
+execution (DEFAULT: false).
+- `jobmanager.web.port`: Port of the JobManager's web interface (DEFAULT: 8081).
+- `jobmanager.heap.mb`: JVM heap size (in megabytes) for the JobManager
+(DEFAULT: 256).
+- `taskmanager.heap.mb`: JVM heap size (in megabytes) for the TaskManager. In
+contrast to Hadoop, Stratosphere runs operators and functions inside the
+TaskManager (including sorting/hashing/caching), so this value should be as
+large as possible (DEFAULT: 512).
+- `taskmanager.rpc.port`: The task manager's IPC port (DEFAULT: 6122).
+- `taskmanager.data.port`: The task manager's port used for data exchange
+operations (DEFAULT: 6121).
+- `taskmanager.tmp.dirs`: The directory for temporary files, or a list of
+directories separated by the systems directory delimiter (for example ':'
+(colon) on Linux/Unix). If multiple directories are specified then the temporary
+files will be distributed across the directories in a round robin fashion. The
+I/O manager component will spawn one reading and one writing thread per
+directory. A directory may be listed multiple times to have the I/O manager use
+multiple threads for it (for example if it is physically stored on a very fast
+disc or RAID) (DEFAULT: The system's tmp dir).
+- `taskmanager.network.numberOfBuffers`: The number of buffers available to the
+network stack. This number determines how many streaming data exchange channels
+a TaskManager can have at the same time and how well buffered the channels are.
+If a job is rejected or you get a warning that the system does not have enough buffers
+available, increase this value (DEFAULT: 2048).
+- `taskmanager.network.bufferSizeInBytes`: The size of the network buffers, in
+bytes (DEFAULT: 32768 (= 32 KiBytes)).
+- `taskmanager.memory.size`: The amount of memory (in megabytes) that the task
+manager reserves for sorting, hash tables, and caching of intermediate results.
+If unspecified (-1), the memory manager will take a relative amount of the heap
+memory available to the JVM after the allocation of the network buffers (0.8)
+(DEFAULT: -1).
+- `taskmanager.memory.fraction`: The fraction of memory (after allocation of the
+network buffers) that the task manager reserves for sorting, hash tables, and
+caching of intermediate results. This value is only used if
+'taskmanager.memory.size' is unspecified (-1) (DEFAULT: 0.8).
+- `jobclient.polling.interval`: The interval (in seconds) in which the client
+polls the JobManager for the status of its job (DEFAULT: 2).
+- `taskmanager.runtime.max-fan`: The maximal fan-in for external merge joins and
+fan-out for spilling hash tables. Limits the number of file handles per operator,
+but may cause intermediate merging/partitioning, if set too small (DEFAULT: 128).
+- `taskmanager.runtime.sort-spilling-threshold`: A sort operation starts spilling
+when this fraction of its memory budget is full (DEFAULT: 0.8).
+- `taskmanager.runtime.fs_timeout`: The maximal time (in milliseconds) that the
+system waits for a response from the filesystem. Note that for HDFS, this time
+may occasionally be rather long. A value of 0 indicates infinite waiting time
+(DEFAULT: 0).
+
+# JobManager Web Frontend
+
+- `jobmanager.web.port`: Port of the JobManager's web interface that displays
+status of running jobs and execution time breakdowns of finished jobs
+(DEFAULT: 8081).
+- `jobmanager.web.history`: The number of latest jobs that the JobManager's web
+front-end retains in its history (DEFAULT: 5).
+
+# Webclient
+
+These parameters configure the web interface that can be used to submit jobs and
+review the compiler's execution plans.
+
+- `webclient.port`: The port of the webclient server (DEFAULT: 8080).
+- `webclient.tempdir`: The temp directory for the web server. Used for example
+for caching file fragments during file-uploads (DEFAULT: The system's temp
+directory).
+- `webclient.uploaddir`: The directory into which the web server will store
+uploaded programs (DEFAULT: ${webclient.tempdir}/webclient-jobs/).
+- `webclient.plandump`: The directory into which the web server will dump
+temporary JSON files describing the execution plans
+(DEFAULT: ${webclient.tempdir}/webclient-plans/).
+
+# Compiler/Optimizer
+
+- `compiler.delimited-informat.max-line-samples`: The maximum number of line
+samples taken by the compiler for delimited inputs. The samples are used to
+estimate the number of records. This value can be overridden for a specific
+input with the input format's parameters (DEFAULT: 10).
+- `compiler.delimited-informat.min-line-samples`: The minimum number of line
+samples taken by the compiler for delimited inputs. The samples are used to
+estimate the number of records. This value can be overridden for a specific
+input with the input format's parameters (DEFAULT: 2).
+- `compiler.delimited-informat.max-sample-len`: The maximal length of a line
+sample that the compiler takes for delimited inputs. If the length of a single
+sample exceeds this value (possible because of misconfiguration of the parser),
+the sampling aborts. This value can be overridden for a specific input with the
+input format's parameters (DEFAULT: 2097152 (= 2 MiBytes)).

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/css/syntax.css
----------------------------------------------------------------------
diff --git a/docs/css/syntax.css b/docs/css/syntax.css
new file mode 100644
index 0000000..2774b76
--- /dev/null
+++ b/docs/css/syntax.css
@@ -0,0 +1,60 @@
+.highlight  { background: #ffffff; }
+.highlight .c { color: #999988; font-style: italic } /* Comment */
+.highlight .err { color: #a61717; background-color: #e3d2d2 } /* Error */
+.highlight .k { font-weight: bold } /* Keyword */
+.highlight .o { font-weight: bold } /* Operator */
+.highlight .cm { color: #999988; font-style: italic } /* Comment.Multiline */
+.highlight .cp { color: #999999; font-weight: bold } /* Comment.Preproc */
+.highlight .c1 { color: #999988; font-style: italic } /* Comment.Single */
+.highlight .cs { color: #999999; font-weight: bold; font-style: italic } /* Comment.Special */
+.highlight .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */
+.highlight .gd .x { color: #000000; background-color: #ffaaaa } /* Generic.Deleted.Specific */
+.highlight .ge { font-style: italic } /* Generic.Emph */
+.highlight .gr { color: #aa0000 } /* Generic.Error */
+.highlight .gh { color: #999999 } /* Generic.Heading */
+.highlight .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */
+.highlight .gi .x { color: #000000; background-color: #aaffaa } /* Generic.Inserted.Specific */
+.highlight .go { color: #888888 } /* Generic.Output */
+.highlight .gp { color: #555555 } /* Generic.Prompt */
+.highlight .gs { font-weight: bold } /* Generic.Strong */
+.highlight .gu { color: #aaaaaa } /* Generic.Subheading */
+.highlight .gt { color: #aa0000 } /* Generic.Traceback */
+.highlight .kc { font-weight: bold } /* Keyword.Constant */
+.highlight .kd { font-weight: bold } /* Keyword.Declaration */
+.highlight .kp { font-weight: bold } /* Keyword.Pseudo */
+.highlight .kr { font-weight: bold } /* Keyword.Reserved */
+.highlight .kt { color: #445588; font-weight: bold } /* Keyword.Type */
+.highlight .m { color: #009999 } /* Literal.Number */
+.highlight .s { color: #d14 } /* Literal.String */
+.highlight .na { color: #008080 } /* Name.Attribute */
+.highlight .nb { color: #0086B3 } /* Name.Builtin */
+.highlight .nc { color: #445588; font-weight: bold } /* Name.Class */
+.highlight .no { color: #008080 } /* Name.Constant */
+.highlight .ni { color: #800080 } /* Name.Entity */
+.highlight .ne { color: #990000; font-weight: bold } /* Name.Exception */
+.highlight .nf { color: #990000; font-weight: bold } /* Name.Function */
+.highlight .nn { color: #555555 } /* Name.Namespace */
+.highlight .nt { color: #000080 } /* Name.Tag */
+.highlight .nv { color: #008080 } /* Name.Variable */
+.highlight .ow { font-weight: bold } /* Operator.Word */
+.highlight .w { color: #bbbbbb } /* Text.Whitespace */
+.highlight .mf { color: #009999 } /* Literal.Number.Float */
+.highlight .mh { color: #009999 } /* Literal.Number.Hex */
+.highlight .mi { color: #009999 } /* Literal.Number.Integer */
+.highlight .mo { color: #009999 } /* Literal.Number.Oct */
+.highlight .sb { color: #d14 } /* Literal.String.Backtick */
+.highlight .sc { color: #d14 } /* Literal.String.Char */
+.highlight .sd { color: #d14 } /* Literal.String.Doc */
+.highlight .s2 { color: #d14 } /* Literal.String.Double */
+.highlight .se { color: #d14 } /* Literal.String.Escape */
+.highlight .sh { color: #d14 } /* Literal.String.Heredoc */
+.highlight .si { color: #d14 } /* Literal.String.Interpol */
+.highlight .sx { color: #d14 } /* Literal.String.Other */
+.highlight .sr { color: #009926 } /* Literal.String.Regex */
+.highlight .s1 { color: #d14 } /* Literal.String.Single */
+.highlight .ss { color: #990073 } /* Literal.String.Symbol */
+.highlight .bp { color: #999999 } /* Name.Builtin.Pseudo */
+.highlight .vc { color: #008080 } /* Name.Variable.Class */
+.highlight .vg { color: #008080 } /* Name.Variable.Global */
+.highlight .vi { color: #008080 } /* Name.Variable.Instance */
+.highlight .il { color: #009999 } /* Literal.Number.Integer.Long */

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/faq.md
----------------------------------------------------------------------
diff --git a/docs/faq.md b/docs/faq.md
new file mode 100644
index 0000000..3ceb527
--- /dev/null
+++ b/docs/faq.md
@@ -0,0 +1,285 @@
+---
+title: "Frequently Asked Questions (FAQ)"
+---
+
+# General
+
+## Is Stratosphere a Hadoop Project?
+
+Stratosphere is a data processing system and an alternative to Hadoop's
+MapReduce component. It comes with its own runtime, rather than building on top
+of MapReduce. As such, it can work completely independently of the Hadoop
+ecosystem. However, Stratosphere can also access Hadoop's distributed file
+system (HDFS) to read and write data, and Hadoop's next-generation resource
+manager (YARN) to provision cluster resources. Since most Stratosphere users are
+using Hadoop HDFS to store their data, we already ship the required libraries to
+access HDFS.
+
+## Do I have to install Apache Hadoop to use Stratosphere?
+
+No. Stratosphere can run without a Hadoop installation. However, a very common
+setup is to use Stratosphere to analyze data stored in the Hadoop Distributed
+File System (HDFS). To make these setups work out of the box, we bundle the
+Hadoop client libraries with Stratosphere by default.
+
+Additionally, we provide a special YARN Enabled download of Stratosphere for
+users with an existing Hadoop YARN cluster.
+[Apache Hadoop YARN](http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html)
+is Hadoop's cluster resource manager that allows different execution engines to
+run next to each other on a cluster.
+
+# Usage
+
+## How do I assess the progress of a Stratosphere program?
+
+There are multiple ways to track the progress of a Stratosphere program:
+
+- The JobManager (the master of the distributed system) starts a web interface
+to observe program execution. It runs on port 8081 by default (configured in
+`conf/stratosphere-conf.yaml`).
+- When you start a program from the command line, it will print the status
+changes of all operators as the program progresses through the operations.
+- All status changes are also logged to the JobManager's log file.
+
+## How can I figure out why a program failed?
+
+- The JobManager web frontend (by default on port 8081) displays the exceptions
+of failed tasks.
+- If you run the program from the command-line, task exceptions are printed to
+the standard error stream and shown on the console.
+- Both the command line and the web interface allow you to figure out which
+parallel task first failed and caused the other tasks to cancel the execution.
+- Failing tasks and the corresponding exceptions are reported in the log files
+of the master and the worker where the exception occurred
+(`log/stratosphere-<user>-jobmanager-<host>.log` and
+`log/stratosphere-<user>-taskmanager-<host>.log`).
+
+## How do I debug Stratosphere programs?
+
+- When you start a program locally with the [LocalExecutor](local_execution.html),
+you can place breakpoints in your functions and debug them like normal
+Java/Scala programs.
+- The [Accumulators](java_api_guide.html#accumulators) are very helpful in
+tracking the behavior of the parallel execution. They allow you to gather
+information inside the program's operations and display it after the program
+has finished (see the sketch after this list).
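+
+As a rough illustration, the sketch below counts how many records pass through
+a map function and reports the value after the program finishes. It assumes the
+accumulator API described in the [Java API guide](java_api_guide.html#accumulators)
+(`addAccumulator` / `IntCounter`); the class and accumulator names are made up
+for this example, so check the guide for the exact signatures in your version.
+
+```java
+import eu.stratosphere.api.common.accumulators.IntCounter;
+import eu.stratosphere.api.java.functions.MapFunction;
+import eu.stratosphere.configuration.Configuration;
+
+public class CountingMapper extends MapFunction<String, String> {
+
+    // Hypothetical accumulator; its value is reported after the job finishes.
+    private final IntCounter numRecords = new IntCounter();
+
+    @Override
+    public void open(Configuration parameters) {
+        // Register the accumulator under a (made-up) name.
+        getRuntimeContext().addAccumulator("num-records", this.numRecords);
+    }
+
+    @Override
+    public String map(String value) {
+        this.numRecords.add(1);
+        return value;
+    }
+}
+```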
+
+# Errors
+
+## I get an error message saying that not enough buffers are available. How do I fix this?
+
+If you run Stratosphere in a massively parallel setting (100+ parallel threads),
+you need to adapt the number of network buffers via the config parameter
+`taskmanager.network.numberOfBuffers`.
+As a rule-of-thumb, the number of buffers should be at least
+`4 * numberOfNodes * numberOfTasksPerNode^2`. See
+[Configuration Reference](config.html) for details.
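+
+For example, on a hypothetical cluster of 20 nodes with 8 parallel tasks per
+node, this rule of thumb suggests at least `4 * 20 * 8^2 = 5120` network buffers.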
+
+## My job fails early with a java.io.EOFException. What could be the cause?
+
+Note: In version *0.4*, the delta iterations limit the solution set to
+records with fixed-length data types. We will remove this restriction in the
+next version.
+
+The most common cause of these exceptions is a Stratosphere setup with the
+wrong HDFS version. Because different HDFS versions are often not compatible
+with each other, the connection between the filesystem master and the client
+breaks.
+
+```bash
+Call to <host:port> failed on local exception: java.io.EOFException
+    at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
+    at org.apache.hadoop.ipc.Client.call(Client.java:743)
+    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
+    at $Proxy0.getProtocolVersion(Unknown Source)
+    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
+    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
+    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
+    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
+    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
+    at eu.stratosphere.runtime.fs.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:276
+```
+
+Please refer to the [download page](http://stratosphere.eu/downloads/#maven) and
+the [build instructions](https://github.com/stratosphere/stratosphere/blob/master/README.md)
+for details on how to set up Stratosphere for different Hadoop and HDFS versions.
+
+## My program does not compute the correct result. Why are my custom key types not grouped/joined correctly?
+
+Keys must correctly implement the methods `java.lang.Object#hashCode()`,
+`java.lang.Object#equals(Object o)`, and `java.lang.Comparable#compareTo(...)`.
+The `hashCode()` and `equals(Object o)` methods would otherwise fall back to the
+default implementations inherited from `java.lang.Object`, which compare object
+identity and are usually inadequate. Therefore, all keys must override
+`hashCode()` and `equals(Object o)`.
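+
+As an illustration, a minimal sketch of such a key type is shown below. The
+class `WordKey` is a made-up example, not part of the Stratosphere code base;
+it only demonstrates the three methods a key has to implement consistently.
+
+```java
+public class WordKey implements Comparable<WordKey> {
+
+    private String word;
+
+    public WordKey() {}                              // public nullary constructor
+
+    public WordKey(String word) { this.word = word; }
+
+    @Override
+    public int hashCode() {
+        return word == null ? 0 : word.hashCode();   // consistent with equals()
+    }
+
+    @Override
+    public boolean equals(Object o) {
+        if (!(o instanceof WordKey)) return false;
+        WordKey other = (WordKey) o;
+        return word == null ? other.word == null : word.equals(other.word);
+    }
+
+    @Override
+    public int compareTo(WordKey other) {
+        return this.word.compareTo(other.word);      // defines the sort order
+    }
+}
+```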
+
+## I get a java.lang.InstantiationException for my data type, what is wrong?
+
+All data type classes must be public and have a public nullary constructor
+(a constructor with no arguments). Furthermore, the classes must not be abstract
+or interfaces. If the classes are nested classes, they must be public and
+static.
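+
+A minimal sketch of a data type that satisfies these rules is shown below; the
+surrounding class `MyJob` and the nested type `Point` are hypothetical names
+used only for illustration.
+
+```java
+public class MyJob {
+
+    // Nested data types must be declared public and static,
+    // and need a public constructor without arguments.
+    public static class Point {
+        public double x;
+        public double y;
+
+        public Point() {}
+
+        public Point(double x, double y) {
+            this.x = x;
+            this.y = y;
+        }
+    }
+}
+```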
+
+## I can't stop Stratosphere with the provided stop-scripts. What can I do?
+
+Stopping the processes sometimes takes a few seconds, because the shutdown may
+do some cleanup work.
+
+In some error cases it happens that the JobManager or TaskManager cannot be
+stopped with the provided stop-scripts (`bin/stop-local.sh` or
+`bin/stop-cluster.sh`). You can kill their processes on Linux/Mac as follows:
+
+- Determine the process id (pid) of the JobManager / TaskManager process. You
+can use the `jps` command on Linux (if you have OpenJDK installed) or the
+command `ps -ef | grep java` to find all Java processes.
+- Kill the process with `kill -9 <pid>`, where `pid` is the process id of the
+affected JobManager or TaskManager process.
+    
+On Windows, the Task Manager shows a table of all processes and allows you to
+destroy a process by right-clicking its entry.
+
+## I got an OutOfMemoryException. What can I do?
+
+These exceptions usually occur when the functions in the program consume a lot
+of memory by collecting large numbers of objects, for example in lists or maps.
+OutOfMemoryExceptions in Java are tricky to interpret: the exception is not
+necessarily thrown by the component that allocated most of the memory, but by
+the component that requested the last bit of memory that could not be
+provided.
+
+There are two ways to go about this:
+
+1. See whether you can use less memory inside the functions. For example, use
+arrays of primitive types instead of object types (see the sketch after this
+list).
+
+2. Reduce the memory that Stratosphere reserves for its own processing. The
+TaskManager reserves a certain portion of the available memory for sorting,
+hashing, caching, network buffering, etc. That part of the memory is unavailable
+to the user-defined functions. By reserving it, the system can guarantee to not
+run out of memory on large inputs, but to plan with the available memory and
+destage operations to disk, if necessary. By default, the system reserves around
+70% of the memory. If you frequently run applications that need more memory in
+the user-defined functions, you can reduce that value using the configuration
+entries `taskmanager.memory.fraction` or `taskmanager.memory.size`. See the
+[Configuration Reference](http://stratosphere.eu/docs/0.4/setup/config.html
+"Configuration Reference") for details. This will leave more memory to JVM heap,
+but may cause data processing tasks to go to disk more often.
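+
+As a rough sketch of the first option, the snippet below contrasts a boxed
+collection with a primitive array; the class and numbers are made up for
+illustration. For the second option, lowering `taskmanager.memory.fraction`
+from the default of roughly `0.7` to, say, `0.5` is a hypothetical starting
+point; the right value depends on your workload.
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+public class MemoryFootprintExample {
+
+    public static void main(String[] args) {
+        int n = 1000000;
+
+        // Boxed: every element is a separate Integer object plus list overhead.
+        List<Integer> boxed = new ArrayList<Integer>(n);
+        for (int i = 0; i < n; i++) {
+            boxed.add(i);
+        }
+
+        // Primitive: one contiguous int[] uses a fraction of that memory.
+        int[] primitive = new int[n];
+        for (int i = 0; i < n; i++) {
+            primitive[i] = i;
+        }
+
+        System.out.println(boxed.size() + " boxed vs. "
+                + primitive.length + " primitive ints");
+    }
+}
+```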
+
+## Why do the TaskManager log files become so huge?
+
+Check the logging behavior of your jobs. Emitting a log statement per record or
+tuple may be helpful to debug jobs in small setups with tiny data sets, but it
+becomes very inefficient and consumes a lot of disk space when used with large
+input data.
+
+# YARN Deployment
+
+## The YARN session runs only for a few seconds
+
+The `./bin/yarn-session.sh` script is intended to run while the YARN session is
+open. In some error cases however, the script immediately stops running. The
+output looks like this:
+
+```
+07:34:27,004 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1395604279745_273123 to ResourceManager at jobtracker-host
+Stratosphere JobManager is now running on worker1:6123
+JobManager Web Interface: http://jobtracker-host:54311/proxy/application_1295604279745_273123/
+07:34:51,528 INFO  eu.stratosphere.yarn.Client                                   - Application application_1295604279745_273123 finished with state FINISHED at 1398152089553
+07:34:51,529 INFO  eu.stratosphere.yarn.Client                                   - Killing the Stratosphere-YARN application.
+07:34:51,529 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Killing application application_1295604279745_273123
+07:34:51,534 INFO  eu.stratosphere.yarn.Client                                   - Deleting files in hdfs://user/marcus/.stratosphere/application_1295604279745_273123
+07:34:51,559 INFO  eu.stratosphere.yarn.Client                                   - YARN Client is shutting down
+```
+
+The problem here is that the Application Master (AM) is stopping and the YARN client assumes that the application has finished.
+
+There are three possible reasons for that behavior:
+
+- The ApplicationMaster exited with an exception. To debug that error, have a
+look at the log files of the container. The `yarn-site.xml` file contains the
+configured path. The key for the path is `yarn.nodemanager.log-dirs`; the
+default value is `${yarn.log.dir}/userlogs`.
+
+- YARN has killed the container that runs the ApplicationMaster. This case
+happens when the AM used too much memory or other resources beyond YARN's
+limits. In this case, you'll find error messages in the nodemanager logs on
+the host.
+
+- The operating system has shut down the JVM of the AM. This can happen if the
+YARN configuration is wrong and more memory than physically available is
+configured. Execute `dmesg` on the machine where the AM was running to see if
+this happened. If so, you will see messages from the Linux [OOM killer](http://linux-mm.org/OOM_Killer).
+
+## The YARN session crashes with an HDFS permission exception during startup
+
+While starting the YARN session, you may receive an exception like this:
+
+```
+Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=robert, access=WRITE, inode="/user/robert":hdfs:supergroup:drwxr-xr-x
+  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
+  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214)
+  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:158)
+  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5193)
+  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5175)
+  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5149)
+  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2090)
+  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2043)
+  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1996)
+  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:491)
+  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:301)
+  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59570)
+  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
+  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
+  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
+  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
+  at java.security.AccessController.doPrivileged(Native Method)
+  at javax.security.auth.Subject.doAs(Subject.java:396)
+  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
+  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
+
+  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
+  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
+  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
+  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
+  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
+  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
+  at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1393)
+  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1382)
+  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1307)
+  at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:384)
+  at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:380)
+  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
+  at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:380)
+  at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:324)
+  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:905)
+  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
+  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:783)
+  at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
+  at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
+  at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2021)
+  at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1989)
+  at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1954)
+  at eu.stratosphere.yarn.Utils.setupLocalResource(Utils.java:176)
+  at eu.stratosphere.yarn.Client.run(Client.java:362)
+  at eu.stratosphere.yarn.Client.main(Client.java:568)
+```
+
+The reason for this error is that the user's home directory **in HDFS**
+has the wrong permissions. The user (in this case `robert`) cannot create
+directories in his own home directory.
+
+Stratosphere creates a `.stratosphere/` directory in the user's home directory
+where it stores the Stratosphere jar and configuration file.
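+
+If an HDFS administrator needs to fix this, a common approach is to create the
+home directory and hand it over to the user, for example (for the user `robert`
+from the stack trace above) with `hdfs dfs -mkdir -p /user/robert` followed by
+`hdfs dfs -chown robert /user/robert`.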
+
+# Features
+
+## What kind of fault-tolerance does Stratosphere provide?
+
+Stratosphere can restart failed jobs. Mid-query fault tolerance will go into the
+open source project in the next versions.
+
+## Are Hadoop-like utilities, such as Counters and the DistributedCache, supported?
+
+[Stratosphere's Accumulators](java_api_guide.html) work very similarly to
+Hadoop's counters, but are more powerful.
+
+Stratosphere has a [Distributed Cache](https://github.com/stratosphere/stratosphere/blob/{{ site.docs_05_stable_gh_tag }}/stratosphere-core/src/main/java/eu/stratosphere/api/common/cache/DistributedCache.java) that is deeply integrated with the APIs. Please refer to the [JavaDocs](https://github.com/stratosphere/stratosphere/blob/{{ site.docs_05_stable_gh_tag }}/stratosphere-java/src/main/java/eu/stratosphere/api/java/ExecutionEnvironment.java#L561) for details on how to use it.
+
+In order to make data sets available on all tasks, we encourage you to use [Broadcast Variables]({{site.baseurl}}/docs/0.5/programming_guides/java.html#broadcast_variables) instead. They are more efficient and easier to use than the distributed cache.
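+
+As a rough sketch of how a broadcast variable is used, consider the snippet
+below. It assumes the `withBroadcastSet(...)` / `getBroadcastVariable(...)`
+methods described in the broadcast variables section of the Java programming
+guide; the class names, package paths, and the variable name `filterThresholds`
+are illustrative only and may differ in your version.
+
+```java
+import java.util.Collection;
+
+import eu.stratosphere.api.java.DataSet;
+import eu.stratosphere.api.java.ExecutionEnvironment;
+import eu.stratosphere.api.java.functions.FilterFunction;
+import eu.stratosphere.configuration.Configuration;
+
+public class BroadcastExample {
+
+    public static void main(String[] args) throws Exception {
+        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+        // Small data set that every parallel task instance should see.
+        DataSet<Integer> thresholds = env.fromElements(10);
+        DataSet<Integer> values = env.fromElements(1, 5, 20, 42);
+
+        values
+            .filter(new FilterFunction<Integer>() {
+                private int threshold;
+
+                @Override
+                public void open(Configuration parameters) {
+                    // Read the broadcast variable registered below.
+                    Collection<Integer> bc =
+                        getRuntimeContext().getBroadcastVariable("filterThresholds");
+                    this.threshold = bc.iterator().next();
+                }
+
+                @Override
+                public boolean filter(Integer value) {
+                    return value > this.threshold;
+                }
+            })
+            .withBroadcastSet(thresholds, "filterThresholds")
+            .print();
+
+        env.execute("Broadcast variable sketch");
+    }
+}
+```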
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/40b94f73/docs/hadoop_compatability.md
----------------------------------------------------------------------
diff --git a/docs/hadoop_compatability.md b/docs/hadoop_compatability.md
new file mode 100644
index 0000000..06c0dfa
--- /dev/null
+++ b/docs/hadoop_compatability.md
@@ -0,0 +1,5 @@
+---
+title: "Hadoop Compatability"
+---
+
+To be written.
\ No newline at end of file

