flink-issues mailing list archives

From kl0u <...@git.apache.org>
Subject [GitHub] flink pull request #5913: [FLINK-9181] [docs] [sql-client] Add documentation...
Date Fri, 27 Apr 2018 12:59:39 GMT
Github user kl0u commented on a diff in the pull request:

    --- Diff: docs/dev/table/sqlClient.md ---
    @@ -0,0 +1,538 @@
    +title: "SQL Client"
    +nav-parent_id: tableapi
    +nav-pos: 100
    +is_beta: true
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +  http://www.apache.org/licenses/LICENSE-2.0
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +Flink's Table & SQL API allows declaring queries in the SQL language. However,
a SQL query needs to be embedded within a table program that is written either in Java or
Scala. The table program needs to be packaged with a build tool before it can be submitted
to a cluster. This limits the usage of Flink to mostly Java/Scala programmers.
    +The *SQL Client* aims to provide an easy way of writing, debugging, and submitting table
programs to a Flink cluster without a single line of code. The *SQL Client CLI* allows for
retrieving and visualizing real-time results from the running distributed application on the
command line.
    +<a href="{{ site.baseurl }}/fig/sql_client_demo.gif"><img class="offset" src="{{
site.baseurl }}/fig/sql_client_demo.gif" alt="Animated demo of the Flink SQL Client CLI running
table programs on a cluster" width="80%" /></a>
    +**Note:** The SQL Client is in an early development phase. Even though the application
is not production-ready yet, it can be quite a useful tool for prototyping and playing around
with Flink SQL. In the future, the community plans to extend its functionality by providing
a REST-based [SQL Client Gateway](sqlClient.html#limitations--future).
    +* This will be replaced by the TOC
    +Getting Started
    +This section describes how to set up and run your first Flink SQL program from the command line.
The SQL Client is bundled in the regular Flink distribution and thus runnable out of the box.
    +The SQL Client requires a running Flink cluster to which table programs can be submitted.
For more information about setting up a Flink cluster see the [deployment part of this
documentation]({{ site.baseurl }}/ops/deployment/cluster_setup.html). If you simply want to
try out the SQL Client, you can also start a local cluster with one worker using the following command:
    +{% highlight bash %}
    +./bin/start-cluster.sh
    +{% endhighlight %}
    +### Starting the SQL Client CLI
    +The SQL Client scripts are also located in the binary directory of Flink. You can start
the CLI by calling:
    +{% highlight bash %}
    +./bin/sql-client.sh embedded
    +{% endhighlight %}
    +This command starts the submission service and CLI embedded in one application process.
By default, the SQL Client will read its configuration from the environment file located in
`./conf/sql-client-defaults.yaml`. See the [next part](sqlClient.html#environment-files) for
more information about the structure of environment files.
    +### Running SQL Queries
    +Once the CLI has been started, you can use the `HELP` command to list all available SQL
statements. For validating your setup and cluster connection, you can enter your first SQL
query and press the `Enter` key to execute it:
    +{% highlight sql %}
    +SELECT 'Hello World'
    +{% endhighlight %}
    +This query requires no table source and produces a single row result. The CLI will retrieve
results from the cluster and visualize them. You can close the result view by pressing the
`Q` key.
    +The CLI supports **two modes** for maintaining and visualizing results.
    +The *table mode* materializes results in memory and visualizes them in a regular, paginated
table representation. It can be enabled by executing the following command in the CLI:
    +{% highlight text %}
    +SET execution.result-mode=table
    +{% endhighlight %}
    +The *changelog mode* does not materialize results and visualizes the result stream that
is produced by a continuous query [LINK] consisting of insertions (`+`) and retractions (`-`).
    +{% highlight text %}
    +SET execution.result-mode=changelog
    +{% endhighlight %}
    +You can use the following query to see both result modes in action:
    +{% highlight sql %}
    +SELECT name, COUNT(*) AS cnt FROM (VALUES ('Bob'), ('Alice'), ('Greg'), ('Bob')) AS NameTable(name)
GROUP BY name 
    +{% endhighlight %}
    +This query is a bounded word count example. The following sections explain how
to read from table sources and configure other table program properties.
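To make the insert/retract semantics concrete, here is a small Python sketch (not Flink code, just an illustration of the changelog a continuous `GROUP BY` count would emit for the rows above):

```python
from collections import defaultdict

def changelog_count(names):
    """Simulate the changelog a continuous GROUP BY COUNT(*) emits:
    each incoming row retracts (-) the group's previous count, if any,
    and inserts (+) the updated count."""
    counts = defaultdict(int)
    log = []
    for name in names:
        if counts[name] > 0:
            log.append(('-', name, counts[name]))  # retract outdated result
        counts[name] += 1
        log.append(('+', name, counts[name]))      # emit updated result
    return log

# Rows from the VALUES clause of the example query:
for change in changelog_count(['Bob', 'Alice', 'Greg', 'Bob']):
    print(change)
```

In table mode only the final counts (`Bob=2`, `Alice=1`, `Greg=1`) stay visible, while changelog mode shows every intermediate `+`/`-` record.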
    +{% top %}
    +The SQL Client can be started with the following optional command-line options. They are discussed
in detail in the subsequent paragraphs.
    +{% highlight text %}
    +./bin/sql-client.sh embedded --help
    +Mode "embedded" submits Flink jobs from the local machine.
    +  Syntax: embedded [OPTIONS]
    +  "embedded" mode options:
    +     -d,--defaults <environment file>      The environment properties with which
    +                                           every new session is initialized.
    +                                           Properties might be overwritten by
    +                                           session properties.
    +     -e,--environment <environment file>   The environment properties to be
    +                                           imported into the session. It might
    +                                           overwrite default environment
    +                                           properties.
    +     -h,--help                             Show the help message with
    +                                           descriptions of all options.
    +     -j,--jar <JAR file>                   A JAR file to be imported into the
    +                                           session. The file might contain
    +                                           user-defined classes needed for the
    +                                           execution of statements such as
    +                                           functions, table sources, or sinks.
    +                                           Can be used multiple times.
    +     -l,--library <JAR directory>          A JAR file directory with which every
    +                                           new session is initialized. The files
    +                                           might contain user-defined classes
    +                                           needed for the execution of
    +                                           statements such as functions, table
    +                                           sources, or sinks. Can be used
    +                                           multiple times.
    +     -s,--session <session identifier>     The identifier for a session.
    +                                           'default' is the default identifier.
    +{% endhighlight %}
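For illustration, a hypothetical invocation that combines several of the options above (the file paths and session name are placeholders, not taken from the PR):

```bash
# Hypothetical example: start the embedded client with explicit defaults,
# an extra UDF jar, and a named session (all paths are placeholders).
./bin/sql-client.sh embedded \
  --defaults conf/sql-client-defaults.yaml \
  --jar /path/to/my-udfs.jar \
  --session my-session
```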
    +{% top %}
    +### Environment Files
    +A SQL query needs a configuration environment in which it is executed. The so-called
*environment files* define available table sources and sinks, external catalogs, user-defined
functions, and other properties required for execution and deployment.
    --- End diff --
    There is a typo in the link "`http://http://yaml.org/`". Also, I would go for something like:
    Every environment file is a regular [YAML file](http://yaml.org/). An example of such a
file is presented below. This configuration:
     - defines an environment with a table source `MyTableName` that reads from a CSV file,
     - specifies a parallelism of 1 for queries executed in this environment,
     - specifies an event-time characteristic, and
     - runs queries in the `table` result mode.
    Just because it is easier for the eye to focus on what the file specifies.
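    A hedged sketch of what such an environment file could look like, assuming a `tables`/`execution` key layout like the `sql-client-defaults.yaml` in this PR (the CSV path, connector/format keys, and schema are illustrative assumptions, not from the diff):

```yaml
# Hypothetical environment file matching the description above.
tables:
  - name: MyTableName
    type: source
    connector:
      type: filesystem
      path: "/path/to/input.csv"   # placeholder path
    format:
      type: csv

execution:
  type: streaming
  time-characteristic: event-time
  parallelism: 1
  result-mode: table
```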

