flink-issues mailing list archives

From twalthr <...@git.apache.org>
Subject [GitHub] flink pull request #5913: [FLINK-9181] [docs] [sql-client] Add documentation...
Date Mon, 14 May 2018 10:23:47 GMT
Github user twalthr commented on a diff in the pull request:

    --- Diff: docs/dev/table/sqlClient.md ---
    @@ -0,0 +1,539 @@
    +---
    +title: "SQL Client"
    +nav-parent_id: tableapi
    +nav-pos: 100
    +is_beta: true
    +---
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +  http://www.apache.org/licenses/LICENSE-2.0
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +-->
    +Although Flink's Table & SQL API allows queries to be declared in the SQL language, a SQL query needs to be embedded within a table program that is written in either Java or
Scala. The table program needs to be packaged with a build tool before it can be submitted
to a cluster. This limits the usage of Flink to mostly Java/Scala programmers.
    +The *SQL Client* aims to provide an easy way of writing, debugging, and submitting table
programs to a Flink cluster without a single line of code. The *SQL Client CLI* allows for
retrieving and visualizing real-time results from the running distributed application on the
command line.
    +<a href="{{ site.baseurl }}/fig/sql_client_demo.gif"><img class="offset" src="{{
site.baseurl }}/fig/sql_client_demo.gif" alt="Animated demo of the Flink SQL Client CLI running
table programs on a cluster" width="80%" /></a>
    +**Note:** The SQL Client is in an early development phase. Even though the application
is not production-ready yet, it can be a quite useful tool for prototyping and playing around
with Flink SQL. In the future, the community plans to extend its functionality by providing
a REST-based [SQL Client Gateway](sqlClient.html#limitations--future).
    +* This will be replaced by the TOC
    +## Getting Started
    +This section describes how to set up and run your first Flink SQL program from the command line.
The SQL Client is bundled in the regular Flink distribution and thus runnable out of the box.
    +The SQL Client requires a running Flink cluster to which table programs can be submitted.
For more information about setting up a Flink cluster see the [deployment part of this
documentation]({{ site.baseurl }}/ops/deployment/cluster_setup.html). If you simply want to
try out the SQL Client, you can also start a local cluster with one worker using the following command:
    +{% highlight bash %}
    +./bin/start-cluster.sh
    +{% endhighlight %}
    +### Starting the SQL Client CLI
    +The SQL Client scripts are also located in the binary directory of Flink. You can start
the CLI by calling:
    +{% highlight bash %}
    +./bin/sql-client.sh embedded
    +{% endhighlight %}
    +This command starts the submission service and CLI embedded in one application process.
By default, the SQL Client will read its configuration from the environment file located in
`./conf/sql-client-defaults.yaml`. See the [next part](sqlClient.html#environment-files) for
more information about the structure of environment files.
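The lookup behavior described above can be pictured with a small Python sketch. This is an illustration only, not Flink code; the function name is hypothetical:

```python
import os

def resolve_defaults_file(explicit_path=None, conf_dir="./conf"):
    """Hypothetical sketch: an explicitly passed --defaults file wins;
    otherwise fall back to sql-client-defaults.yaml in the conf directory."""
    if explicit_path is not None:
        return explicit_path
    return os.path.join(conf_dir, "sql-client-defaults.yaml")

print(resolve_defaults_file())  # ./conf/sql-client-defaults.yaml
```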
    +### Running SQL Queries
    +Once the CLI has been started, you can use the `HELP` command to list all available SQL
statements. For validating your setup and cluster connection, you can enter your first SQL
query and press the `Enter` key to execute it:
    +{% highlight sql %}
    +SELECT 'Hello World'
    +{% endhighlight %}
    +This query requires no table source and produces a single row result. The CLI will retrieve
results from the cluster and visualize them. You can close the result view by pressing the
`Q` key.
    +The CLI supports **two modes** for maintaining and visualizing results.
    +The *table mode* materializes results in memory and visualizes them in a regular, paginated
table representation. It can be enabled by executing the following command in the CLI:
    +{% highlight text %}
    +SET execution.result-mode=table
    +{% endhighlight %}
    +The *changelog mode* does not materialize results and visualizes the result stream that
is produced by a continuous query [LINK] consisting of insertions (`+`) and retractions (`-`).
    +{% highlight text %}
    +SET execution.result-mode=changelog
    +{% endhighlight %}
    +You can use the following query to see both result modes in action:
    +{% highlight sql %}
    +SELECT name, COUNT(*) AS cnt FROM (VALUES ('Bob'), ('Alice'), ('Greg'), ('Bob')) AS NameTable(name)
GROUP BY name 
    +{% endhighlight %}
    +This query computes a bounded word count. The following sections explain how
to read from table sources and configure other table program properties.
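To see why the changelog mode emits both insertions and retractions for such a query, the following Python sketch (an illustration, not Flink's implementation) replays the four input rows through a continuous count:

```python
from collections import Counter

def changelog_of_count(rows):
    """Simulate the insert (+) / retract (-) stream that a continuous
    COUNT(*) ... GROUP BY query would emit for the given input rows."""
    counts = Counter()
    emitted = []
    for name in rows:
        if counts[name] > 0:
            emitted.append(("-", name, counts[name]))  # retract the old count
        counts[name] += 1
        emitted.append(("+", name, counts[name]))      # emit the updated count
    return emitted

print(changelog_of_count(["Bob", "Alice", "Greg", "Bob"]))
```

The second `'Bob'` row first retracts `('Bob', 1)` and then inserts `('Bob', 2)`; the table mode would instead materialize only the final counts.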
    +{% top %}
    +The SQL Client can be started with the following optional command line options. They are discussed
in detail in the subsequent paragraphs.
    +{% highlight text %}
    +./bin/sql-client.sh embedded --help
    +Mode "embedded" submits Flink jobs from the local machine.
    +  Syntax: embedded [OPTIONS]
    +  "embedded" mode options:
    +     -d,--defaults <environment file>      The environment properties with which
    +                                           every new session is initialized.
    +                                           Properties might be overwritten by
    +                                           session properties.
    +     -e,--environment <environment file>   The environment properties to be
    +                                           imported into the session. It might
    +                                           overwrite default environment
    +                                           properties.
    +     -h,--help                             Show the help message with
    +                                           descriptions of all options.
    +     -j,--jar <JAR file>                   A JAR file to be imported into the
    +                                           session. The file might contain
    +                                           user-defined classes needed for the
    +                                           execution of statements such as
    +                                           functions, table sources, or sinks.
    +                                           Can be used multiple times.
    +     -l,--library <JAR directory>          A JAR file directory with which every
    +                                           new session is initialized. The files
    +                                           might contain user-defined classes
    +                                           needed for the execution of
    +                                           statements such as functions, table
    +                                           sources, or sinks. Can be used
    +                                           multiple times.
    +     -s,--session <session identifier>     The identifier for a session.
    +                                           'default' is the default identifier.
    +{% endhighlight %}
    +{% top %}
    +### Environment Files
    +A SQL query needs a configuration environment in which it is executed. The so-called
*environment files* define available table sources and sinks, external catalogs, user-defined
functions, and other properties required for execution and deployment.
    +Every environment file is a regular [YAML file](http://yaml.org/) that looks similar
to the following example. The file defines an environment with a table source `MyTableName`
that reads from a CSV file. Queries that are executed in this environment will (among other
properties) have a parallelism of 1, an event-time characteristic, and will run in the `table` result mode.
    +{% highlight yaml %}
    +# Define table sources and sinks here.
    +  - name: MyTableName
    +    type: source
    +    schema:
    +      - name: MyField1
    +        type: INT
    +      - name: MyField2
    +        type: VARCHAR
    +    connector:
    +      type: filesystem
    +      path: "/path/to/something.csv"
    +    format:
    +      type: csv
    +      fields:
    +        - name: MyField1
    +          type: INT
    +        - name: MyField2
    +          type: VARCHAR
    +      line-delimiter: "\n"
    +      comment-prefix: "#"
    +# Execution properties allow for changing the behavior of a table program.
    +  type: streaming
    +  time-characteristic: event-time
    +  parallelism: 1
    +  max-parallelism: 16
    +  min-idle-state-retention: 0
    +  max-idle-state-retention: 0
    +  result-mode: table
    +# Deployment properties allow for describing the cluster to which table programs are submitted.
    +  response-timeout: 5000
    +{% endhighlight %}
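As a rough illustration of what the `format` section above configures, the following Python snippet parses file content according to the declared line delimiter, comment prefix, and field types. This is a simplified sketch, not the actual CSV connector:

```python
def parse_csv(text, line_delimiter="\n", comment_prefix="#"):
    """Simplified sketch of the configured CSV format: split on the line
    delimiter, skip comment lines, and cast MyField1 (INT), MyField2 (VARCHAR)."""
    rows = []
    for line in text.split(line_delimiter):
        if not line or line.startswith(comment_prefix):
            continue  # skip empty lines and lines starting with the comment prefix
        field1, field2 = line.split(",")
        rows.append((int(field1), field2))
    return rows

sample = "# a comment line\n1,hello\n2,world"
print(parse_csv(sample))  # [(1, 'hello'), (2, 'world')]
```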
    +Environment files can be created for general purposes (*defaults environment file* using
`--defaults`) as well as on a per-session basis (*session environment file* using `--environment`).
Every CLI session is initialized with the default properties followed by the session properties.
Both default and session environment files can be passed when starting the CLI application.
If no default environment file has been specified, the SQL Client searches for `./conf/sql-client-defaults.yaml`
in Flink's configuration directory.
    +Properties that have been set within a CLI session (e.g., using the `SET` command) have the
highest precedence:
    +{% highlight text %}
    +CLI commands > session environment file > defaults environment file
    +{% endhighlight %}
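This precedence order can be pictured as a simple map merge in which later sources overwrite earlier ones (a minimal sketch, not Flink code):

```python
def effective_properties(defaults_env, session_env, cli_set):
    """Merge properties so that CLI SET commands override the session
    environment file, which in turn overrides the defaults file."""
    merged = dict(defaults_env)
    merged.update(session_env)
    merged.update(cli_set)
    return merged

props = effective_properties(
    {"execution.result-mode": "table", "execution.parallelism": "1"},
    {"execution.parallelism": "4"},
    {"execution.result-mode": "changelog"},
)
print(props)  # {'execution.result-mode': 'changelog', 'execution.parallelism': '4'}
```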
    +{% top %}
    +### Dependencies
    +The SQL Client does not require you to set up a Java project using Maven or SBT. Instead,
you can pass the dependencies as regular JAR files that get submitted to the cluster. You
can either specify each JAR file separately (using `--jar`) or define entire library directories
(using `--library`). For connectors to external systems (such as Apache Kafka) and corresponding
data formats (such as JSON), Flink provides **ready-to-use JAR bundles**. These JAR files
are suffixed with `sql-jar` and can be downloaded for each release from the Maven central
    +{% if site.is_stable %}
    --- End diff --
    I think the paragraph is pretty clear with `for each release`. 

