flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aljoscha <...@git.apache.org>
Subject [GitHub] flink pull request #2154: [FLINK-4062] Update Windowing Documentation
Date Fri, 01 Jul 2016 08:04:53 GMT
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2154#discussion_r69263379
  
    --- Diff: docs/apis/streaming/windows.md ---
    @@ -24,1023 +24,608 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    +Flink uses a concept called *windows* to divide a (potentially) infinite `DataStream`
into finite
    +slices based on the timestamps of elements or other criteria. This division is required
when working
    +with infinite streams of data and performing transformations that aggregate elements.
    +
    +<span class="label label-info">Info</span> We will mostly talk about *keyed
windowing* here, i.e.
    +windows that are applied on a `KeyedStream`. Keyed windows have the advantage that elements
are
    +subdivided based on both window and key before being given to
    +a user function. The work can thus be distributed across the cluster
    +because the elements for different keys can be processed independently. If you absolutely
have to,
    +you can check out [non-keyed windowing](#non-keyed-windowing) where we describe how non-keyed
    +windows work.
    +
     * This will be replaced by the TOC
     {:toc}
     
    -## Windows on Keyed Data Streams
    -
    -Flink offers a variety of methods for defining windows on a `KeyedStream`. All of these
group elements *per key*,
    -i.e., each window will contain elements with the same key value.
    +## Basics
     
    -### Basic Window Constructs
    +For a windowed transformations you must at least specify a *key*
    +(see [specifying keys](apis/common/index.html#specifying-keys))
    +a *window assigner* and a *window function*. The *key* divides the infinite, non-keyed,
stream
    +into logical keyed streams while the *window assigner* assigns elements to finite per-key
windows.
    +Finally, the *window function* is used to process the elements of each window.
     
    -Flink offers a general window mechanism that provides flexibility, as well as a number
of pre-defined windows
    -for common use cases. See first if your use case can be served by the pre-defined windows
below before moving
    -to defining your own windows.
    +The basic structure of a windowed transformation is thus as follows:
     
     <div class="codetabs" markdown="1">
     <div data-lang="java" markdown="1">
    +{% highlight java %}
    +DataStream<T> input = ...;
     
    -<br />
    -
    -<table class="table table-bordered">
    -  <thead>
    -    <tr>
    -      <th class="text-left" style="width: 25%">Transformation</th>
    -      <th class="text-center">Description</th>
    -    </tr>
    -  </thead>
    -  <tbody>
    -      <tr>
    -        <td><strong>Tumbling time window</strong><br>KeyedStream
&rarr; WindowedStream</td>
    -        <td>
    -          <p>
    -          Defines a window of 5 seconds, that "tumbles". This means that elements are
    -          grouped according to their timestamp in groups of 5 second duration, and every
element belongs to exactly one window.
    -	  The notion of time is specified by the selected TimeCharacteristic (see <a href="{{
site.baseurl }}/apis/streaming/event_time.html">time</a>).
    -    {% highlight java %}
    -keyedStream.timeWindow(Time.seconds(5));
    -    {% endhighlight %}
    -          </p>
    -        </td>
    -      </tr>
    -      <tr>
    -          <td><strong>Sliding time window</strong><br>KeyedStream
&rarr; WindowedStream</td>
    -          <td>
    -            <p>
    -             Defines a window of 5 seconds, that "slides" by 1 second. This means that
elements are
    -             grouped according to their timestamp in groups of 5 second duration, and
elements can belong to more than
    -             one window (since windows overlap by at most 4 seconds)
    -             The notion of time is specified by the selected TimeCharacteristic (see
<a href="{{ site.baseurl }}/apis/streaming/event_time.html">time</a>).
    -      {% highlight java %}
    -keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
    -      {% endhighlight %}
    -            </p>
    -          </td>
    -        </tr>
    -      <tr>
    -        <td><strong>Tumbling count window</strong><br>KeyedStream
&rarr; WindowedStream</td>
    -        <td>
    -          <p>
    -          Defines a window of 1000 elements, that "tumbles". This means that elements
are
    -          grouped according to their arrival time (equivalent to processing time) in
groups of 1000 elements,
    -          and every element belongs to exactly one window.
    -    {% highlight java %}
    -keyedStream.countWindow(1000);
    -    {% endhighlight %}
    -        </p>
    -        </td>
    -      </tr>
    -      <tr>
    -      <td><strong>Sliding count window</strong><br>KeyedStream
&rarr; WindowedStream</td>
    -      <td>
    -        <p>
    -          Defines a window of 1000 elements, that "slides" every 100 elements. This means
that elements are
    -          grouped according to their arrival time (equivalent to processing time) in
groups of 1000 elements,
    -          and every element can belong to more than one window (as windows overlap by
at most 900 elements).
    -  {% highlight java %}
    -keyedStream.countWindow(1000, 100)
    -  {% endhighlight %}
    -        </p>
    -      </td>
    -    </tr>
    -  </tbody>
    -</table>
    -
    +input
    +    .keyBy(<key selector>)
    +    .window(<window assigner>)
    +    .<windowed transformation>(<window function>);
    +{% endhighlight %}
     </div>
     
     <div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +val input: DataStream[T] = ...
     
    -<br />
    -
    -<table class="table table-bordered">
    -  <thead>
    -    <tr>
    -      <th class="text-left" style="width: 25%">Transformation</th>
    -      <th class="text-center">Description</th>
    -    </tr>
    -  </thead>
    -  <tbody>
    -      <tr>
    -        <td><strong>Tumbling time window</strong><br>KeyedStream
&rarr; WindowedStream</td>
    -        <td>
    -          <p>
    -          Defines a window of 5 seconds, that "tumbles". This means that elements are
    -          grouped according to their timestamp in groups of 5 second duration, and every
element belongs to exactly one window.
    -          The notion of time is specified by the selected TimeCharacteristic (see <a
href="{{ site.baseurl }}/apis/streaming/event_time.html">time</a>).
    -    {% highlight scala %}
    -keyedStream.timeWindow(Time.seconds(5))
    -    {% endhighlight %}
    -          </p>
    -        </td>
    -      </tr>
    -      <tr>
    -          <td><strong>Sliding time window</strong><br>KeyedStream
&rarr; WindowedStream</td>
    -          <td>
    -            <p>
    -             Defines a window of 5 seconds, that "slides" by 1 second. This means that
elements are
    -             grouped according to their timestamp in groups of 5 second duration, and
elements can belong to more than
    -             one window (since windows overlap by at most 4 seconds)
    -             The notion of time is specified by the selected TimeCharacteristic (see
<a href="{{ site.baseurl }}/apis/streaming/event_time.html">time</a>).
    -      {% highlight scala %}
    -keyedStream.timeWindow(Time.seconds(5), Time.seconds(1))
    -      {% endhighlight %}
    -            </p>
    -          </td>
    -        </tr>
    -      <tr>
    -        <td><strong>Tumbling count window</strong><br>KeyedStream
&rarr; WindowedStream</td>
    -        <td>
    -          <p>
    -          Defines a window of 1000 elements, that "tumbles". This means that elements
are
    -          grouped according to their arrival time (equivalent to processing time) in
groups of 1000 elements,
    -          and every element belongs to exactly one window.
    -    {% highlight scala %}
    -keyedStream.countWindow(1000)
    -    {% endhighlight %}
    -        </p>
    -        </td>
    -      </tr>
    -      <tr>
    -      <td><strong>Sliding count window</strong><br>KeyedStream
&rarr; WindowedStream</td>
    -      <td>
    -        <p>
    -          Defines a window of 1000 elements, that "slides" every 100 elements. This means
that elements are
    -          grouped according to their arrival time (equivalent to processing time) in
groups of 1000 elements,
    -          and every element can belong to more than one window (as windows overlap by
at most 900 elements).
    -  {% highlight scala %}
    -keyedStream.countWindow(1000, 100)
    -  {% endhighlight %}
    -        </p>
    -      </td>
    -    </tr>
    -  </tbody>
    -</table>
    -
    +input
    +    .keyBy(<key selector>)
    +    .window(<window assigner>)
    +    .<windowed transformation>(<window function>)
    +{% endhighlight %}
     </div>
     </div>
     
    -### Advanced Window Constructs
    +We will cover [window assigners](#window-assigners) in a separate section below.
    +
    +The window transformation can be one of `reduce()`, `fold()` or `apply()`. Which respectively
    +takes a `ReduceFunction`, `FoldFunction` or `WindowFunction`. We describe each of these
ways
    +of specifying a windowed transformation in detail below: [window functions](#window-functions).
    +
    +For more advanced use cases you can also specify a `Trigger` that determines when exactly
a window
    +is being considered as *ready for processing*. These will be covered in more detail in
    +[triggers](#triggers).
    +
    +## Window Assigners
    +
    +The window assigner specifies how elements of the stream are divided into finite slices.
Flink comes
    +with pre-implemented window assigners for the most typical use cases, namely *tumbling
windows*,
    +*sliding windows*, *session windows* and *global windows* but you can implement your
own by
    +extending the `WindowAssigner` class. All the built-in window assigners, except for the
global
    +windows one, assign elements to windows based on time, which can either be processing
time or event
    +time.
    +
    +Let's first look at how each of these window assigners works before looking at how they
can be used
    --- End diff --
    
    I thought it was with "s" because each of the window assigners is a singular. Could be
wrong, though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message