flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-5529) Improve / extends windowing documentation
Date Tue, 17 Jan 2017 13:59:26 GMT
Stephan Ewen created FLINK-5529:

             Summary: Improve / extends windowing documentation
                 Key: FLINK-5529
                 URL: https://issues.apache.org/jira/browse/FLINK-5529
             Project: Flink
          Issue Type: Sub-task
          Components: Documentation
            Reporter: Stephan Ewen
            Assignee: Kostas Kloudas
             Fix For: 1.2.0, 1.3.0

Suggested Outline:


(0) Outline: The anatomy of a window operation

     [.keyBy(...)]         <-  keyed versus non-keyed windows
      .window(...)         <-  required: "assigner"
     [.trigger(...)]       <-  optional: "trigger" (else default trigger)
     [.evictor(...)]       <-  optional: "evictor" (else no evictor)
     [.allowedLateness()]  <-  optional, else zero
      .reduce/fold/apply() <-  required: "function"

(1) Types of windows

  - tumble
  - slide
  - session
  - global

(2) Pre-defined windows

   timeWindow() (tumble, slide)
   countWindow() (tumble, slide)
     - mention that count windows are inherently
       resource leaky unless limited key space

(3) Window Functions

  - apply: most basic, iterates over elements in window
  - aggregating: reduce and fold, can be used with "apply()" which will get one element
  - forward reference to state size section

(4) Advanced Windows

  - assigner
    - simple
    - merging
  - trigger
    - registering timers (processing time, event time)
    - state in triggers
  - life cycle of a window
    - create
    - state
    - cleanup
      - when is window contents purged
      - when is state dropped
      - when is metadata (like merging set) dropped

(5) Late data
  - picture
  - fire vs fire_and_purge: late accumulates vs late resurrects (cf discarding mode)
(6) Evictors
  - TDB
(7) State size: How large will the state be?

Basic rule: Each element has one copy per window it is assigned to
  --> num windows * num elements in window
  --> example: tumbline is one copy, sliding(n,m) is n/m copies
  --> per key

  - if reduce or fold is set -> one element per window (rather than num elements in window)
  - evictor voids pre-aggregation from the perspective of state

Special rules:
  - fold cannot pre-aggregate on session windows (and other merging windows)

(8) Non-keyed windows
  - all elements through the same windows
  - currently not parallel
  - possible parallel in the future when having pre-aggregation functions
  - inherently (by definition) produce a result stream with parallelism one
  - state similar to one key of keyed windows

This message was sent by Atlassian JIRA

View raw message