flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Improve the documentation of the Flink Architecture and internals
Date Mon, 16 Mar 2015 18:51:37 GMT
Hi all!

I would like to kick off an effort to improve the documentation of the Flink
Architecture and internals. This also means making the streaming
architecture more prominent in the docs.

Flink is quite a sophisticated stack, so we need to improve the presentation of
how it works - to the extent necessary to use Flink (and to appreciate
all the cool stuff that is happening). This should also come in handy for
new contributors.

As a general umbrella, we need to first decide where and how to organize
the documentation.

I would propose to put the bulk of the documentation into the Wiki. Create
a dedicated section on Flink Internals and sub-pages for each component /
topic. To the docs, we add a general overview from which we link into the

 == These sections would go into the DOCS in the git repository ==

  - Overview of Program, pre-flight phase (type extraction, optimizer),
JobManager, TaskManager. Differences between streaming and batch. We can
realize this through one very nice picture with a few lines of text.

  - High level architecture stack, different program representations (API
operators, common API DAG, optimizer DAG, parallel data flow (JobGraph /
Execution Graph))

  - (maybe) Parallelism and scheduling. This seems to be paramount for
users to understand.

  - Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)

 == These sections would go into the WIKI ==

  - Project structure (maven projects, what is where, dependencies between

  - Component overview

    -> JobManager (InstanceManager, Scheduler, BLOB server, Library Cache,

    -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)

    -> Involved Actor Systems / Actors / Messages

  - Details about submitting a job (library upload, job graph submission,
execution graph setup, scheduling trigger)

  - Memory Management

  - Optimizer internals

  - Akka Setup specifics

  - Netty and pluggable data exchange strategies

  - Testing: Flink test clusters and unit test utilities

  - Developer How-To: Setting up Eclipse, IntelliJ, Travis

  - Step-by-step guide to add a new operator

I will go ahead and stub some sections in the Wiki.

As we discuss and agree/disagree with the outline, we can evolve the Wiki.

