flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Improve the documentation of the Flink Architecture and internals
Date Mon, 16 Mar 2015 20:58:17 GMT
Why do you want to split stuff between the docs in the repository and
the wiki? I for one would always be too lazy to check stuff in a wiki
when there is also documentation. Plus, this would lead to
additional overhead in deciding what goes where and in syncing between
the two places for documentation.

On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <sewen@apache.org> wrote:
> Ah, I totally forgot to add to the internals:
>
>   - Fault tolerance in batch mode
>
>   - Fault tolerance in streaming mode, with state handling
>
> On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <sewen@apache.org> wrote:
>
>> Hi all!
>>
>> I would like to kick off an effort to improve the documentation of the
>> Flink Architecture and internals. This also means making the streaming
>> architecture more prominent in the docs.
>>
>> Since Flink is quite a sophisticated stack, we need to improve the
>> presentation of how Flink works - to the extent necessary to use Flink (and
>> to appreciate all the cool stuff that is happening). This should also come
>> in handy for new contributors.
>>
>> As a general umbrella, we need to first decide where and how to organize
>> the documentation.
>>
>> I would propose to put the bulk of the documentation into the Wiki. Create
>> a dedicated section on Flink Internals and sub-pages for each component /
>> topic. To the docs, we add a general overview from which we link into the
>> Wiki.
>>
>>
>>  == These sections would go into the DOCS in the git repository ==
>>
>>   - Overview of Program, pre-flight phase (type extraction, optimizer),
>> JobManager, TaskManager. Differences between streaming and batch. We can
>> realize this through one very nice picture with few lines of text.
>>
>>   - High level architecture stack, different program representations (API
>> operators, common API DAG, optimizer DAG, parallel data flow (JobGraph /
>> Execution Graph))
>>
>>   - (maybe) Parallelism and scheduling. This seems to be paramount to
>> understand for users.
>>
>>   - Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)
>>
>>
>>
>>  == These sections would go into the WIKI ==
>>
>>   - Project structure (maven projects, what is where, dependencies between
>> projects)
>>
>>   - Component overview
>>
>>     -> JobManager (InstanceManager, Scheduler, BLOB server, Library Cache,
>> Archiving)
>>
>>     -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)
>>
>>     -> Involved Actor Systems / Actors / Messages
>>
>>   - Details about submitting a job (library upload, job graph submission,
>> execution graph setup, scheduling trigger)
>>
>>   - Memory Management
>>
>>   - Optimizer internals
>>
>>   - Akka Setup specifics
>>
>>   - Netty and pluggable data exchange strategies
>>
>>   - Testing: Flink test clusters and unit test utilities
>>
>>   - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>>
>>   - Step-by-step guide to add a new operator
>>
>>
>> I will go ahead and stub some sections in the Wiki.
>>
>> As we discuss and agree/disagree with the outline, we can evolve the Wiki.
>>
>> Greetings,
>> Stephan
>>
>>
