flink-dev mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Improve the documentation of the Flink Architecture and internals
Date Mon, 23 Mar 2015 11:04:17 GMT
I couldn't have a look at it earlier, because the Wiki was down. Very nice overview of the
flow of things. I like the text and pictures a lot.

I will add content about:

1) The way that we do the network transfers with Netty

2) A more detailed message flow for pipelined vs. blocking results.


I am actually very happy that we moved this to the Wiki... it is so much easier to fix minor
things now. :-)

On 20 Mar 2015, at 12:48, Ufuk Celebi <uce@apache.org> wrote:

> Thanks. I will have a look later :-)
> 
> +1 for the Wiki. I think the low overhead makle
> 
> On 20 Mar 2015, at 12:46, Kostas Tzoumas <ktzoumas@apache.org> wrote:
> 
>> I added a document for data exchange between tasks:
>> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>> 
>> Feel free to edit. I plan to link the class names to the class files in
>> github.
>> 
>> On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <ktzoumas@apache.org>
>> wrote:
>> 
>>> +1 for the Wiki.
>>> 
>>> When these have been stabilized we can move them to the docs if we decide
>>> to do so.
>>> 
>>> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <sewen@apache.org> wrote:
>>> 
>>>> I have put my suggested version of an outline for the docs into the wiki.
>>>> Regardless where the docs end up (wiki or repository), we can use the wiki
>>>> to outline the docs.
>>>> 
>>>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>>>> 
>>>> Some pages contain some stub or outline, others are completely blank.
>>>> 
>>>> Not a complete list. Additions are welcome.
>>>> 
>>>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>> 
>>>>> I think the Wiki has a much lower barrier to entry for fixing docs,
>>>>> especially for external people. The docs, with the Jekyll setup, are
>>>>> rather tricky. I would very much like all kinds of people to contribute
>>>>> to the docs about the internals, not just the usual three suspects that
>>>>> have done this so far.
>>>>> 
>>>>> Having a good landing page in the regular docs is exactly so that we do
>>>>> not lose all the people that do not look into a wiki. The overview pages
>>>>> for the internals need to be good and accessible and nicely link to the
>>>>> wiki to "forward" people there.
>>>>> 
>>>>> The overhead of deciding what goes where should not be terribly large,
>>>>> in my opinion, since there is no really "wrong" place to put it.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <aljoscha@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Why do you want to split stuff between the docs in the repository and
>>>>>> the wiki? I for one would always be too lazy to check stuff in a wiki
>>>>>> when there is also documentation. Plus, this would lead to
>>>>>> additional overhead in deciding what goes where and syncing between
>>>>>> the two places for documentation.
>>>>>> 
>>>>>> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>>>>> Ah, I totally forgot to add to the internals:
>>>>>>> 
>>>>>>>  - Fault tolerance in Batch mode
>>>>>>> 
>>>>>>>  - Fault Tolerance in Streaming Mode, with state handling
>>>>>>> 
>>>>>>> On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>>>>> 
>>>>>>>> Hi all!
>>>>>>>> 
>>>>>>>> I would like to kick off an effort to improve the documentation of the
>>>>>>>> Flink Architecture and internals. This also means making the streaming
>>>>>>>> architecture more prominent in the docs.
>>>>>>>> 
>>>>>>>> Since Flink is quite a sophisticated stack, we need to improve the
>>>>>>>> presentation of how Flink works - to the extent necessary to use Flink
>>>>>>>> (and to appreciate all the cool stuff that is happening). This should
>>>>>>>> also come in handy for new contributors.
>>>>>>>> 
>>>>>>>> As a general umbrella, we need to first decide where and how to organize
>>>>>>>> the documentation.
>>>>>>>> 
>>>>>>>> I would propose to put the bulk of the documentation into the Wiki: create
>>>>>>>> a dedicated section on Flink Internals and sub-pages for each component /
>>>>>>>> topic. To the docs, we add a general overview from which we link into the
>>>>>>>> Wiki.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> == These sections would go into the DOCS in the git repository ==
>>>>>>>> 
>>>>>>>>  - Overview of Program, pre-flight phase (type extraction, optimizer),
>>>>>>>> JobManager, TaskManager. Differences between streaming and batch. We can
>>>>>>>> realize this through one very nice picture with a few lines of text.
>>>>>>>> 
>>>>>>>>  - High level architecture stack, different program representations (API
>>>>>>>> operators, common API DAG, optimizer DAG, parallel data flow (JobGraph /
>>>>>>>> Execution Graph))
>>>>>>>> 
>>>>>>>>  - (maybe) Parallelism and scheduling. This seems to be paramount for
>>>>>>>> users to understand.
>>>>>>>> 
>>>>>>>>  - Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> == These sections would go into the WIKI ==
>>>>>>>> 
>>>>>>>>  - Project structure (maven projects, what is where, dependencies between
>>>>>>>> projects)
>>>>>>>> 
>>>>>>>>  - Component overview
>>>>>>>> 
>>>>>>>>    -> JobManager (InstanceManager, Scheduler, BLOB server, Library Cache,
>>>>>>>> Archiving)
>>>>>>>> 
>>>>>>>>    -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)
>>>>>>>> 
>>>>>>>>    -> Involved Actor Systems / Actors / Messages
>>>>>>>> 
>>>>>>>>  - Details about submitting a job (library upload, job graph submission,
>>>>>>>> execution graph setup, scheduling trigger)
>>>>>>>> 
>>>>>>>>  - Memory Management
>>>>>>>> 
>>>>>>>>  - Optimizer internals
>>>>>>>> 
>>>>>>>>  - Akka Setup specifics
>>>>>>>> 
>>>>>>>>  - Netty and pluggable data exchange strategies
>>>>>>>> 
>>>>>>>>  - Testing: Flink test clusters and unit test utilities
>>>>>>>> 
>>>>>>>>  - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>>>>>>>> 
>>>>>>>>  - Step-by-step guide to add a new operator
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I will go ahead and stub some sections in the Wiki.
>>>>>>>> 
>>>>>>>> As we discuss and agree/disagree with the outline, we can evolve the
>>>>>>>> Wiki.
>>>>>>>> 
>>>>>>>> Greetings,
>>>>>>>> Stephan

