flink-dev mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: Improve the documentation of the Flink Architecture and internals
Date Fri, 20 Mar 2015 18:31:09 GMT
Ah, the infra bot on Twitter just announced extended downtime for Confluence [1]

- Henry

[1] https://twitter.com/infrabot/status/578983473970475008

On Fri, Mar 20, 2015 at 11:27 AM, Stephan Ewen <sewen@apache.org> wrote:
> For me as well. Earlier today it said "down for maintenance"
>
> On Fri, Mar 20, 2015 at 7:14 PM, Kostas Tzoumas <ktzoumas@apache.org> wrote:
>
>> it's down for me as well
>>
>> On Fri, Mar 20, 2015 at 7:12 PM, Henry Saputra <henry.saputra@gmail.com>
>> wrote:
>>
>> > Is the wiki down for any of you?
>> >
>> > I can't access
>> > https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
>> >
>> > 404
>> >
>> > - Henry
>> >
>> > On Fri, Mar 20, 2015 at 4:46 AM, Kostas Tzoumas <ktzoumas@apache.org>
>> > wrote:
>> > > I added a document for data exchange between tasks:
>> > >
>> > > https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>> > >
>> > > Feel free to edit. I plan to link the class names to the class files on
>> > > GitHub.
>> > >
>> > > On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <ktzoumas@apache.org>
>> > > wrote:
>> > >
>> > >> +1 for the Wiki.
>> > >>
>> > >> When these have been stabilized we can move them to the docs if we decide
>> > >> to do so.
>> > >>
>> > >> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <sewen@apache.org> wrote:
>> > >>
>> > >>> I have put my suggested version of an outline for the docs into the wiki.
>> > >>> Regardless of where the docs end up (wiki or repository), we can use the
>> > >>> wiki to outline the docs.
>> > >>>
>> > >>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>> > >>>
>> > >>> Some pages contain a stub or outline, others are completely blank.
>> > >>>
>> > >>> Not a complete list. Additions are welcome.
>> > >>>
>> > >>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <sewen@apache.org> wrote:
>> > >>>
>> > >>> > I think the Wiki has a much lower barrier to entry for fixing docs,
>> > >>> > especially for external people. The docs, with the Jekyll setup, are
>> > >>> > rather tricky. I would very much like all kinds of people to contribute
>> > >>> > to the docs about the internals, not just the usual three suspects that
>> > >>> > have done this so far.
>> > >>> >
>> > >>> > Having a good landing page in the regular docs is exactly so that we do
>> > >>> > not lose all the people that do not look into a wiki. The overview pages
>> > >>> > for the internals need to be good and accessible and nicely link to the
>> > >>> > wiki to "forward" people there.
>> > >>> >
>> > >>> > The overhead of deciding what goes where should not be terribly large, in
>> > >>> > my opinion, since there is no really "wrong" place to put it.
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> > On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <aljoscha@apache.org>
>> > >>> > wrote:
>> > >>> >
>> > >>> >> Why do you want to split stuff between the docs in the repository and
>> > >>> >> the wiki? I, for one, would always be too lazy to check stuff in a wiki
>> > >>> >> when there is also documentation. Plus, this would lead to additional
>> > >>> >> overhead in deciding what goes where and syncing between the two places
>> > >>> >> for documentation.
>> > >>> >>
>> > >>> >> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <sewen@apache.org> wrote:
>> > >>> >> > Ah, I totally forgot to add to the internals:
>> > >>> >> >
>> > >>> >> >   - Fault tolerance in batch mode
>> > >>> >> >
>> > >>> >> >   - Fault tolerance in streaming mode, with state handling
>> > >>> >> >
>> > >>> >> > On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <sewen@apache.org> wrote:
>> > >>> >> >
>> > >>> >> >> Hi all!
>> > >>> >> >>
>> > >>> >> >> I would like to kick off an effort to improve the documentation of the
>> > >>> >> >> Flink Architecture and internals. This also means making the streaming
>> > >>> >> >> architecture more prominent in the docs.
>> > >>> >> >>
>> > >>> >> >> Since Flink is quite a sophisticated stack, we need to improve the
>> > >>> >> >> presentation of how Flink works, to the extent necessary to use Flink
>> > >>> >> >> (and to appreciate all the cool stuff that is happening). This should
>> > >>> >> >> also come in handy with new contributors.
>> > >>> >> >>
>> > >>> >> >> As a general umbrella, we need to first decide where and how to organize
>> > >>> >> >> the documentation.
>> > >>> >> >>
>> > >>> >> >> I would propose to put the bulk of the documentation into the Wiki.
>> > >>> >> >> Create a dedicated section on Flink Internals and sub-pages for each
>> > >>> >> >> component / topic. To the docs, we add a general overview from which we
>> > >>> >> >> link into the Wiki.
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >> >>  == These sections would go into the DOCS in the git repository ==
>> > >>> >> >>
>> > >>> >> >>   - Overview of Program, pre-flight phase (type extraction, optimizer),
>> > >>> >> >> JobManager, TaskManager. Differences between streaming and batch. We can
>> > >>> >> >> realize this through one very nice picture with few lines of text.
>> > >>> >> >>
>> > >>> >> >>   - High level architecture stack, different program representations (API
>> > >>> >> >> operators, common API DAG, optimizer DAG, parallel data flow (JobGraph /
>> > >>> >> >> Execution Graph))
>> > >>> >> >>
>> > >>> >> >>   - (maybe) Parallelism and scheduling. This seems to be paramount for
>> > >>> >> >> users to understand.
>> > >>> >> >>
>> > >>> >> >>   - Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >> >>  == These sections would go into the WIKI ==
>> > >>> >> >>
>> > >>> >> >>   - Project structure (maven projects, what is where, dependencies between
>> > >>> >> >> projects)
>> > >>> >> >>
>> > >>> >> >>   - Component overview
>> > >>> >> >>
>> > >>> >> >>     -> JobManager (InstanceManager, Scheduler, BLOB server, Library Cache,
>> > >>> >> >> Archiving)
>> > >>> >> >>
>> > >>> >> >>     -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)
>> > >>> >> >>
>> > >>> >> >>     -> Involved Actor Systems / Actors / Messages
>> > >>> >> >>
>> > >>> >> >>   - Details about submitting a job (library upload, job graph submission,
>> > >>> >> >> execution graph setup, scheduling trigger)
>> > >>> >> >>
>> > >>> >> >>   - Memory Management
>> > >>> >> >>
>> > >>> >> >>   - Optimizer internals
>> > >>> >> >>
>> > >>> >> >>   - Akka Setup specifics
>> > >>> >> >>
>> > >>> >> >>   - Netty and pluggable data exchange strategies
>> > >>> >> >>
>> > >>> >> >>   - Testing: Flink test clusters and unit test utilities
>> > >>> >> >>
>> > >>> >> >>   - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>> > >>> >> >>
>> > >>> >> >>   - Step-by-step guide to add a new operator
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >> >> I will go ahead and stub some sections in the Wiki.
>> > >>> >> >>
>> > >>> >> >> As we discuss and agree/disagree with the outline, we can evolve the
>> > >>> >> >> Wiki.
>> > >>> >> >>
>> > >>> >> >> Greetings,
>> > >>> >> >> Stephan
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >>
>> > >>> >
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> >
>>
