flink-dev mailing list archives

From Kostas Tzoumas <ktzou...@apache.org>
Subject Re: Improve the documentation of the Flink Architecture and internals
Date Tue, 17 Mar 2015 10:17:39 GMT
+1 for the Wiki.

When these have been stabilized we can move them to the docs if we decide
to do so.

On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <sewen@apache.org> wrote:

> I have put my suggested version of an outline for the docs into the wiki.
> Regardless where the docs end up (wiki or repository), we can use the wiki
> to outline the docs.
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>
> Some pages contain some stub or outline, others are completely blank.
>
> Not a complete list. Additions are welcome.
>
> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <sewen@apache.org> wrote:
>
> > I think the Wiki has a much lower barrier of entry to fix docs,
> especially
> > for external people. The docs, with the Jekyll setup, are rather tricky.
> > I would very much like that all kinds of people contribute to the docs
> > about the internals, not just the usual three suspects that have done
> this
> > so far.
> >
> > Having a good landing page in the regular docs is exactly to not lose
> all
> > the people that do not look into a wiki. The overview pages for the
> > internals need to be good and accessible and nicely link to the wiki to
> > "forward" people there.
> >
> > The overhead of deciding what goes where should not be terribly large, in
> > my opinion, since there is no really "wrong" place to put it.
> >
> >
> >
> > On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <aljoscha@apache.org>
> > wrote:
> >
> >> Why do you want to split stuff between the doc in the repository and
> >> the wiki? I for one would always be too lazy to check stuff in a wiki
> >> when there is also a documentation. Plus, this would lead to
> >> additional overhead in deciding what goes where and syncing between
> >> the two places for documentation.
> >>
> >> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <sewen@apache.org> wrote:
> >> > Ah, I totally forgot to add to the internals:
> >> >
> >> >   - Fault tolerance in Batch mode
> >> >
> >> >   - Fault Tolerance in Streaming Mode, with state handling
> >> >
> >> > On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <sewen@apache.org>
> wrote:
> >> >
> >> >> Hi all!
> >> >>
> >> >> I would like to kick off an effort to improve the documentation of the
> >> >> Flink Architecture and internals. This also means making the
> streaming
> >> >> architecture more prominent in the docs.
> >> >>
> >> >> Being quite a sophisticated stack, we need to improve the
> presentation
> >> of
> >> >> how Flink works - to an extent necessary to use Flink (and to
> >> appreciate
> >> >> all the cool stuff that is happening). This should also come in handy
> >> with
> >> >> new contributors.
> >> >>
> >> >> As a general umbrella, we need to first decide where and how to
> >> organize
> >> >> the documentation.
> >> >>
> >> >> I would propose to put the bulk of the documentation into the Wiki.
> >> Create
> >> >> a dedicated section on Flink Internals and sub-pages for each
> >> component /
> >> >> topic. To the docs, we add a general overview from which we link into
> >> the
> >> >> Wiki.
> >> >>
> >> >>
> >> >>  == These sections would go into the DOCS in the git repository ==
> >> >>
> >> >>   - Overview of Program, pre-flight phase (type extraction,
> optimizer),
> >> >> JobManager, TaskManager. Differences between streaming and batch. We
> >> can
> >> >> realize this through one very nice picture with few lines of text.
> >> >>
> >> >>   - High level architecture stack, different program representations
> >> (API
> >> >> operators, common API DAG, optimizer DAG, parallel data flow
> (JobGraph
> >> /
> >> >> Execution Graph))
> >> >>
> >> >>   - (maybe) Parallelism and scheduling. This seems to be paramount
> >> >> to understand for users.
> >> >>
> >> >>   - Processes (JobManager, TaskManager, Webserver, WebClient, CLI
> >> client)
> >> >>
> >> >>
> >> >>
> >> >>  == These sections would go into the WIKI ==
> >> >>
> >> >>   - Project structure (maven projects, what is where, dependencies
> >> between
> >> >> projects)
> >> >>
> >> >>   - Component overview
> >> >>
> >> >>     -> JobManager (InstanceManager, Scheduler, BLOB server, Library
> >> Cache,
> >> >> Archiving)
> >> >>
> >> >>     -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library
> >> Cache)
> >> >>
> >> >>     -> Involved Actor Systems / Actors / Messages
> >> >>
> >> >>   - Details about submitting a job (library upload, job graph
> >> submission,
> >> >> execution graph setup, scheduling trigger)
> >> >>
> >> >>   - Memory Management
> >> >>
> >> >>   - Optimizer internals
> >> >>
> >> >>   - Akka Setup specifics
> >> >>
> >> >>   - Netty and pluggable data exchange strategies
> >> >>
> >> >>   - Testing: Flink test clusters and unit test utilities
> >> >>
> >> >>   - Developer How-To: Setting up Eclipse, IntelliJ, Travis
> >> >>
> >> >>   - Step-by-step guide to add a new operator
> >> >>
> >> >>
> >> >> I will go ahead and stub some sections in the Wiki.
> >> >>
> >> >> As we discuss and agree/disagree with the outline, we can evolve the
> >> Wiki.
> >> >>
> >> >> Greetings,
> >> >> Stephan
> >> >>
> >> >>
> >>
> >
> >
>
