flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Improve the documentation of the Flink Architecture and internals
Date Sat, 21 Mar 2015 18:38:47 GMT
Very nice post, Till!

We are starting to get much better with this...

On Sat, Mar 21, 2015 at 6:45 PM, Henry Saputra <henry.saputra@gmail.com>
wrote:

> Awesome, thanks Till
>
> On Saturday, March 21, 2015, Till Rohrmann <trohrmann@apache.org> wrote:
>
> > I wrote some internal documentation for Akka and the distributed
> > communication [1].
> >
> > Cheers,
> >
> > Till
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors
> >
> > On Fri, Mar 20, 2015 at 7:31 PM, Henry Saputra <henry.saputra@gmail.com>
> > wrote:
> >
> > > Ah, the infra bot on Twitter just announced extended downtime for Confluence [1]
> > >
> > > - Henry
> > >
> > > [1] https://twitter.com/infrabot/status/578983473970475008
> > >
> > > On Fri, Mar 20, 2015 at 11:27 AM, Stephan Ewen <sewen@apache.org> wrote:
> > > > For me as well. Earlier today it said "down for maintenance"
> > > >
> > > > On Fri, Mar 20, 2015 at 7:14 PM, Kostas Tzoumas <ktzoumas@apache.org>
> > > > wrote:
> > > >
> > > >> it's down for me as well
> > > >>
> > > >> On Fri, Mar 20, 2015 at 7:12 PM, Henry Saputra <henry.saputra@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Is the wiki down for any of you?
> > > >> >
> > > >> > I can't access
> > > >> >
> > > >> > https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
> > > >> >
> > > >> > 404
> > > >> >
> > > >> > - Henry
> > > >> >
> > > >> > On Fri, Mar 20, 2015 at 4:46 AM, Kostas Tzoumas <ktzoumas@apache.org>
> > > >> > wrote:
> > > >> > > I added a document for data exchange between tasks:
> > > >> > >
> > > >> > > https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
> > > >> > >
> > > >> > > Feel free to edit. I plan to link the class names to the class files in
> > > >> > > GitHub.
> > > >> > >
> > > >> > > On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <ktzoumas@apache.org>
> > > >> > > wrote:
> > > >> > >
> > > >> > >> +1 for the Wiki.
> > > >> > >>
> > > >> > >> When these have been stabilized we can move them to the docs if we decide
> > > >> > >> to do so.
> > > >> > >>
> > > >> > >> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <sewen@apache.org>
> > > >> > >> wrote:
> > > >> > >>
> > > >> > >>> I have put my suggested version of an outline for the docs into the wiki.
> > > >> > >>> Regardless of where the docs end up (wiki or repository), we can use the wiki
> > > >> > >>> to outline the docs.
> > > >> > >>>
> > > >> > >>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
> > > >> > >>>
> > > >> > >>> Some pages contain a stub or outline, others are completely blank.
> > > >> > >>>
> > > >> > >>> Not a complete list. Additions are welcome.
> > > >> > >>>
> > > >> > >>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <sewen@apache.org>
> > > >> > >>> wrote:
> > > >> > >>>
> > > >> > >>> > I think the Wiki has a much lower barrier of entry to fix docs, especially
> > > >> > >>> > for external people. The docs, with the Jekyll setup, are rather tricky.
> > > >> > >>> > I would very much like all kinds of people to contribute to the docs
> > > >> > >>> > about the internals, not just the usual three suspects that have done this
> > > >> > >>> > so far.
> > > >> > >>> >
> > > >> > >>> > Having a good landing page in the regular docs is exactly to not lose all
> > > >> > >>> > the people that do not look into a wiki. The overview pages for the
> > > >> > >>> > internals need to be good and accessible and nicely link to the wiki to
> > > >> > >>> > "forward" people there.
> > > >> > >>> >
> > > >> > >>> > The overhead of deciding what goes where should not be terribly large, in
> > > >> > >>> > my opinion, since there is no really "wrong" place to put it.
> > > >> > >>> >
> > > >> > >>> >
> > > >> > >>> >
> > > >> > >>> > On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <aljoscha@apache.org>
> > > >> > >>> > wrote:
> > > >> > >>> >
> > > >> > >>> >> Why do you want to split stuff between the docs in the repository and
> > > >> > >>> >> the wiki? I for one would always be too lazy to check stuff in a wiki
> > > >> > >>> >> when there is also documentation. Plus, this would lead to
> > > >> > >>> >> additional overhead in deciding what goes where and syncing between
> > > >> > >>> >> the two places for documentation.
> > > >> > >>> >>
> > > >> > >>> >> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <sewen@apache.org>
> > > >> > >>> >> wrote:
> > > >> > >>> >> > Ah, I totally forgot to add to the internals:
> > > >> > >>> >> >
> > > >> > >>> >> >   - Fault tolerance in Batch mode
> > > >> > >>> >> >
> > > >> > >>> >> >   - Fault Tolerance in Streaming Mode, with state handling
> > > >> > >>> >> >
> > > >> > >>> >> > On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <sewen@apache.org>
> > > >> > >>> >> > wrote:
> > > >> > >>> >> >
> > > >> > >>> >> >> Hi all!
> > > >> > >>> >> >>
> > > >> > >>> >> >> I would like to kick off an effort to improve the documentation of the
> > > >> > >>> >> >> Flink Architecture and internals. This also means making the streaming
> > > >> > >>> >> >> architecture more prominent in the docs.
> > > >> > >>> >> >>
> > > >> > >>> >> >> Since Flink is quite a sophisticated stack, we need to improve the
> > > >> > >>> >> >> presentation of how Flink works - to the extent necessary to use Flink
> > > >> > >>> >> >> (and to appreciate all the cool stuff that is happening). This should
> > > >> > >>> >> >> also come in handy with new contributors.
> > > >> > >>> >> >>
> > > >> > >>> >> >> As a general umbrella, we need to first decide where and how to organize
> > > >> > >>> >> >> the documentation.
> > > >> > >>> >> >>
> > > >> > >>> >> >> I would propose to put the bulk of the documentation into the Wiki. Create
> > > >> > >>> >> >> a dedicated section on Flink Internals and sub-pages for each component /
> > > >> > >>> >> >> topic. To the docs, we add a general overview from which we link into the
> > > >> > >>> >> >> Wiki.
> > > >> > >>> >> >>
> > > >> > >>> >> >>
> > > >> > >>> >> >>  == These sections would go into the DOCS in the git repository ==
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Overview of Program, pre-flight phase (type extraction, optimizer),
> > > >> > >>> >> >> JobManager, TaskManager. Differences between streaming and batch. We can
> > > >> > >>> >> >> realize this through one very nice picture with a few lines of text.
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - High level architecture stack, different program representations (API
> > > >> > >>> >> >> operators, common API DAG, optimizer DAG, parallel data flow (JobGraph /
> > > >> > >>> >> >> Execution Graph))
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - (maybe) Parallelism and scheduling. This seems to be paramount for users
> > > >> > >>> >> >> to understand.
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)
> > > >> > >>> >> >>
> > > >> > >>> >> >>
> > > >> > >>> >> >>
> > > >> > >>> >> >>  == These sections would go into the WIKI ==
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Project structure (maven projects, what is where, dependencies between
> > > >> > >>> >> >> projects)
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Component overview
> > > >> > >>> >> >>
> > > >> > >>> >> >>     -> JobManager (InstanceManager, Scheduler, BLOB server, Library Cache,
> > > >> > >>> >> >> Archiving)
> > > >> > >>> >> >>
> > > >> > >>> >> >>     -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)
> > > >> > >>> >> >>
> > > >> > >>> >> >>     -> Involved Actor Systems / Actors / Messages
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Details about submitting a job (library upload, job graph submission,
> > > >> > >>> >> >> execution graph setup, scheduling trigger)
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Memory Management
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Optimizer internals
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Akka Setup specifics
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Netty and pluggable data exchange strategies
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Testing: Flink test clusters and unit test utilities
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Developer How-To: Setting up Eclipse, IntelliJ, Travis
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Step-by-step guide to add a new operator
> > > >> > >>> >> >>
> > > >> > >>> >> >>
> > > >> > >>> >> >> I will go ahead and stub some sections in the Wiki.
> > > >> > >>> >> >>
> > > >> > >>> >> >> As we discuss and agree/disagree with the outline, we can evolve the
> > > >> > >>> >> >> Wiki.
> > > >> > >>> >> >>
> > > >> > >>> >> >> Greetings,
> > > >> > >>> >> >> Stephan
> > > >> > >>> >> >>
> > > >> > >>> >> >>
> > > >> > >>> >>
> > > >> > >>> >
> > > >> > >>> >
> > > >> > >>>
> > > >> > >>
> > > >> > >>
> > > >> >
> > > >>
> > >
> >
>
