flink-dev mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: Improve the documentation of the Flink Architecture and internals
Date Fri, 20 Mar 2015 16:47:25 GMT
I don't think so. But adding a new contributor's wiki username as an editor
is much easier than merging changes to the website, I hope.

- Henry

On Fri, Mar 20, 2015 at 8:53 AM, Maximilian Michels <mxm@apache.org> wrote:
> +1 for the initiative and for the wiki.
>
> At the moment, the wiki's barrier to entry is like the git repository's.
> Contributors need to explicitly ask committers for access to the wiki. Is
> there a way we could open up the wiki for contributors without having to
> face too much spam? (e.g. have changes approved before showing them in the
> wiki)
>
> On Fri, Mar 20, 2015 at 12:49 PM, Ufuk Celebi <uce@apache.org> wrote:
>
>> Thanks. I will have a look later :-)
>>
>> +1 for the Wiki. I think the low overhead does not only make it easier to
>> contribute for newcomers, but for committers as well. :-)
>>
>> On 20 Mar 2015, at 12:46, Kostas Tzoumas <ktzoumas@apache.org> wrote:
>>
>> > I added a document for data exchange between tasks:
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>> >
>> > Feel free to edit. I plan to link the class names to the class files in
>> > github.
>> >
>> > On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <ktzoumas@apache.org>
>> > wrote:
>> >
>> >> +1 for the Wiki.
>> >>
>> >> When these have been stabilized we can move them to the docs if we
>> >> decide to do so.
>> >>
>> >> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <sewen@apache.org>
>> >> wrote:
>> >>
>> >>> I have put my suggested version of an outline for the docs into the
>> >>> wiki. Regardless of where the docs end up (wiki or repository), we can
>> >>> use the wiki to outline the docs.
>> >>>
>> >>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>> >>>
>> >>> Some pages contain a stub or outline, others are completely blank.
>> >>>
>> >>> Not a complete list. Additions are welcome.
>> >>>
>> >>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <sewen@apache.org>
>> >>> wrote:
>> >>>
>> >>>> I think the Wiki has a much lower barrier to entry to fix docs,
>> >>>> especially for external people. The docs, with the Jekyll setup, are
>> >>>> rather tricky. I would very much like that all kinds of people
>> >>>> contribute to the docs about the internals, not just the usual three
>> >>>> suspects that have done this so far.
>> >>>>
>> >>>> Having a good landing page in the regular docs is exactly to not lose
>> >>>> all the people that do not look into a wiki. The overview pages for the
>> >>>> internals need to be good and accessible and nicely link to the wiki to
>> >>>> "forward" people there.
>> >>>>
>> >>>> The overhead of deciding what goes where should not be terribly large,
>> >>>> in my opinion, since there is no really "wrong" place to put it.
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek
>> >>>> <aljoscha@apache.org> wrote:
>> >>>>
>> >>>>> Why do you want to split stuff between the docs in the repository
>> >>>>> and the wiki? I for one would always be too lazy to check stuff in a
>> >>>>> wiki when there is also documentation. Plus, this would lead to
>> >>>>> additional overhead in deciding what goes where and syncing between
>> >>>>> the two places for documentation.
>> >>>>>
>> >>>>> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <sewen@apache.org>
>> >>>>> wrote:
>> >>>>>> Ah, I totally forgot to add to the internals:
>> >>>>>>
>> >>>>>>  - Fault tolerance in Batch mode
>> >>>>>>
>> >>>>>>  - Fault Tolerance in Streaming Mode, with state handling
>> >>>>>>
>> >>>>>> On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <sewen@apache.org>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Hi all!
>> >>>>>>>
>> >>>>>>> I would like to kick off an effort to improve the documentation of
>> >>>>>>> the Flink Architecture and internals. This also means making the
>> >>>>>>> streaming architecture more prominent in the docs.
>> >>>>>>>
>> >>>>>>> Flink being quite a sophisticated stack, we need to improve the
>> >>>>>>> presentation of how Flink works - to the extent necessary to use
>> >>>>>>> Flink (and to appreciate all the cool stuff that is happening).
>> >>>>>>> This should also come in handy with new contributors.
>> >>>>>>>
>> >>>>>>> As a general umbrella, we need to first decide where and how to
>> >>>>>>> organize the documentation.
>> >>>>>>>
>> >>>>>>> I would propose to put the bulk of the documentation into the Wiki.
>> >>>>>>> Create a dedicated section on Flink Internals and sub-pages for each
>> >>>>>>> component / topic. To the docs, we add a general overview from which
>> >>>>>>> we link into the Wiki.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> == These sections would go into the DOCS in the git repository ==
>> >>>>>>>
>> >>>>>>>  - Overview of Program, pre-flight phase (type extraction,
>> >>>>>>> optimizer), JobManager, TaskManager. Differences between streaming
>> >>>>>>> and batch. We can realize this through one very nice picture with
>> >>>>>>> few lines of text.
>> >>>>>>>
>> >>>>>>>  - High level architecture stack, different program representations
>> >>>>>>> (API operators, common API DAG, optimizer DAG, parallel data flow
>> >>>>>>> (JobGraph / Execution Graph))
>> >>>>>>>
>> >>>>>>>  - (maybe) Parallelism and scheduling. This seems to be paramount to
>> >>>>>>> understand for users.
>> >>>>>>>
>> >>>>>>>  - Processes (JobManager, TaskManager, Webserver, WebClient, CLI
>> >>>>>>> client)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> == These sections would go into the WIKI ==
>> >>>>>>>
>> >>>>>>>  - Project structure (maven projects, what is where, dependencies
>> >>>>>>> between projects)
>> >>>>>>>
>> >>>>>>>  - Component overview
>> >>>>>>>
>> >>>>>>>    -> JobManager (InstanceManager, Scheduler, BLOB server, Library
>> >>>>>>> Cache, Archiving)
>> >>>>>>>
>> >>>>>>>    -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library
>> >>>>>>> Cache)
>> >>>>>>>
>> >>>>>>>    -> Involved Actor Systems / Actors / Messages
>> >>>>>>>
>> >>>>>>>  - Details about submitting a job (library upload, job graph
>> >>>>>>> submission, execution graph setup, scheduling trigger)
>> >>>>>>>
>> >>>>>>>  - Memory Management
>> >>>>>>>
>> >>>>>>>  - Optimizer internals
>> >>>>>>>
>> >>>>>>>  - Akka Setup specifics
>> >>>>>>>
>> >>>>>>>  - Netty and pluggable data exchange strategies
>> >>>>>>>
>> >>>>>>>  - Testing: Flink test clusters and unit test utilities
>> >>>>>>>
>> >>>>>>>  - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>> >>>>>>>
>> >>>>>>>  - Step-by-step guide to add a new operator
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> I will go ahead and stub some sections in the Wiki.
>> >>>>>>>
>> >>>>>>> As we discuss and agree/disagree with the outline, we can evolve the
>> >>>>>>> Wiki.
>> >>>>>>>
>> >>>>>>> Greetings,
>> >>>>>>> Stephan
>> >>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>>
>>
