flink-dev mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: [DISCUSS] Dedicated streaming mode
Date Tue, 26 May 2015 18:53:58 GMT
Ah yes, technically the streaming mode could run batch jobs as well in Flink.
I am thinking that it could cause confusion with users, since most
systems that do batch and stream (well, pretty much Spark ^_^) do
not differentiate the deployment topology of the cluster to support
different kinds of applications.

- Henry

On Tue, May 26, 2015 at 11:44 AM, Stephan Ewen <sewen@apache.org> wrote:
> The streaming mode runs batch jobs as well :-)
>
> There should be slightly reduced predictability in the memory management in
> the streaming mode, but otherwise there should not be a problem.
>
> So if you want to run mixed workloads, you start the streaming mode.
>
>
> (Note: Currently, the batch mode runs streaming jobs as well, but gives
> them very little memory. I am thinking of prohibiting that (separate
> discussion), to prevent people from running a highly sub-optimal Flink
> setup without noticing.)
>
>
> On Tue, May 26, 2015 at 8:26 PM, Henry Saputra <henry.saputra@gmail.com>
> wrote:
>
>> One immediate concern I have is the deployment topology. With
>> streaming having its own cluster deployment, in standalone mode ops
>> would have to know which mode to deploy Flink in, either batch or
>> streaming. So, if the use case is to support both batch and
>> streaming, would the deployment need two separate clusters to
>> support the different applications running on Flink?
>>
>> I think this would be ok if Flink is deployed in YARN or other
>> resource management platforms like Mesos or Apache Myriad. Maybe
>> someone, like Robert, could confirm this is the case.
>>
>> - Henry
>>
>> On Tue, May 26, 2015 at 1:51 AM, Maximilian Michels <mxm@apache.org>
>> wrote:
>> > +1 great changes coming up! I like the idea that, ultimately, Flink will
>> > handle streaming and batch programs equally well independently of the
>> > chosen cluster startup mode.
>> >
>> > What is the time frame for these changes?
>> >
>> > On Tue, May 26, 2015 at 7:34 AM, Henry Saputra <henry.saputra@gmail.com>
>> > wrote:
>> >
>> >> Thanks Aljoscha and Stephan, this helps
>> >>
>> >> - Henry
>> >>
>> >> On Fri, May 22, 2015 at 4:37 AM, Stephan Ewen <sewen@apache.org> wrote:
>> >> > Aljoscha is right. There are plans to migrate the streaming state to the
>> >> > MemoryManager as well, but streaming state is not managed at this point.
>> >> >
>> >> > What is managed in streaming jobs is the data buffered and cached in the
>> >> > network stack. But that is a different memory pool than the memory
>> >> > manager. We keep those pools separate because the network stack is
>> >> > currently more advanced in terms of dynamically rebalancing memory,
>> >> > compared to the memory manager.
>> >> >
>> >> > On Fri, May 22, 2015 at 12:25 PM, Aljoscha Krettek <aljoscha@apache.org>
>> >> > wrote:
>> >> >
>> >> >> Hi,
>> >> >> streaming currently does not use any memory manager. All state is
>> >> >> kept in Java Objects on the Java Heap, for example an ArrayList<>
>> >> >> for the window buffer.
>> >> >>
>> >> >> On Thu, May 21, 2015 at 11:56 PM, Henry Saputra <henry.saputra@gmail.com>
>> >> >> wrote:
>> >> >> > Hi Stephan, Gyula, Paris,
>> >> >> >
>> >> >> > How is streaming currently different in terms of memory
>> >> >> > management? Currently we only have one MemoryManager, which I
>> >> >> > believe is used by both modes.
>> >> >> >
>> >> >> > - Henry
>> >> >> >
>> >> >> > On Thu, May 21, 2015 at 12:34 PM, Stephan Ewen <sewen@apache.org>
>> >> >> > wrote:
>> >> >> >> I discussed a bit via Skype with Gyula and Paris.
>> >> >> >>
>> >> >> >>
>> >> >> >> We thought about the following way to do it:
>> >> >> >>
>> >> >> >>  - We add a dedicated streaming mode for now. The streaming mode
>> >> >> >> supersedes the batch mode, so it can run both types of programs.
>> >> >> >>
>> >> >> >>  - The streaming mode sets the memory manager to "lazy allocation".
>> >> >> >>     -> So long as it runs pure streaming jobs, the full heap will be
>> >> >> >> available to window buffers and UDFs.
>> >> >> >>     -> Batch programs can still run, so mixed workloads are not
>> >> >> >> prevented. Batch programs are a bit less robust there, because the
>> >> >> >> memory manager does not pre-allocate memory. UDFs can eat into
>> >> >> >> Flink's memory portion.
>> >> >> >>
>> >> >> >>  - The streaming mode starts the necessary configured
>> >> >> >> components/services for state backups.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> Over the next versions, we want to bring these things together:
>> >> >> >>   - use the managed memory for window buffers
>> >> >> >>   - on-demand starting of the state backend
>> >> >> >>
>> >> >> >> Then, we deprecate the streaming mode and let both modes start the
>> >> >> >> cluster in the same way.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljoscha@apache.org>
>> >> >> >> wrote:
>> >> >> >>
>> >> >> >>> Would it not be possible to start the snapshot service once the
>> >> >> >>> user starts the first streaming job? About 2): with checkpointing
>> >> >> >>> coming up, would it not make sense to shift to managed memory
>> >> >> >>> sooner rather than later? Then this point would become moot.
>> >> >> >>>
>> >> >> >>> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax
>> >> >> >>> <mjsax@informatik.hu-berlin.de> wrote:
>> >> >> >>> > What would be the consequences for "mixed" programs? (If there
>> >> >> >>> > is any plan to support those?)
>> >> >> >>> >
>> >> >> >>> > Would it be necessary to have a third mode? Or would those
>> >> >> >>> > programs simply run in streaming mode?
>> >> >> >>> >
>> >> >> >>> > -Matthias
>> >> >> >>> >
>> >> >> >>> > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
>> >> >> >>> >> Hi all!
>> >> >> >>> >>
>> >> >> >>> >> A while back we discussed introducing a dedicated streaming
>> >> >> >>> >> mode for Flink. I would like to take a go at this and
>> >> >> >>> >> implement the changes, but discuss them first.
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> Here is a brief summary of why we wanted to introduce the
>> >> >> >>> >> dedicated streaming mode:
>> >> >> >>> >> Even though both batch and streaming are executed by the same
>> >> >> >>> >> execution engine, a streaming setup of Flink varies a bit from
>> >> >> >>> >> a batch setup:
>> >> >> >>> >>
>> >> >> >>> >> 1) The streaming cluster starts an additional service to store
>> >> >> >>> >> the distributed state snapshots.
>> >> >> >>> >>
>> >> >> >>> >> 2) Streaming mode uses memory a bit differently, so we should
>> >> >> >>> >> configure the memory manager differently. This difference may
>> >> >> >>> >> eventually go away.
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> Concretely, to implement this, I was thinking about introducing
>> >> >> >>> >> the following externally visible changes:
>> >> >> >>> >>
>> >> >> >>> >>  - Additional scripts "start-streaming-cluster.sh" and
>> >> >> >>> >> "start-streaming-local.sh"
>> >> >> >>> >>
>> >> >> >>> >>  - An execution mode parameter for the TaskManager ("batch /
>> >> >> >>> >> streaming")
>> >> >> >>> >>
>> >> >> >>> >>  - An execution mode parameter for the JobManager ("batch /
>> >> >> >>> >> streaming")
>> >> >> >>> >>
>> >> >> >>> >>  - All local executors and mini clusters need a flag that
>> >> >> >>> >> specifies whether they will start a streaming cluster or a pure
>> >> >> >>> >> batch cluster.
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> Anything else that comes to mind?
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> Greetings,
>> >> >> >>> >> Stephan
>> >> >> >>> >>
>> >> >> >>> >
>> >> >> >>>
>> >> >>
>> >>
>>
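The difference discussed in this thread between eager pre-allocation (batch mode) and the proposed "lazy allocation" of the memory manager in streaming mode can be illustrated with a small Java sketch. `ExecutionMode` and `SketchMemoryManager` are hypothetical names invented for this illustration; they are not Flink's actual MemoryManager API. The sketch only models the idea: in BATCH mode the whole managed pool is reserved at startup, while in STREAMING mode segments are created on first request, so the heap stays free for window buffers and UDFs until a batch job actually asks for managed memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical execution mode flag, mirroring the proposed "batch / streaming" parameter. */
enum ExecutionMode { BATCH, STREAMING }

/**
 * Hypothetical memory manager sketch (not Flink's MemoryManager API).
 * BATCH: all segments are allocated up front (eager).
 * STREAMING: segments are allocated only when first requested (lazy).
 */
class SketchMemoryManager {
    private final int segmentSize;
    private final int totalSegments;                       // managed memory budget, in segments
    private final Deque<byte[]> freeSegments = new ArrayDeque<>();
    private int allocatedSegments = 0;                      // segments created so far

    SketchMemoryManager(ExecutionMode mode, int totalSegments, int segmentSize) {
        this.totalSegments = totalSegments;
        this.segmentSize = segmentSize;
        if (mode == ExecutionMode.BATCH) {
            // Eager: reserve the whole managed pool at startup.
            for (int i = 0; i < totalSegments; i++) {
                freeSegments.push(new byte[segmentSize]);
                allocatedSegments++;
            }
        }
        // Lazy (STREAMING): nothing is allocated yet, so the heap remains
        // available to UDFs and heap-based window buffers.
    }

    /** Returns a segment, creating it on demand in lazy mode; null once the budget is used up. */
    byte[] request() {
        if (!freeSegments.isEmpty()) {
            return freeSegments.pop();                      // reuse a free segment
        }
        if (allocatedSegments < totalSegments) {
            allocatedSegments++;
            // In lazy mode this allocation competes with whatever UDFs already
            // hold on the heap and may fail with OutOfMemoryError.
            return new byte[segmentSize];
        }
        return null;                                        // budget exhausted
    }

    /** Returns a segment to the pool for reuse. */
    void release(byte[] segment) {
        freeSegments.push(segment);
    }

    int allocatedSegments() {
        return allocatedSegments;
    }
}
```

In the lazy variant, managed memory is only claimed when a batch operator asks for it, which is why batch programs are described in the thread as a bit less robust in streaming mode: by then, UDF objects may already occupy the heap.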

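Aljoscha's description of streaming state at the time (plain Java objects on the heap, for example an ArrayList<> as the window buffer) can be illustrated with a minimal count-based tumbling window. `CountWindowBuffer` is a hypothetical class for illustration only, not Flink's window operator implementation. It shows why such state is outside any memory manager: the buffered elements are ordinary heap objects whose footprint the runtime does not track or bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical count-based tumbling window whose state lives in a plain
 * ArrayList on the Java heap, i.e. outside any managed memory pool.
 */
class CountWindowBuffer<T, R> {
    private final int windowSize;
    private final Function<List<T>, R> windowFunction;
    private final List<T> buffer = new ArrayList<>();       // unmanaged heap state

    CountWindowBuffer(int windowSize, Function<List<T>, R> windowFunction) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
        this.windowFunction = windowFunction;
    }

    /** Adds an element; returns the window result when the window fills up, otherwise null. */
    R add(T element) {
        buffer.add(element);
        if (buffer.size() < windowSize) {
            return null;
        }
        // Hand a copy to the window function, then start a fresh window (tumbling).
        R result = windowFunction.apply(new ArrayList<>(buffer));
        buffer.clear();
        return result;
    }

    /** Number of elements currently held on the heap for the open window. */
    int buffered() {
        return buffer.size();
    }
}
```

For example, `new CountWindowBuffer<Integer, Integer>(3, xs -> xs.stream().mapToInt(Integer::intValue).sum())` emits the sum of every three consecutive integers. Moving such buffers into managed memory is the "use the managed memory for window buffers" step listed in the thread.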