flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: [DISCUSS] Dedicated streaming mode
Date Tue, 26 May 2015 05:34:01 GMT
Thanks Aljoscha and Stephan, this helps

- Henry

On Fri, May 22, 2015 at 4:37 AM, Stephan Ewen <sewen@apache.org> wrote:
> Aljoscha is right. There are plans to migrate the streaming state to the
> MemoryManager as well, but streaming state is not managed at this point.
>
> What is managed in streaming jobs is the data buffered and cached in the
> network stack. But that is a different memory pool than the memory manager.
> We keep those pools separate because the network stack is currently more
> advanced in terms of dynamically rebalancing memory, compared to the memory
> manager.
>
> On Fri, May 22, 2015 at 12:25 PM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>
>> Hi,
>> streaming currently does not use any memory manager. All state is kept
>> in Java Objects on the Java Heap, for example an ArrayList<> for the
>> window buffer.
>>
>> On Thu, May 21, 2015 at 11:56 PM, Henry Saputra <henry.saputra@gmail.com>
>> wrote:
>> > Hi Stephan, Gyula, Paris,
>> >
>> > How does streaming currently different in term of memory management?
>> > Currently we only have one MemoryManager which is used by both modes I
>> > believe.
>> >
>> > - Henry
>> >
>> > On Thu, May 21, 2015 at 12:34 PM, Stephan Ewen <sewen@apache.org> wrote:
>> >> I discussed a bit via Skype with Gyula and Paris.
>> >>
>> >>
>> >> We thought about the following way to do it:
>> >>
>> >>  - We add a dedicated streaming mode for now. The streaming mode
>> supersedes
>> >> the batch mode, so it can run both type of programs.
>> >>
>> >>  - The streaming mode sets the memory manager to "lazy allocation".
>> >>     -> So long as it runs pure streaming jobs, the full heap will be
>> >> available to window buffers and UDFs.
>> >>     -> Batch programs can still run, so mixed workloads are not
>> prevented.
>> >> Batch programs are a bit less robust there, because the memory manager
>> does
>> >> not pre-allocate memory. UDFs can eat into Flink's memory portion.
>> >>
>> >>  - The streaming mode starts the necessary configured
>> components/services
>> >> for state backups
>> >>
>> >>
>> >>
>> >> Over the next versions, we want to bring these things together:
>> >>   - use the managed memory for window buffers
>> >>   - on-demand starting of the state backend
>> >>
>> >> Then, we deprecate the streaming mode, let both modes start the cluster
>> in
>> >> the same way.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljoscha@apache.org>
>> >> wrote:
>> >>
>> >>> Would it not be possible to start the snapshot service once the user
>> >>> starts the first streaming job? About 2) with checkpointing coming up,
>> >>> would it not make sense to shift to managed memory rather sooner than
>> >>> later. Then this point would become moot.
>> >>>
>> >>> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax
>> >>> <mjsax@informatik.hu-berlin.de> wrote:
>> >>> > What would be the consequences on "mixed" programs? (If there is
any
>> >>> > plan to support those?)
>> >>> >
>> >>> > Would it be necessary to have a third mode? Or would those programs
>> >>> > simple run in streaming mode?
>> >>> >
>> >>> > -Matthias
>> >>> >
>> >>> > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
>> >>> >> Hi all!
>> >>> >>
>> >>> >> We discussed a while back about introducing a dedicated streaming
>> mode
>> >>> for
>> >>> >> Flink. I would like to take a go at this and implement the
changes,
>> but
>> >>> >> discuss them before.
>> >>> >>
>> >>> >>
>> >>> >> Here is a brief summary why we wanted to introduce the dedicated
>> >>> streaming
>> >>> >> mode:
>> >>> >> Even though both batch and streaming are executed by the same
>> execution
>> >>> >> engine,
>> >>> >> a streaming setup of Flink varies a bit from a batch setup:
>> >>> >>
>> >>> >> 1) The streaming cluster starts an additional service to store
the
>> >>> >> distributed state snapshots.
>> >>> >>
>> >>> >> 2) Streaming mode uses memory a bit different, so we should
>> configure
>> >>> the
>> >>> >> memory manager differently. This difference may eventually
go away.
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> Concretely, to implement this, I was thinking about introducing
the
>> >>> >> following externally visible changes
>> >>> >>
>> >>> >>  - Additional scripts "start-streaming-cluster.sh" and
>> >>> >> "start-streaming-local.sh"
>> >>> >>
>> >>> >>  - An execution mode parameter for the TaskManager ("batch
/
>> streaming")
>> >>> >>
>> >>> >>  - An execution mode parameter for the JobManager TaskManager
>> ("batch /
>> >>> >> streaming")
>> >>> >>
>> >>> >>  - All local executors and mini clusters need a flag that specifies
>> >>> whether
>> >>> >> they will start
>> >>> >>    a streaming cluster, or a pure batch cluster.
>> >>> >>
>> >>> >>
>> >>> >> Anything else that comes to your minds?
>> >>> >>
>> >>> >>
>> >>> >> Greetings,
>> >>> >> Stephan
>> >>> >>
>> >>> >
>> >>>
>>

Mime
View raw message