flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Florin Dinu <florin.d...@epfl.ch>
Subject Re: the design of spilling to disk
Date Tue, 19 Sep 2017 16:19:38 GMT
Hi Kostas,

Thank you for the quick reply and the tips. I will check them out !

I would like to start by understanding the way secondary storage is used in batch processing.

If you guys have additional pointers on that, it would certainly help me a lot.

Thanks again,


From: Kostas Kloudas <k.kloudas@data-artisans.com>
Sent: Tuesday, September 19, 2017 18:10
To: Florin Dinu
Cc: user@flink.apache.org; fhueske@apache.org
Subject: Re: the design of spilling to disk

Hi Florin,

Unfortunately, there is no design document.

The UnilateralSortMerger.java is used in the batch processing mode (not is streaming) and,
in fact, the code dates some years back. I cc also Fabian as he may have more things to say
on this.

Now for the streaming side, Flink uses 3 state-backends, in-memory (no spilling and mainly
useful for testing),
filesystem and RocksDB (both eventually spill to disk but in different ways), and it also
supports incremental
checkpoints, i.e. at each checkpoint it only stores the diff between checkpoint[i] and checkpoint[i-1].

For more information on Flink state and state backends, checkout the latest talk from Stefan
Richter at
Flink Forward Berlin 2017 (https://www.youtube.com/watch?v=dWQ24wERItM) and the .


On Sep 19, 2017, at 6:00 PM, Florin Dinu <florin.dinu@epfl.ch<mailto:florin.dinu@epfl.ch>>

Hello everyone,

In our group at EPFL we're doing research on understanding and potentially improving the performance
of data-parallel frameworks that use secondary storage.
I was looking at the Flink code to understand how spilling to disk actually works.
So far I got to the UnilateralSortMerger.java and its spill and reading threads. I also saw
there are some spilling markers used.
I am curious if there is any design document available on this topic.
I was not able to find much online.
If there is no such design document I would appreciate if someone could help me understand
how these spilling markers are used.
At a higher level, I am trying to understand how much data does Flink spill to disk after
it has concluded that it needs to spill to disk.

Thank you very much
Florin Dinu

View raw message