flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Flink memory usage
Date Wed, 19 Apr 2017 14:52:12 GMT
Hi Billy,

Flink's internal operators are implemented to not allocate heap space
proportional to the size of the input data.
Whenever Flink needs to hold data in memory (e.g., for sorting or building
a hash table), the data is serialized into managed memory. If all memory is
in use, Flink starts spilling to disk. This blog post discusses how Flink
uses its managed memory [1] (still up to date, even though it's almost 2
years old).
The runtime code should actually be quite stable. Most of the code has been
there for several years (even before Flink was donated to the ASF) and we
haven't seen many bugs reported for the DataSet runtime. Of course this
does not mean that the code doesn't contain bugs.
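
To make the "serialize into a fixed budget, spill when full" idea concrete,
here is a minimal sketch in plain Java. It is not Flink's implementation (the
real runtime is considerably more involved); the class SpillingBufferSketch
and its methods are hypothetical names made up for this illustration. The
point is only that heap use stays bounded by the budget no matter how many
records arrive.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not Flink code: records are serialized into a
// fixed-size buffer, and the buffer is flushed to a temp file when it is full.
class SpillingBufferSketch {
    private final ByteBuffer memory;   // the fixed "managed memory" budget
    private final Path spillFile;      // where full buffers are written
    private long spilledBytes = 0;

    SpillingBufferSketch(int budgetBytes) {
        if (budgetBytes < Long.BYTES) {
            throw new IllegalArgumentException("budget must hold at least one record");
        }
        this.memory = ByteBuffer.allocate(budgetBytes);
        Path file;
        try {
            file = Files.createTempFile("spill-sketch", ".bin");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        file.toFile().deleteOnExit();
        this.spillFile = file;
    }

    // Serialize one record into the buffer, spilling first if there is no room.
    void add(long record) {
        if (memory.remaining() < Long.BYTES) {
            spill();
        }
        memory.putLong(record);
    }

    // Append the buffered bytes to the spill file and reset the buffer.
    private void spill() {
        memory.flip();
        try (OutputStream out = Files.newOutputStream(spillFile, StandardOpenOption.APPEND)) {
            out.write(memory.array(), 0, memory.limit());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spilledBytes += memory.limit();
        memory.clear();
    }

    long spilledBytes() { return spilledBytes; }

    int bytesInMemory() { return memory.position(); }

    public static void main(String[] args) {
        SpillingBufferSketch buf = new SpillingBufferSketch(64);
        for (long i = 0; i < 100; i++) {
            buf.add(i);
        }
        System.out.println("spilled=" + buf.spilledBytes()
                + " inMemory=" + buf.bytesInMemory());
    }
}
```

However many records are added, the buffer never grows beyond the budget
passed to the constructor; everything else ends up on disk.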

However, Flink does not manage the memory used by user code. For example, a
GroupReduceFunction that collects a lot of data, e.g., in a List on the
heap, can still kill a program.
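
The difference can be sketched without any Flink dependency. A
GroupReduceFunction receives its group as an Iterable, so the two methods
below stand in for the body of such a function; the class GroupReduceSketch
and the helper range() are hypothetical and exist only for this example. The
first variant copies the whole group onto the heap, the second keeps a single
running value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the body of a user function; no Flink classes used.
class GroupReduceSketch {

    // Heap-hungry variant: materializes the entire group in a List, so heap
    // use grows with the size of the group.
    static long sumBuffered(Iterable<Long> group) {
        List<Long> all = new ArrayList<>();
        for (Long value : group) {
            all.add(value);
        }
        long sum = 0;
        for (long value : all) {
            sum += value;
        }
        return sum;
    }

    // Streaming variant: consumes the group once and keeps only the running
    // sum, so heap use is independent of the size of the group.
    static long sumStreaming(Iterable<Long> group) {
        long sum = 0;
        for (long value : group) {
            sum += value;
        }
        return sum;
    }

    // Lazily generated group 0..n-1, so the input itself is never held in memory.
    static Iterable<Long> range(long n) {
        return () -> new Iterator<Long>() {
            private long i = 0;

            @Override
            public boolean hasNext() {
                return i < n;
            }

            @Override
            public Long next() {
                return i++;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(sumBuffered(range(1000)));
        System.out.println(sumStreaming(range(1000)));
    }
}
```

Both variants return the same result; only the buffered one can run out of
heap when a single group becomes very large.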

I would check if you have user functions that require lots of heap memory.
Also, reducing the size of the managed memory to leave more heap space
available might help.
If that doesn't solve the problem, it would be good if you could share some
details about your job (which operators, which local strategies, how many
operators) that might help to identify the misbehaving operator.
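
As a rough back-of-the-envelope illustration of the managed memory trade-off
mentioned above: in Flink 1.x the share of TaskManager memory reserved as
managed memory is set by taskmanager.memory.fraction (default 0.7, to my
knowledge). The sketch below deliberately ignores network buffers and JVM
overhead, and the class and method names are hypothetical; it only shows how
lowering the fraction leaves more heap for user functions, and why fewer slots
means fewer concurrent user functions competing for that heap.

```java
// Hypothetical helper, not a Flink API: simplified heap arithmetic that
// ignores network buffers and other JVM overhead.
class HeapBudgetSketch {

    // Heap left for user code after reserving the managed-memory fraction.
    static long userHeapMb(long taskManagerHeapMb, double managedFraction) {
        if (managedFraction < 0.0 || managedFraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        long managedMb = Math.round(taskManagerHeapMb * managedFraction);
        return taskManagerHeapMb - managedMb;
    }

    // Very rough per-slot share, assuming user functions in all slots run
    // concurrently and use similar amounts of heap (an assumption of this sketch).
    static long userHeapPerSlotMb(long taskManagerHeapMb, double managedFraction, int slots) {
        if (slots <= 0) {
            throw new IllegalArgumentException("slots must be positive");
        }
        return userHeapMb(taskManagerHeapMb, managedFraction) / slots;
    }

    public static void main(String[] args) {
        System.out.println(userHeapMb(4096, 0.7));
        System.out.println(userHeapMb(4096, 0.5));
        System.out.println(userHeapPerSlotMb(4096, 0.5, 4));
    }
}
```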

Thanks, Fabian


2017-04-19 16:09 GMT+02:00 Newport, Billy <Billy.Newport@gs.com>:

> How does Flink use memory? We’re seeing cases when running a job on larger
> datasets where it throws OOM exceptions during the job. We’re using the
> DataSet API. Shouldn’t Flink be streaming from disk to disk? We work around
> it by using fewer slots, but it seems unintuitive that I need to change these
> settings given Flink != Spark. Why isn’t Flink’s memory usage constant? Why
> couldn’t I run a job of any size successfully with a single task and a
> single slot, other than it taking much longer to run?
> Thanks
> Billy
