From: Gábor Gévay
Date: Thu, 20 Apr 2017 18:22:02 +0200
Subject: Re: Flink memory usage
To: "Newport, Billy"
Cc: user@flink.apache.org

Hello,

You could also try using a profiler that shows which objects are using
what amount of memory, e.g., JProfiler or Java Flight Recorder [1].

Best,
Gábor

[1] https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks001.html

On Thu, Apr 20, 2017 at 6:00 PM, Newport, Billy wrote:
> Ok
>
> The consensus seems to be that it's us, not Flink :-) So we'll look
> harder at what we're doing in case there is anything silly. We are using
> 16K network buffers BTW, which is around 0.5GB with the defaults.
>
> From: Till Rohrmann [mailto:trohrmann@apache.org]
> Sent: Thursday, April 20, 2017 11:52 AM
> To: Stefano Bortoli
> Cc: Newport, Billy [Tech]; Fabian Hueske; user@flink.apache.org
> Subject: Re: Flink memory usage
>
> Hi Billy,
>
> if you didn't split the different data sets up into different slot
> sharing groups, then your maximum parallelism is 40. Thus, it should be
> enough to assign 40^2 * 20 * 4 = 128000 network buffers. If that is not
> enough because you have more than 4 shuffling steps running in parallel,
> then you have to increase the last term.
>
> OOM exceptions should actually only occur due to user code objects.
> Given that you have reserved a massive amount of memory for the network
> buffers, the remaining heap for the user code is probably very small.
> Try whether you can decrease the number of network buffers.
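[Editor's note: Till's rule of thumb can be sanity-checked with a few lines
of arithmetic. This sketch assumes the 32 KB default network buffer size
(`taskmanager.memory.segment-size`), which also matches Billy's
observation that 16K buffers come to roughly 0.5 GB:]

```java
// Sanity check of the numberOfBuffers rule of thumb from this thread:
// buffers = p^2 * t * 4, where p = max parallelism, t = task managers.
public class BufferMath {
    public static void main(String[] args) {
        int p = 40;   // maximum parallelism of the job
        int t = 20;   // number of task managers
        int buffers = p * p * t * 4;
        System.out.println(buffers); // 128000

        // Each network buffer is one memory segment, 32 KB by default
        // (taskmanager.memory.segment-size = 32768 bytes).
        long bytes = 16_384L * 32_768L; // Billy's 16K buffers
        System.out.println(bytes / (1024.0 * 1024 * 1024) + " GB"); // 0.5 GB
    }
}
```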
> Moreover, check whether your user code keeps references to objects
> somewhere that could cause the OOM.
>
> Cheers,
> Till
>
> On Thu, Apr 20, 2017 at 5:42 PM, Stefano Bortoli wrote:
>
> I think that if you have a lot of memory available, the GC gets kind of
> lazy. In our case, the issue was just the latency caused by the GC,
> because we were loading more data than could fit in memory. Hence
> optimizing the code gave us a lot of improvements. FlatMaps are also
> dangerous, as objects can multiply beyond what you expect, making a
> co-group extremely costly. :-) A well-placed distinct() saves a lot of
> time and memory.
>
> My point is that having worked with scarce resources, I learned that
> almost all the time the issue was my code, not the framework.
>
> Good luck.
>
> Stefano
>
> From: Newport, Billy [mailto:Billy.Newport@gs.com]
> Sent: Thursday, April 20, 2017 4:46 PM
> To: Stefano Bortoli; 'Fabian Hueske'
> Cc: 'user@flink.apache.org'
> Subject: RE: Flink memory usage
>
> Your reuse idea kind of implies that it's a GC generation rate issue,
> i.e. it's not collecting fast enough so it's running out of memory,
> versus heap that's actually anchored, right?
>
> From: Stefano Bortoli [mailto:stefano.bortoli@huawei.com]
> Sent: Thursday, April 20, 2017 10:33 AM
> To: Newport, Billy [Tech]; 'Fabian Hueske'
> Cc: 'user@flink.apache.org'
> Subject: RE: Flink memory usage
>
> Hi Billy,
>
> The only suggestion I can give is to check your code very carefully for
> useless variable allocations, and foster reuse as much as possible.
> Don't create a new collection in every map execution; rather, clear and
> reuse the collected output of the flatMap, and so on. In the past we ran
> long processes over lots of data with small memory without problems, and
> many more complex co-groups, joins and so on without any issue.
>
> My 2c. Hope it helps.
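[Editor's note: Stefano's reuse advice can be sketched roughly as below.
This is a hypothetical tokenizer, not code from the thread; the
`Collector` interface here is a minimal stand-in for Flink's
`org.apache.flink.util.Collector` so the snippet is self-contained. In a
real job you would implement
`org.apache.flink.api.common.functions.FlatMapFunction` instead:]

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for Flink's Collector so the sketch compiles alone.
interface Collector<T> {
    void collect(T record);
}

// Allocate the working buffer once and clear() it per call, instead of
// creating a new list (and new output objects) for every input record.
class ReusingTokenizer {
    private final List<String> buffer = new ArrayList<>(); // reused across calls

    public void flatMap(String line, Collector<String> out) {
        buffer.clear();                      // reuse, don't reallocate
        for (String token : line.split(" ")) {
            if (!token.isEmpty()) {
                buffer.add(token);
            }
        }
        for (String token : buffer) {
            out.collect(token);
        }
    }
}

public class ReuseDemo {
    public static void main(String[] args) {
        ReusingTokenizer fn = new ReusingTokenizer();
        List<String> collected = new ArrayList<>();
        fn.flatMap("flink memory usage", collected::add);
        fn.flatMap("flink", collected::add);
        System.out.println(collected); // [flink, memory, usage, flink]
    }
}
```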
>
> Stefano
>
> From: Newport, Billy [mailto:Billy.Newport@gs.com]
> Sent: Thursday, April 20, 2017 1:31 PM
> To: 'Fabian Hueske'
> Cc: 'user@flink.apache.org'
> Subject: RE: Flink memory usage
>
> I don't think our functions are memory heavy; they are typically
> co-groups that merge the records on the left with the records on the
> right.
>
> We're currently requiring 720GB of heap to do our processing, which
> frankly appears ridiculous to us. Could too much parallelism be causing
> the problem? Looking at:
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Optimal-Configuration-for-Cluster-td5024.html
>
> If we are processing 17 "datasets" in a single job and each has an
> individual parallelism of 40, is that a total (potential) parallelism of
> 17*40? And given your network buffers calculation of parallelism
> squared, would that do it, or only if we explicitly configure it that
> way:
>
> taskmanager.network.numberOfBuffers: p ^ 2 * t * 4
>
> where p is the maximum parallelism of the job and t is the number of
> task managers.
>
> You can process more than one parallel task per TM if you configure more
> than one processing slot per machine (taskmanager.numberOfTaskSlots).
> The TM will divide its memory among all its slots. So it would be
> possible to start one TM for each machine with 100GB+ memory and 48
> slots each.
>
> Our pipeline for each dataset looks like this:
>
> Read avro file -> FlatMap -> Validate each record with a flatmap ->
> Read Parquet -> FlatMap -> Filter Live Rows -> CoGroup with the
> validated avro file above ----------> }
> Read Parquet -> FlatMap -> Filter Dead Rows ---------------------> }
> Union cogroup with dead rows and write result to parquet file.
>
> I don't understand why this logic couldn't run with a single task
> manager and just take longer.
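[Editor's note: taking the thread's formula at face value, Billy's
question has a dramatic answer: if the effective maximum parallelism
really were 17 * 40 rather than 40 (i.e., without slot sharing), the
buffer requirement would explode. A back-of-the-envelope sketch, assuming
the 32 KB default segment size:]

```java
// What the numberOfBuffers formula implies under the two readings of
// "maximum parallelism" discussed in this thread.
public class ParallelismMath {
    public static void main(String[] args) {
        int datasets = 17, perDataset = 40, taskManagers = 20;

        // If slot sharing caps the job's max parallelism at 40:
        System.out.println(40 * 40 * taskManagers * 4);   // 128000

        // If each dataset's parallelism counted separately (17 * 40 = 680):
        long p = (long) datasets * perDataset;
        long buffers = p * p * taskManagers * 4;
        System.out.println(buffers);                      // 36992000
        // At 32 KB per buffer that would be over a terabyte:
        System.out.printf("%.1f TB%n", buffers * 32_768.0 / (1L << 40));
    }
}
```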
We're having a lot of trouble trying to change the tuning to reduce the
> memory burn. We run the above pipeline with parallelism 40 for all 17
> datasets in a single job.
>
> We're running this config now, which is not really justifiable for what
> we're doing:
>
> 20 nodes, 2 slots, parallelism 40, 36GB mem = 720GB of heap...
>
> Thanks
>
> From: Fabian Hueske [mailto:fhueske@gmail.com]
> Sent: Wednesday, April 19, 2017 10:52 AM
> To: Newport, Billy [Tech]
> Cc: user@flink.apache.org
> Subject: Re: Flink memory usage
>
> Hi Billy,
>
> Flink's internal operators are implemented so that they do not allocate
> heap space proportional to the size of the input data. Whenever Flink
> needs to hold data in memory (e.g., for sorting or building a hash
> table), the data is serialized into managed memory. If all memory is in
> use, Flink starts spilling to disk. This blog post discusses how Flink
> uses its managed memory [1] (still up to date, even though it's almost 2
> years old).
>
> The runtime code should actually be quite stable. Most of the code has
> been there for several years (even before Flink was donated to the ASF),
> and we haven't seen many bugs reported for the DataSet runtime. Of
> course this does not mean that the code doesn't contain bugs.
>
> However, Flink does not take care of the user code. For example, a
> GroupReduceFunction that collects a lot of data, e.g., in a List on the
> heap, can still kill a program. I would check whether you have user
> functions that require lots of heap memory. Also, reducing the size of
> the managed memory to have more heap space available might help.
>
> If that doesn't solve the problem, it would be good if you could share
> some details about your job (which operators, which local strategies,
> how many operators) that might help to identify the misbehaving
> operator.
>
> Thanks, Fabian
>
> [1] https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
>
> 2017-04-19 16:09 GMT+02:00 Newport, Billy:
>
> How does Flink use memory? We're seeing cases when running a job on
> larger datasets where it throws OOM exceptions during the job. We're
> using the DataSet API. Shouldn't Flink be streaming from disk to disk?
> We work around this by using fewer slots, but it seems unintuitive that
> I need to change these settings, given Flink != Spark. Why isn't Flink's
> memory usage constant? Why couldn't I run a job with a single task and a
> single slot for any size job successfully, other than that it takes much
> longer to run?
>
> Thanks
>
> Billy
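[Editor's note: the tuning knobs discussed in this thread live in
flink-conf.yaml. Below is a sketch of the setup Billy describes, using
Flink 1.2-era key names; the exact values are assumptions reconstructed
from the thread, not a recommended configuration:]

```yaml
# One TaskManager per node, 20 nodes (started via the cluster scripts).
taskmanager.heap.mb: 36864                  # 36 GB heap per TM -> 720 GB total
taskmanager.numberOfTaskSlots: 2            # 2 slots per TM -> 40 slots overall
taskmanager.network.numberOfBuffers: 16384  # 16K buffers, ~0.5 GB at 32 KB each
# Fabian's suggestion: shrink managed memory to leave more heap for user
# code (the default fraction is 0.7).
taskmanager.memory.fraction: 0.5
```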