flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Crash in a simple "mapper style" streaming app likely due to a memory leak ?
Date Fri, 13 Nov 2015 14:57:40 GMT
Hi Arnaud,

your M2 mapper is allocating memory on the JVM heap. That should not cause
any issues, because the heap is limited to 9.2 GB anyway. The problem is the
off-heap allocations.
The RichSinkFunction is only instantiated once per slot, yes.

You can set application-specific configuration parameters at submission time
using "-Dyarn.heap-cutoff-ratio=0.5".

Regards,
Robert





On Fri, Nov 13, 2015 at 3:49 PM, LINZ, Arnaud <ALINZ@bouyguestelecom.fr>
wrote:

> Hi Robert,
>
>
>
> Thanks, it works with 50% -- at least way past the previous crash point.
>
>
>
> In my opinion (I lack real metrics), the part that uses the most memory is
> the M2 mapper, instantiated once per slot.
>
> The most complex part is the sink (it uses a lot of HDFS files, flushing
> threads, etc.), but I expect the “RichSinkFunction” to be instantiated
> only once per slot? I’m really surprised by that memory usage; I will try
> using a monitoring app on the YARN JVM to understand.
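>
> For instance, jmap -heap <pid> should show the heap side, and if the task
> manager JVMs run on Java 8 with -XX:NativeMemoryTracking=summary enabled,
> jcmd <pid> VM.native_memory summary should break down the off-heap side
> (standard JDK tools, nothing Flink-specific).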
>
>
>
> How do I set this yarn.heap-cutoff-ratio parameter for a specific
> application? I don’t want to put that value in the “root-protected”
> flink-conf.yaml, which applies to all users & Flink jobs.
>
>
>
> Regards,
>
> Arnaud
>
>
>
> *From:* Robert Metzger [mailto:rmetzger@apache.org]
> *Sent:* Friday, November 13, 2015 3:16 PM
> *To:* user@flink.apache.org
> *Subject:* Re: Crash in a simple "mapper style" streaming app likely due
> to a memory leak ?
>
>
>
> Hi Arnaud,
>
>
>
> can you try running the job again with the configuration value
> of "yarn.heap-cutoff-ratio" set to 0.5?
>
> As you can see, the container has been killed because it used more than
> 12 GB: "12.1 GB of 12 GB physical memory used;"
> You can also see from the logs that we limit the JVM heap space to 9.2 GB:
> "java -Xms9216m -Xmx9216m".
>
>
>
> In an ideal world, we would tell the JVM to limit its memory usage to 12
> GB, but sadly, the heap space is not the only memory the JVM allocates.
> It also allocates direct memory and other things outside the heap.
> Therefore, we give only 75% of the container memory to the heap.
>
> In your case, I assume that each JVM has multiple HDFS clients, a lot of
> local threads, etc. ... that's why the memory might not suffice.
>
> With a cutoff ratio of 0.5, we'll only use 6 GB for the heap.
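>
> (As a worked calculation: the heap is the container size minus the cutoff,
> so with the default cutoff of 0.25, 12288 MB * (1 - 0.25) = 9216 MB, which
> is the -Xmx9216m above; with a cutoff of 0.5, 12288 MB * (1 - 0.5) =
> 6144 MB, i.e. 6 GB.)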
>
>
>
> That value might be a bit too high... but I want to make sure that we
> first identify the issue.
>
> If the job runs with the 50% cutoff, you can try to reduce it again
> towards 25% (that's the default value, despite what the documentation
> says).
>
>
>
> I hope that helps.
>
>
>
> Regards,
>
> Robert
>
>
>
>
>
> On Fri, Nov 13, 2015 at 2:58 PM, LINZ, Arnaud <ALINZ@bouyguestelecom.fr>
> wrote:
>
> Hello,
>
>
>
> I use the brand new 0.10 version and I have problems running a streaming
> job. My topology is linear: a custom source SC scans a directory and emits
> HDFS file names; a first mapper M1 opens each file and emits its lines; a
> filter F filters the lines; another mapper M2 transforms them; and a
> mapper/sink M3->SK stores them in HDFS.
>
>
>
> SC->M1->F->M2->M3->SK
>
>
>
> The M2 transformer uses a bit of RAM because, when it opens, it loads an
> 11M-row static table into a hash map to enrich the lines. I use 55 slots
> on YARN: 11 containers of 12 GB with 5 slots each.
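>
> Roughly, the M2 mapper has this shape (a simplified sketch; the class name
> and the key lookup are made up here, the relevant part is just the
> HashMap loaded once per parallel instance in open()):
>
> import java.util.HashMap;
> import java.util.Map;
>
> import org.apache.flink.api.common.functions.RichMapFunction;
> import org.apache.flink.configuration.Configuration;
>
> public class EnrichMapper extends RichMapFunction<String, String> {
>
>     // One map per parallel instance (slot), built once in open() and
>     // kept on the JVM heap.
>     private transient Map<String, String> referenceTable;
>
>     @Override
>     public void open(Configuration parameters) throws Exception {
>         referenceTable = new HashMap<String, String>();
>         // ... read the ~11M-row static table (e.g. from HDFS) and fill
>         // the map here ...
>     }
>
>     @Override
>     public String map(String line) throws Exception {
>         String key = line.split(";", 2)[0]; // made-up key extraction
>         String extra = referenceTable.get(key);
>         return extra == null ? line : line + ";" + extra;
>     }
> }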
>
>
>
> To my understanding, I should not have any memory problem since each
> record is independent: no join, no key, no aggregation, no window => it’s
> a simple mapping flow, with RAM simply used as a buffer. However, if I
> submit enough input data, I systematically crash my app with a “Connection
> unexpectedly closed by remote task manager” exception, and the first error
> in the YARN log shows that “a container is running beyond physical memory
> limits”.
>
>
>
> If I increase the container size, I simply need to feed in more data to
> make the crash happen.
>
>
>
> Any idea?
>
>
>
> Greetings,
>
> Arnaud
>
>
>
> _________________________________
>
> Exceptions in the Flink dashboard detail:
>
>
>
> Root Exception:
>
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager 'bt1shli6/
> 172.21.125.31:33186'. This might indicate that the remote task manager
> was lost.
>
>        at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>
> (…)
>
>
> ------------------------------
>
>
> The integrity of this message cannot be guaranteed on the Internet. The
> company that sent this message cannot therefore be held liable for its
> content nor attachments. Any unauthorized use or dissemination is
> prohibited. If you are not the intended recipient of this message, then
> please delete it and notify the sender.
>
