flink-user-zh mailing list archives

From Xintong Song <tonysong...@gmail.com>
Subject Re: Flink Task Manager GC overhead limit exceeded
Date Thu, 30 Apr 2020 02:54:56 GMT
Hi Eleanore,

I'd like to explain 1 & 2. For 3, I have no idea either.

> 1. I don't see the heap size from the UI for the task manager shown correctly
>

Despite the 'heap' in the key, 'taskmanager.heap.size' accounts for the
total memory of a Flink task manager, rather than only the heap memory. A
Flink task manager process consumes not only Java heap memory, but also
direct memory (e.g., network buffers) and native memory (e.g., JVM
overhead). That's why the JVM heap size shown on the UI is much smaller
than the configured 'taskmanager.heap.size'. Please refer to this document
[1] for more details. The document is for Flink 1.9 and has not been
back-ported to 1.8, but the contents should apply to 1.8 as well.
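
If it helps to make the split more predictable, you can pin the network
buffer memory explicitly instead of relying on the default fraction. A
minimal flink-conf.yaml sketch (the values are only illustrative, not a
recommendation; see [1] for what the keys mean):

    # Total memory budget of the task manager process
    # (JVM heap + direct memory such as network buffers + native memory).
    taskmanager.heap.size: 4096m
    # Pin the network buffer memory to a fixed size so the JVM heap that
    # remains from the total budget is easier to reason about.
    taskmanager.network.memory.min: 256mb
    taskmanager.network.memory.max: 256mb

With a fixed network budget, the heap shown on the UI should be roughly the
total minus the network buffers and other non-heap overhead.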

> 2. I don't see the heap dump file /dumps/oom.bin in the restarted pod; did
> I set the java opts wrong?
>

The java options look good to me. Is the configured path '/dumps/oom.bin' a
local path inside the pod, or a path on the host mounted into the pod? The
restarted pod is a completely new pod: everything written to the old pod's
local filesystem is gone once that pod terminates, unless it is written to
the host (or other persistent storage) through a mounted volume.
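
For the dump to survive a restart, the dump directory has to be backed by
storage that outlives the pod. A minimal sketch of what that could look like
in the task manager pod spec, using a hostPath volume (the volume name and
host path below are made-up placeholders; a PersistentVolumeClaim would work
the same way):

    containers:
      - name: taskmanager
        # ... existing args, including -XX:HeapDumpPath=/dumps/oom.bin ...
        volumeMounts:
          - name: heap-dumps              # placeholder name for this sketch
            mountPath: /dumps
    volumes:
      - name: heap-dumps
        hostPath:
          path: /var/flink-heap-dumps     # placeholder path on the node
          type: DirectoryOrCreate

With such a mount, /dumps/oom.bin written by the crashing pod stays on the
node (or in the claim) and can be inspected after the pod is replaced.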

Thank you~

Xintong Song


[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/mem_setup.html

On Thu, Apr 30, 2020 at 7:41 AM Eleanore Jin <eleanore.jin@gmail.com> wrote:

> Hi All,
>
> Currently I am running a Flink job cluster (v1.8.2) on Kubernetes with 4
> pods, each pod with a parallelism of 4.
>
> The Flink job reads from a source topic with 96 partitions and applies a
> per-element filter; the filter criteria come from a broadcast topic, always
> using the latest message, and the filtered results are published to a sink
> topic.
>
> There is no checkpointing or state involved.
>
> Then I am continuously seeing "GC overhead limit exceeded" errors and the
> pods keep restarting.
>
> So I tried to increase the heap size for the task manager by:
>
> containers:
>   - args:
>       - task-manager
>       - -Djobmanager.rpc.address=service-job-manager
>       - -Dtaskmanager.heap.size=4096m
>       - -Denv.java.opts.taskmanager="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.bin"
>
> 3 things I noticed:
>
> 1. I don't see the heap size from the UI for the task manager shown correctly
>
> [image: image.png]
>
> 2. I don't see the heap dump file /dumps/oom.bin in the restarted pod; did
> I set the java opts wrong?
>
> 3. I continuously see the below logs from all pods; not sure if this causes
> any issue:
> {"@timestamp":"2020-04-29T23:39:43.387Z","@version":"1","message":"[Consumer
> clientId=consumer-1, groupId=aba774bc] Node 6 was unable to process the
> fetch request with (sessionId=2054451921, epoch=474):
> FETCH_SESSION_ID_NOT_FOUND.","logger_name":"org.apache.kafka.clients.FetchSessionHandler","thread_name":"pool-6-thread-1","level":"INFO","level_value":20000}
>
> Thanks a lot for any help!
>
> Best,
> Eleanore
>
