flink-user mailing list archives

From Zahid Rahman <zahidr1...@gmail.com>
Subject Re: Flink 1.10 Out of memory
Date Fri, 24 Apr 2020 11:15:07 GMT
https://youtu.be/UEkjRN8jRx4  22:10


- One option is to reduce Flink managed memory from the default 70% to perhaps 50%.

This error could also be caused by:

- missing memory;

- the programmer maintaining a local list, so heavy processing overruns the user-allocated memory;

- using a JVM heap that is too small;

- the JVM spending too much time on GC.

In those cases the out-of-memory error has nothing to do with Flink; Flink is not at fault.

This tuning process is known as "pimping" Flink.

Another part of pimping is to use local disk for memory spill.
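
The managed-memory and disk-spill tips above could be sketched as a flink-conf.yaml fragment. The values and the directory path are illustrative only, not recommendations; check the defaults for your Flink version:

```yaml
# Lower the fraction of Flink memory that is reserved as managed memory.
taskmanager.memory.managed.fraction: 0.5

# Point temporary/spill files at a local disk with free space
# (the path here is just an example).
io.tmp.dirs: /var/tmp/flink
```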

On Fri, 24 Apr 2020, 03:53 Xintong Song, <tonysong820@gmail.com> wrote:

> @Stephan,
> I don't think so. If JVM hits the direct memory limit, you should see the
> error message "OutOfMemoryError: Direct buffer memory".
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Apr 23, 2020 at 6:11 PM Stephan Ewen <sewen@apache.org> wrote:
>
>> @Xintong and @Lasse could it be that the JVM hits the "Direct Memory"
>> limit here?
>> Would increasing the "taskmanager.memory.framework.off-heap.size" help?
>>
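If the direct-memory limit does turn out to be the problem, the option Stephan mentions could be raised in flink-conf.yaml roughly like this (the 256m value is only an example; the Flink 1.10 default is 128m):

```yaml
# Extra off-heap (direct) memory reserved for Flink's framework code.
taskmanager.memory.framework.off-heap.size: 256m
```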
>> On Mon, Apr 20, 2020 at 11:02 AM Zahid Rahman <zahidr1000@gmail.com>
>> wrote:
>>
>>> As you can see from the task manager tab of the Flink web dashboard:
>>>
>>> Physical Memory:3.80 GB
>>> JVM Heap Size:1.78 GB
>>> Flink Managed Memory:128 MB
>>>
>>> *Flink is only using 128 MB, which can easily cause an OOM error.*
>>>
>>> *These are DEFAULT settings.*
>>>
>>> *I dusted off an old laptop, so it only has 3.8 GB of RAM.*
>>>
>>> What do your job metrics say?
>>>
>>> On Mon, 20 Apr 2020, 07:26 Xintong Song, <tonysong820@gmail.com> wrote:
>>>
>>>> Hi Lasse,
>>>>
>>>> From what I understand, your problem is that the JVM tries to fork some
>>>> native process (if you look at the exception stack, the root exception is
>>>> thrown from a native method) but there's not enough memory for doing that.
>>>> This could happen when either Mesos is using cgroup strict mode for memory
>>>> control, or there's no more memory on the machine. Flink cannot prevent
>>>> native processes from using more memory. It can only reserve a certain
>>>> amount of memory for such native usage when requesting worker memory from
>>>> the deployment environment (in your case Mesos) and allocating Java heap /
>>>> direct memory.
>>>>
>>>> My suggestion is to try increasing the JVM overhead configuration. You
>>>> can leverage the configuration options
>>>> 'taskmanager.memory.jvm-overhead.[min|max|fraction]'. See more details in
>>>> the documentation[1].
>>>>
>>>> Thank you~
>>>>
>>>> Xintong Song
>>>>
>>>>
>>>> [1]
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#taskmanager-memory-jvm-overhead-max
>>>>
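A sketch of the jvm-overhead options Xintong refers to, as a flink-conf.yaml fragment. The values are illustrative, not recommendations; per the linked documentation, the fraction is applied to the total process size and the result is clamped between min and max:

```yaml
# Reserve more room for native memory use outside the JVM
# (e.g. forked native processes).
taskmanager.memory.jvm-overhead.min: 512m
taskmanager.memory.jvm-overhead.max: 1g
taskmanager.memory.jvm-overhead.fraction: 0.2
```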
>>>> On Sat, Apr 18, 2020 at 4:02 AM Zahid Rahman <zahidr1000@gmail.com>
>>>> wrote:
>>>>
>>>>> https://betsol.com/java-memory-management-for-java-virtual-machine-jvm/
>>>>>
>>>>> Backbutton.co.uk
>>>>> ¯\_(ツ)_/¯
>>>>> ♡۶Java♡۶RMI ♡۶
>>>>> Make Use Method {MUM}
>>>>> makeuse.org
>>>>> <http://www.backbutton.co.uk>
>>>>>
>>>>>
>>>>> On Fri, 17 Apr 2020 at 14:07, Lasse Nedergaard <
>>>>> lassenedergaardflink@gmail.com> wrote:
>>>>>
>>>>>> Hi.
>>>>>>
>>>>>> We have migrated to Flink 1.10 and are facing an out-of-memory
>>>>>> exception; hopefully someone can point us in the right direction.
>>>>>>
>>>>>> We have a job that uses broadcast state, and we sometimes run out of
>>>>>> memory when it creates a savepoint. See the stack trace below.
>>>>>> We have assigned 2.2 GB/task manager and
>>>>>> configured taskmanager.memory.process.size: 2200m
>>>>>> In Flink 1.9 our container was terminated because of OOM, so 1.10 does
>>>>>> a better job, but it is still not working: the task manager leaks
>>>>>> memory on each OOM and is finally killed by Mesos.
>>>>>>
>>>>>>
>>>>>> Any idea what we can do to figure out what settings we need to change?
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>> Lasse Nedergaard
>>>>>>
>>>>>>
>>>>>> WARN o.a.flink.runtime.state.filesystem.FsCheckpointStreamFactory -
>>>>>> Could not close the state stream for
>>>>>> s3://flinkstate/dcos-prod/checkpoints/fc9318cc236d09f0bfd994f138896d6c/chk-3509/cf0714dc-ad7c-4946-b44c-96d4a131a4fa.
>>>>>> java.io.IOException: Cannot allocate memory at
>>>>>> java.io.FileOutputStream.writeBytes(Native Method) at
>>>>>> java.io.FileOutputStream.write(FileOutputStream.java:326) at
>>>>>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at
>>>>>> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at
>>>>>> java.io.FilterOutputStream.flush(FilterOutputStream.java:140) at
>>>>>> java.io.FilterOutputStream.close(FilterOutputStream.java:158) at
>>>>>> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3OutputStream.close(PrestoS3FileSystem.java:995)
>>>>>> at
>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>>>>>> at
>>>>>> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>>>>>> at
>>>>>> org.apache.flink.fs.s3presto.common.HadoopDataOutputStream.close(HadoopDataOutputStream.java:52)
>>>>>> at
>>>>>> org.apache.flink.core.fs.ClosingFSDataOutputStream.close(ClosingFSDataOutputStream.java:64)
>>>>>> at
>>>>>> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.close(FsCheckpointStreamFactory.java:277)
>>>>>> at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:263) at
>>>>>> org.apache.flink.util.IOUtils.closeAllQuietly(IOUtils.java:250) at
>>>>>> org.apache.flink.util.AbstractCloseableRegistry.close(AbstractCloseableRegistry.java:122)
>>>>>> at
>>>>>> org.apache.flink.runtime.state.AsyncSnapshotCallable.closeSnapshotIO(AsyncSnapshotCallable.java:167)
>>>>>> at
>>>>>> org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:83)
>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
>>>>>> org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:458)
>>>>>> at
>>>>>> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
>>>>>> at
>>>>>> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1143)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>>
>>>>>> INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator -
>>>>>> Discarding checkpoint 3509 of job fc9318cc236d09f0bfd994f138896d6c.
>>>>>> org.apache.flink.util.SerializedThrowable: Could not materialize checkpoint
>>>>>> 3509 for operator Feature extraction (8/12). at
>>>>>> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:1238)
>>>>>> at
>>>>>> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1180)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>> at java.lang.Thread.run(Thread.java:748) Caused by:
>>>>>> org.apache.flink.util.SerializedThrowable: java.io.IOException: Cannot
>>>>>> allocate memory at
>>>>>> java.util.concurrent.FutureTask.report(FutureTask.java:122) at
>>>>>> java.util.concurrent.FutureTask.get(FutureTask.java:192) at
>>>>>> org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:461)
>>>>>> at
>>>>>> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
>>>>>> at
>>>>>> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1143)
>>>>>> ... 3 common frames omitted Caused by:
>>>>>> org.apache.flink.util.SerializedThrowable: Cannot allocate memory at
>>>>>> java.io.FileOutputStream.writeBytes(Native Method) at
>>>>>> java.io.FileOutputStream.write(FileOutputStream.java:326) at
>>>>>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at
>>>>>> java.io.BufferedOutputStream.write(BufferedOutputStream.java:95) at
>>>>>> java.io.FilterOutputStream.write(FilterOutputStream.java:77) at
>>>>>> java.io.FilterOutputStream.write(FilterOutputStream.java:125) at
>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>>>>>> at java.io.DataOutputStream.write(DataOutputStream.java:107) at
>>>>>> org.apache.flink.fs.s3presto.common.HadoopDataOutputStream.write(HadoopDataOutputStream.java:47)
>>>>>> at
>>>>>> org.apache.flink.core.fs.FSDataOutputStreamWrapper.write(FSDataOutputStreamWrapper.java:66)
>>>>>> at
>>>>>> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:220)
>>>>>> at java.io.DataOutputStream.write(DataOutputStream.java:107) at
>>>>>> org.apache.flink.formats.avro.utils.DataOutputEncoder.writeBytes(DataOutputEncoder.java:92)
>>>>>> at
>>>>>> org.apache.flink.formats.avro.utils.DataOutputEncoder.writeString(DataOutputEncoder.java:113)
>>>>>> at org.apache.avro.io.Encoder.writeString(Encoder.java:130) at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:323)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeMap(GenericDatumWriter.java:281)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:139)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:144)
>>>>>> at
>>>>>> org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:98)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:195)
>>>>>> at
>>>>>> org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:83)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:130)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:234)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:136)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:144)
>>>>>> at
>>>>>> org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:98)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:195)
>>>>>> at
>>>>>> org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:83)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:130)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
>>>>>> at
>>>>>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72)
>>>>>> at
>>>>>> org.apache.flink.formats.avro.typeutils.AvroSerializer.serialize(AvroSerializer.java:185)
>>>>>> at
>>>>>> org.apache.flink.runtime.state.HeapBroadcastState.write(HeapBroadcastState.java:109)
>>>>>> at
>>>>>> org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:167)
>>>>>> at
>>>>>> org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:108)
>>>>>> at
>>>>>> org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:75)
>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
>>>>>> org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:458)
>>>>>>
>>>>>
