flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: offheap memory allocation and memory leak bug
Date Mon, 20 Jun 2016 10:10:50 GMT
Hi,

your observation sounds like a bug to me and we have to further investigate
it. I assume that you’re running a batch job, right? Could you maybe share
your complete configuration and the job to reproduce the problem with us?

I think your conclusion that direct buffers are not properly freed and
garbage collected may well be right. I will open a JIRA issue to investigate
further and fix the problem. Thanks for reporting :-)
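
For anyone who wants to watch this on their own JVM, here is a minimal
sketch (not taken from Flink) that reads the JVM's "direct" buffer pool
counter through the standard BufferPoolMXBean, allocates a direct buffer,
drops the reference and then hints a GC. The object name DirectPoolProbe and
the 64 MiB size are made up for illustration, and whether the second number
drops back depends on the JVM and on GC timing.

```scala
import java.lang.management.{BufferPoolMXBean, ManagementFactory}
import java.nio.ByteBuffer

object DirectPoolProbe {
  // Bytes currently reserved in the JVM's "direct" buffer pool,
  // or 0 if the JVM does not expose a pool with that name.
  def usedBytes: Long = {
    val pools = ManagementFactory.getPlatformMXBeans(classOf[BufferPoolMXBean])
    (0 until pools.size)
      .map(i => pools.get(i))
      .find(_.getName == "direct")
      .map(_.getMemoryUsed)
      .getOrElse(0L)
  }

  def main(args: Array[String]): Unit = {
    val before = usedBytes
    // Hypothetical test size: 64 MiB of off-heap memory.
    var buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024)
    println(s"after allocate: ${usedBytes - before} bytes")
    buffer = null     // drop the only reference to the buffer
    System.gc()       // only a hint; release timing is up to the JVM
    Thread.sleep(500) // give the reference handling a moment to run
    println(s"after gc hint:  ${usedBytes - before} bytes")
  }
}
```

Comparing these numbers with the RSS reported by ps for the same process
shows whether the pool counter and the resident size move together.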

At the moment, one way to work around this problem is, as you’ve already
stated, to set taskmanager.memory.preallocate: true in your configuration.
For batch jobs, this should actually improve the runtime performance at the
cost of a slightly longer start-up time for your TaskManagers.
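
As a sketch, the workaround corresponds to the following entries in
conf/flink-conf.yaml. Only keys that are named in this thread are shown; the
off-heap line is included on the assumption that the setup matches the
reporter's.

```yaml
# conf/flink-conf.yaml (sketch, keys taken from this thread)
taskmanager.memory.off-heap: true
# Allocate managed memory once at TaskManager start-up instead of per job.
taskmanager.memory.preallocate: true
```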

Cheers,
Till

On Sun, Jun 19, 2016 at 6:16 PM, CPC <achalil@gmail.com> wrote:

> Hi,
>
> I think I found some information regarding this behavior. In the JVM it is
> almost impossible to free memory allocated via ByteBuffer.allocateDirect.
> There is no explicit way to tell the JVM "free this direct ByteBuffer". Some
> forums say you can free the memory with the method below:
>
>> import java.nio.ByteBuffer
>>
>> def releaseBuffers(buffers: List[ByteBuffer]): List[ByteBuffer] = {
>>     if (buffers.nonEmpty) {
>>         // cleaner() is a JDK-internal method of the direct buffer class
>>         val cleanerMethod = buffers.head.getClass.getMethod("cleaner")
>>         cleanerMethod.setAccessible(true)
>>         buffers.foreach { buffer =>
>>             val cleaner = cleanerMethod.invoke(buffer)
>>             val cleanMethod = cleaner.getClass.getMethod("clean")
>>             cleanMethod.setAccessible(true)
>>             // ask the cleaner to free the buffer's native memory now
>>             cleanMethod.invoke(cleaner)
>>         }
>>     }
>>     List.empty[ByteBuffer]
>> }
>
> but since cleaner is an internal method, the above is not recommended, does
> not work in every JVM, and is not supported in Java 9 either. I also ran
> some tests with the above method and the behavior is not predictable. If
> the memory was allocated by some other thread and that thread has exited,
> then it releases the memory. In practice the GC controls direct memory
> buffers. If there is no GC activity and memory is allocated and then
> dereferenced by different threads, memory usage grows beyond the intended
> limit, the machine starts swapping, and then the OS kills the TaskManager.
> In my tests I saw the following behaviour:
>
> Suppose that thread A allocates 8 GB of memory and exits, and there is no
> reference left to the allocated memory,
> then thread B allocates 8 GB of memory and exits, and there is no reference
> left to the allocated memory.
>
> When I look at the direct memory usage in jvisualvm, it looks like
> below (-Xmx512m -XX:MaxDirectMemorySize=12G):
>
> [image: Inline images 1]
>
> but the RSS of the process is 16 GB. If I call System.gc at that point, the
> RSS drops to 8 GB, but not to the expected level.
>
> This is why the Apache Cassandra developers chose sun.misc.Unsafe (
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Off-heap-caching-through-ByteBuffer-allocateDirect-when-JNA-not-available-td6977711.html
> ).
>
> I think that currently the only way to limit memory usage in Flink, if you
> want to use the same TaskManager across jobs, is via
> "taskmanager.memory.preallocate: true". Since it allocates the memory at
> the beginning and never frees it, its memory usage stays constant.
>
> PS: Sorry for my English, I am not a native speaker. I hope I managed to
> explain what I intended to :)
>
>
>
> On 18 June 2016 at 16:36, CPC <achalil@gmail.com> wrote:
>
>> Hello,
>>
>> I repeated the same test with the following conf values:
>>
>>> taskmanager.heap.mb: 6500
>>> taskmanager.memory.off-heap: true
>>> taskmanager.memory.fraction: 0.9
>>>
>> I set TM_MAX_OFFHEAP_SIZE="6G" in taskmanager.sh. The TaskManager started
>> with
>>
>>> capacman 14543  323 56.0 17014744 13731328 pts/1 Sl 16:23  35:25
>>> /home/capacman/programlama/java/jdk1.7.0_75/bin/java
>>> -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -Xms650M -Xmx650M
>>> -XX:MaxDirectMemorySize=6G -XX:MaxPermSize=256m
>>> -Dlog.file=/home/capacman/Data/programlama/flink-1.0.3/log/flink-capacman-taskmanager-0-capacman-Aspire-V3-771.log
>>> -Dlog4j.configuration=file:/home/capacman/Data/programlama/flink-1.0.3/conf/log4j.properties
>>> -Dlogback.configurationFile=file:/home/capacman/Data/programlama/flink-1.0.3/conf/logback.xml
>>> -classpath
>>> /home/capacman/Data/programlama/flink-1.0.3/lib/flink-dist_2.11-1.0.3.jar:/home/capacman/Data/programlama/flink-1.0.3/lib/flink-python_2.11-1.0.3.jar:/home/capacman/Data/programlama/flink-1.0.3/lib/log4j-1.2.17.jar:/home/capacman/Data/programlama/flink-1.0.3/lib/slf4j-log4j12-1.7.7.jar:::
>>> org.apache.flink.runtime.taskmanager.TaskManager --configDir
>>> /home/capacman/Data/programlama/flink-1.0.3/conf
>>>
>>
>> but memory usage reaches up to 13 GB. Could somebody explain to me why
>> memory usage is so high? I expect it to be at most 8 GB, including some
>> JVM internal overhead.
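
For reference, the figures quoted in this thread can be cross-checked with a
little arithmetic. The sketch below assumes (this is not confirmed anywhere
in the thread) that the off-heap share is the configured total times
taskmanager.memory.fraction and that the JVM heap gets the remainder. That
assumption does reproduce the -Xmx650M and 11673 MB values reported here.
The object and method names are hypothetical.

```scala
object MemoryBudget {
  // Assumed split: off-heap = total * fraction (truncated to whole MB).
  def offHeapMb(totalMb: Long, fraction: Double): Long =
    (totalMb * fraction).toLong

  // Assumed split: the JVM heap gets whatever is left of the total.
  def heapMb(totalMb: Long, fraction: Double): Long =
    totalMb - offHeapMb(totalMb, fraction)

  // ps reports RSS and VSZ in KiB; convert to GiB for readability.
  def kbToGib(kb: Long): Double = kb / (1024.0 * 1024.0)

  def main(args: Array[String]): Unit = {
    println(heapMb(6500, 0.9))      // heap for taskmanager.heap.mb: 6500
    println(offHeapMb(12288, 0.95)) // off-heap for the 12288 MB run
    println(kbToGib(13731328L))     // RSS column from the ps line above
  }
}
```

With these inputs the configured limits come to roughly 650 MB of heap, 6 GB
of direct memory and 256 MB of PermGen, which is well below the RSS of about
13.1 GiB that ps shows for the same process.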
>>
>> [image: Inline images 1]
>>
>> [image: Inline images 2]
>>
>> On 17 June 2016 at 20:26, CPC <achalil@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am running some tests on off-heap memory usage and encountered some
>>> odd behavior. My TaskManager heap limit is 12288 MB, and when I set
>>> "taskmanager.memory.off-heap: true", each job allocates at most 11673 MB
>>> of off-heap memory, which is heapsize*0.95 (the value of
>>> taskmanager.memory.fraction). But when I submit a second job, it
>>> allocates another 11 GB and does not free the memory, since
>>> MaxDirectMemorySize is set to
>>> "-XX:MaxDirectMemorySize=${TM_MAX_OFFHEAP_SIZE}" with
>>> TM_MAX_OFFHEAP_SIZE="8388607T"; my laptop starts swapping and then the
>>> kernel OOM killer kills the TaskManager.
>>>
>>> If I hit "Perform GC" in visualvm between jobs, it releases the direct
>>> memory, but the memory usage of the TaskManager in the ps output is still
>>> around 20 GB (RSS) and 27 GB (virtual size). In that case I could submit
>>> my test job a few times without the TaskManager being OOM-killed, but
>>> after 10 submissions it was killed again. I don't understand why the JVM
>>> memory usage is still high even when all direct memory has been released.
>>> Do you have any idea?
>>>
>>> Then I set MaxDirectMemorySize to 12 GB. In this case it freed direct
>>> memory without any explicit GC triggering from visualvm, but the JVM
>>> process memory usage was still high, around 20 GB (RSS) and 27 GB
>>> (virtual size). After maybe 10 more submissions the TaskManager was
>>> killed again. I think this is a bug, and it makes it impossible to reuse
>>> TaskManagers without restarting them in standalone mode.
>>>
>>> [image: Inline images 1]
>>>
>>> [image: Inline images 2]
>>>
>>
>>
>
