flink-user mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: GC on taskmanagers
Date Tue, 31 Mar 2015 09:29:30 GMT
Hi Emmanuel,

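In Java, the garbage collector runs automatically whenever the heap fills
up, so triggering it remotely won't make any difference. The "GC overhead
limit exceeded" error means the JVM is already collecting almost
continuously and reclaiming very little.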
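
To illustrate why an explicit collection does not help, here is a minimal
sketch (not taken from Flink; the class and method names are made up for
this example). It assumes a HotSpot JVM with default settings, where
System.gc() triggers a full collection, which is the same request that
jcmd <pid> GC.run makes. Memory that is still strongly referenced survives
the collection and is only reclaimed once the references are dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: shows that an explicit GC cannot free reachable objects.
class ExplicitGcDemo {

    /** Heap currently in use, in bytes, as reported by the running JVM. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Allocates `count` blocks of 1 MiB each and keeps a strong reference to every block. */
    static List<byte[]> retain(int count) {
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            blocks.add(new byte[1024 * 1024]);
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<byte[]> leak = retain(32);

        System.gc(); // explicit request; the blocks are still reachable
        long whileReachable = usedHeap();

        leak.clear(); // drop the references, as fixing the leak would
        System.gc();
        long afterRelease = usedHeap();

        System.out.printf("used while reachable: %d MiB%n", whileReachable >> 20);
        System.out.printf("used after release:   %d MiB%n", afterRelease >> 20);
    }
}
```

The first number should stay above 32 MiB no matter how often the collector
runs; only the second one drops. The exact values depend on the JVM and its
settings.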

If you want to reuse the existing Java process without restarting it, you
have to stop the code that is causing the OutOfMemoryError from executing.
Usually, this is quite tricky because your program might not even accept
input any more while it is constantly occupied with garbage collection.
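
As a rough way to see how much time a JVM spends in garbage collection,
the standard java.lang.management beans can be queried from inside the
process. The sketch below is only an illustration (the class and method
names are hypothetical). It reports the cumulative GC time since JVM
start, which is not the same measurement HotSpot uses for its overhead
limit, and it has to run inside the JVM you want to inspect.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates the share of JVM uptime spent in garbage collection.
class GcOverhead {

    /** Sums the accumulated collection time (ms) over all collectors of this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of uptime spent in GC; returns 0.0 when the uptime is not positive. */
    static double fraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gc, uptime, 100.0 * fraction(gc, uptime));
    }
}
```

A value that keeps climbing towards 100% while the heap stays full is the
situation described above: the process is busy collecting and has little
time left for anything else.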

Where was the OutOfMemoryError thrown? Do you have the stack trace of the
error? From the task manager stack trace, it actually looks like your
program is not executing any more. I would try executing a demo program
(e.g. WordCount) to check your setup.

Best regards,
Max

On Tue, Mar 31, 2015 at 5:44 AM, Emmanuel <eleroy@msn.com> wrote:

> My Java is still rusty and I often run into OutOfMemoryError: GC overhead
> limit exceeded...
>
> Yes, I need to look for memory leaks...
>
> But first I need to clear up this memory so I can run again without having
> to shut down and restart everything.
>
> I've tried using the jcmd <pid> GC.run command on each of the JVM
> instances on a taskmanager but I get a boatload of output like this:
>
> On the host running the command:
> com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
> at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
> at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:63)
> at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:213)
> at sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:140)
> at sun.tools.jcmd.JCmd.main(JCmd.java:129)
>
>
>
> and on the taskmanager log:
>
> "Flink-IPC Server handler 1 on 6121" daemon prio=10 tid=0x00007f5f107ee000 nid=0x8f waiting on condition [0x00007f5eb4803000]
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00000000f37e95c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at org.apache.flink.runtime.ipc.Server$Handler.run(Server.java:941)
>
> "Flink-IPC Server handler 0 on 6121" daemon prio=10 tid=0x00007f5f107eb800 nid=0x8e waiting on condition [0x00007f5eb4904000]
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00000000f37e95c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at org.apache.flink.runtime.ipc.Server$Handler.run(Server.java:941)
>
> "Flink-IPC Server listener on 6121" daemon prio=10 tid=0x00007f5f107e9800 nid=0x8d runnable [0x00007f5eb4a05000]
>    java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
> - locked <0x00000000f385d3c0> (a sun.nio.ch.Util$2)
> - locked <0x00000000f385d3d0> (a java.util.Collections$UnmodifiableSet)
> - locked <0x00000000f385d378> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
> at org.apache.flink.runtime.ipc.Server$Listener.run(Server.java:341)
>
> "Flink-IPC Server Responder" daemon prio=10 tid=0x00007f5f107e8800 nid=0x8c runnable [0x00007f5eb4b06000]
>    java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
> - locked <0x00000000f387b528> (a sun.nio.ch.Util$2)
> - locked <0x00000000f387b538> (a java.util.Collections$UnmodifiableSet)
> - locked <0x00000000f387b4e0> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
> at org.apache.flink.runtime.ipc.Server$Responder.run(Server.java:506)
>
> "Service Thread" daemon prio=10 tid=0x00007f5f100c2000 nid=0x8a runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>
> "C2 CompilerThread1" daemon prio=10 tid=0x00007f5f100c0000 nid=0x89 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>
> "C2 CompilerThread0" daemon prio=10 tid=0x00007f5f100bd000 nid=0x88 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>
> "Signal Dispatcher" daemon prio=10 tid=0x00007f5f100b3000 nid=0x87 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>
> "Finalizer" daemon prio=10 tid=0x00007f5f1009c800 nid=0x86 in Object.wait() [0x00007f5eb605b000]
>    java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000f381cc08> (a java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
> - locked <0x00000000f381cc08> (a java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
> at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
>
> "Reference Handler" daemon prio=10 tid=0x00007f5f10098800 nid=0x85 in Object.wait() [0x00007f5eb615c000]
>    java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000f381c820> (a java.lang.ref.Reference$Lock)
> at java.lang.Object.wait(Object.java:503)
> at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
> - locked <0x00000000f381c820> (a java.lang.ref.Reference$Lock)
>
> "main" prio=10 tid=0x00007f5f1000d800 nid=0x6a in Object.wait() [0x00007f5f178d4000]
>    java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000fbe14200> (a java.lang.Object)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.java:1115)
> - locked <0x00000000fbe14200> (a java.lang.Object)
>
> "VM Thread" prio=10 tid=0x00007f5f10096000 nid=0x84 runnable
>
> "GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f5f10023000 nid=0x6b runnable
>
> "GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f5f10025000 nid=0x6c runnable
>
> "GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f5f10027000 nid=0x6d runnable
>
> "GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f5f10029000 nid=0x6e runnable
>
> "GC task thread#4 (ParallelGC)" prio=10 tid=0x00007f5f1002a800 nid=0x6f runnable
>
> "GC task thread#5 (ParallelGC)" prio=10 tid=0x00007f5f1002c800 nid=0x70 runnable
>
> "GC task thread#6 (ParallelGC)" prio=10 tid=0x00007f5f1002e800 nid=0x71 runnable
>
> "GC task thread#7 (ParallelGC)" prio=10 tid=0x00007f5f10030000 nid=0x72 runnable
>
> "GC task thread#8 (ParallelGC)" prio=10 tid=0x00007f5f10032000 nid=0x73 runnable
>
> "GC task thread#9 (ParallelGC)" prio=10 tid=0x00007f5f10034000 nid=0x74 runnable
>
> "GC task thread#10 (ParallelGC)" prio=10 tid=0x00007f5f10036000 nid=0x75 runnable
>
> "GC task thread#11 (ParallelGC)" prio=10 tid=0x00007f5f10037800 nid=0x76 runnable
>
> "GC task thread#12 (ParallelGC)" prio=10 tid=0x00007f5f10039800 nid=0x77 runnable
>
> "GC task thread#13 (ParallelGC)" prio=10 tid=0x00007f5f1003b800 nid=0x78 runnable
>
> "GC task thread#14 (ParallelGC)" prio=10 tid=0x00007f5f1003d000 nid=0x79 runnable
>
> "GC task thread#15 (ParallelGC)" prio=10 tid=0x00007f5f1003f000 nid=0x7a runnable
>
> "GC task thread#16 (ParallelGC)" prio=10 tid=0x00007f5f10041000 nid=0x7b runnable
>
> "GC task thread#17 (ParallelGC)" prio=10 tid=0x00007f5f10043000 nid=0x7c runnable
>
> "GC task thread#18 (ParallelGC)" prio=10 tid=0x00007f5f10044800 nid=0x7d runnable
>
> "GC task thread#19 (ParallelGC)" prio=10 tid=0x00007f5f10046800 nid=0x7e runnable
>
> "GC task thread#20 (ParallelGC)" prio=10 tid=0x00007f5f10048800 nid=0x7f runnable
>
> "GC task thread#21 (ParallelGC)" prio=10 tid=0x00007f5f1004a000 nid=0x80 runnable
>
> "GC task thread#22 (ParallelGC)" prio=10 tid=0x00007f5f1004c000 nid=0x81 runnable
>
> "VM Periodic Task Thread" prio=10 tid=0x00007f5f100d5000 nid=0x8b waiting on condition
>
> JNI global references: 530
>
> Heap
>  PSYoungGen      total 76800K, used 63133K [0x00000000faa80000, 0x0000000100000000, 0x0000000100000000)
>   eden space 66048K, 95% used [0x00000000faa80000,0x00000000fe827690,0x00000000feb00000)
>   from space 10752K, 0% used [0x00000000ff580000,0x00000000ff580000,0x0000000100000000)
>   to   space 10752K, 0% used [0x00000000feb00000,0x00000000feb00000,0x00000000ff580000)
>  ParOldGen       total 175104K, used 175046K [0x00000000eff80000, 0x00000000faa80000, 0x00000000faa80000)
>   object space 175104K, 99% used [0x00000000eff80000,0x00000000faa71bb0,0x00000000faa80000)
>  PSPermGen       total 29696K, used 29267K [0x00000000dff80000, 0x00000000e1c80000, 0x00000000eff80000)
>   object space 29696K, 98% used [0x00000000dff80000,0x00000000e1c14d38,0x00000000e1c80000)
>
>
>
>
>
> Any insight on how to free up this memory cleanly when this happens?
>
> Thanks!
>
>
>
