brooklyn-dev mailing list archives

From "Aled Sage (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (BROOKLYN-214) OutOfMemoryError (too many threads): repeated calls to AttributeWhenReady
Date Wed, 13 Jan 2016 10:45:39 GMT

     [ https://issues.apache.org/jira/browse/BROOKLYN-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aled Sage resolved BROOKLYN-214.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 0.9.0

> OutOfMemoryError (too many threads): repeated calls to AttributeWhenReady
> -------------------------------------------------------------------------
>
>                 Key: BROOKLYN-214
>                 URL: https://issues.apache.org/jira/browse/BROOKLYN-214
>             Project: Brooklyn
>          Issue Type: Bug
>            Reporter: Aled Sage
>             Fix For: 0.9.0
>
>
> When launching Clocker, an {{OutOfMemoryError}} was encountered because too many threads had been created. The underlying cause is repeated execution of tasks that resolve {{AttributeWhenReady}}, each of which blocks a thread.
> The exception encountered was:
> {noformat}
> 2016-01-11 16:36:32,460 DEBUG o.a.b.u.c.t.BasicExecutionManager [brooklyn-execmanager-vzwdtuv4-5490]: Exception running task Task[machine.loadAverage @ h2jAHTjo <- ssh[uptime->machine.loadAverage]:LBUslVfG] (rethrowing): unable to create new native thread
> java.lang.OutOfMemoryError: unable to create new native thread
> {noformat}
> Shortly before the OOME, this was the resource usage:
> {noformat}
> 2016-01-11 16:36:26,884 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (after) - using 202 MB / 310 MB memory (122 kB soft); 1987 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 1835 active, 1040 unfinished; 1425 remembered, 169790 total submitted)
> {noformat}
> A thread dump shows 977 threads waiting for a lock on {{org.apache.brooklyn.camp.brooklyn.spi.dsl.methods.DslComponent$AttributeWhenReady}}, for example:
> {noformat}
> "brooklyn-execmanager-vzwdtuv4-1859" #57280 daemon prio=5 os_prio=31 tid=0x00007fa0baef0000 nid=0xf307 waiting for monitor entry [0x0000700009780000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.brooklyn.camp.brooklyn.spi.dsl.BrooklynDslDeferredSupplier.get(BrooklynDslDeferredSupplier.java:93)
>         - waiting to lock <0x0000000784bc2828> (a org.apache.brooklyn.camp.brooklyn.spi.dsl.methods.DslComponent$AttributeWhenReady)
>         at org.apache.brooklyn.util.core.task.ValueResolver$2.call(ValueResolver.java:322)
>         at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:342)
>         at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:493)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The one thread holding that lock is doing:
> {noformat}
> "brooklyn-execmanager-vzwdtuv4-1864" #57290 daemon prio=5 os_prio=31 tid=0x00007fa0bbc19800 nid=0x76e7 waiting on condition [0x00007000061e1000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000007851a4cb8> (a java.util.concurrent.FutureTask)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>         at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
>         at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:342)
>         at org.apache.brooklyn.camp.brooklyn.spi.dsl.BrooklynDslDeferredSupplier.get(BrooklynDslDeferredSupplier.java:105)
>         - locked <0x0000000784bc2828> (a org.apache.brooklyn.camp.brooklyn.spi.dsl.methods.DslComponent$AttributeWhenReady)
>         at org.apache.brooklyn.util.core.task.ValueResolver$2.call(ValueResolver.java:322)
>         at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:342)
>         at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:493)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
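The two dumps above show a lock-while-waiting pattern. One thread enters a {{synchronized}} section on the {{AttributeWhenReady}} instance and then parks on an unfinished task, so every other caller piles up in BLOCKED state. The minimal sketch below reproduces that shape outside Brooklyn. {{SyncDeferredSupplier}} and {{MonitorPileUpDemo}} are hypothetical names, not Brooklyn classes. A {{CompletableFuture}} stands in for the Brooklyn task that the real lock holder waits on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Hypothetical stand-in (not Brooklyn code): get() is synchronized and, while
// holding the monitor, waits for a task that has not finished yet.
class SyncDeferredSupplier {
    private final CompletableFuture<String> task = new CompletableFuture<>();

    synchronized String get() throws InterruptedException, ExecutionException {
        return task.get(); // parks (WAITING) while still holding this monitor
    }

    void complete(String value) {
        task.complete(value);
    }
}

class MonitorPileUpDemo {
    // Starts one caller that takes the monitor and parks, then a second caller,
    // and reports the second caller's state before releasing both.
    static Thread.State secondCallerState() {
        SyncDeferredSupplier supplier = new SyncDeferredSupplier();
        Runnable call = () -> {
            try {
                supplier.get();
            } catch (InterruptedException | ExecutionException ignored) {
                // not expected in this demo
            }
        };
        Thread holder = new Thread(call, "holder");
        Thread waiter = new Thread(call, "waiter");
        try {
            holder.start();
            awaitState(holder, Thread.State.WAITING); // holder is parked inside get()
            waiter.start();
            awaitState(waiter, Thread.State.BLOCKED); // waiter cannot enter get()
            Thread.State observed = waiter.getState();
            supplier.complete("value");               // let both callers finish
            holder.join();
            waiter.join();
            return observed;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // Polls until the thread reaches the target state, or gives up after 5 seconds.
    static void awaitState(Thread t, Thread.State target) throws InterruptedException {
        long deadline = System.currentTimeMillis() + 5_000;
        while (t.getState() != target && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
    }
}
```

{{MonitorPileUpDemo.secondCallerState()}} should report {{BLOCKED}}, which matches the {{java.lang.Thread.State: BLOCKED (on object monitor)}} lines in the dump. With repeated calls, every new caller joins the same queue behind the single lock holder.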
> Looking at the caller of {{org.apache.brooklyn.util.core.task.ValueResolver$2.call(ValueResolver.java:322)}}, it is interesting that only two threads in the dump are still inside it. This tells us that the other calls (to {{ValueResolver.getMaybeInternal()}}) must all have had a short timeout. Inside {{getMaybeInternal()}}, the caller waits up to the given timeout for the resolved value, and then calls {{task.cancel(true)}} before returning.
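Here is a hedged sketch of that wait-then-cancel behaviour, which shows why the timed-out calls still leave threads behind. It assumes the behaviour boils down to {{Future.get(timeout)}} followed by {{Future.cancel(true)}}. {{TimedResolveDemo}} and {{resolveWithTimeout}} are hypothetical names, not the actual {{ValueResolver}} code. In the sketch, the task needs a monitor that another thread holds. The interrupt sent by {{cancel(true)}} therefore has no effect, and the worker thread stays BLOCKED after its caller has already returned.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

class TimedResolveDemo {
    // Hypothetical analogue of a resolver with a timeout: wait up to `millis`
    // for the value; on timeout, cancel the task with cancel(true) and return empty.
    static <T> Optional<T> resolveWithTimeout(ExecutorService exec, Callable<T> job, long millis)
            throws InterruptedException {
        Future<T> future = exec.submit(job);
        try {
            return Optional.ofNullable(future.get(millis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread if the task is running
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }

    // Returns the worker thread's state after its task timed out and was cancelled,
    // while another thread still holds the monitor the task is waiting for.
    static Thread.State workerStateAfterCancel() {
        Object monitor = new Object();
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                acquired.countDown();
                try { release.await(); } catch (InterruptedException ignored) { }
            }
        }, "holder");
        AtomicReference<Thread> worker = new AtomicReference<>();
        ExecutorService exec = Executors.newCachedThreadPool();
        try {
            holder.start();
            acquired.await(); // the holder now owns the monitor
            // Empty result: the task cannot enter the monitor before the timeout.
            Optional<String> result = resolveWithTimeout(exec, () -> {
                worker.set(Thread.currentThread());
                synchronized (monitor) { // BLOCKED here; the interrupt is ignored
                    return "value";
                }
            }, 500);
            Thread.sleep(100); // give the interrupt time to take effect, if it could
            Thread.State state = worker.get().getState();
            release.countDown(); // free the monitor so the stuck worker can finish
            holder.join();
            exec.shutdown();
            exec.awaitTermination(5, TimeUnit.SECONDS);
            return state;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Each such call returns an empty result to its caller but keeps one pool thread stuck until the monitor is released. Repeating the call with a short timeout accumulates these threads, which is consistent with the hundreds of BLOCKED threads in the dump.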
> Because the tasks' threads are waiting to acquire a {{synchronized}} monitor, they cannot be interrupted. One part of the fix is to change the implementation of {{BrooklynDslDeferredSupplier.get(BrooklynDslDeferredSupplier.java:93)}} to use a {{java.util.concurrent.locks.Lock}} that can be interrupted (e.g. via {{lockInterruptibly()}}). However, this still feels unsafe, because there could be other code that uses Java's {{synchronized}}.
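The following is a minimal sketch of the interruptible-lock part of the fix. It assumes the supplier's critical section can be guarded by a {{java.util.concurrent.locks.ReentrantLock}} acquired with {{lockInterruptibly()}}. {{InterruptibleSupplier}} and {{InterruptibleLockDemo}} are hypothetical names, not the committed Brooklyn change. The first caller holds the lock while its resolver waits, and the second caller queues on the lock. Calling {{cancel(true)}} on the second caller now makes it throw {{InterruptedException}} and release its thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.FutureTask;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical supplier (not Brooklyn code) that guards resolution with an
// interruptible lock instead of a synchronized method.
class InterruptibleSupplier {
    private final ReentrantLock lock = new ReentrantLock();
    private final Callable<String> resolver;

    InterruptibleSupplier(Callable<String> resolver) {
        this.resolver = resolver;
    }

    String get() throws Exception {
        lock.lockInterruptibly(); // throws InterruptedException if interrupted while waiting
        try {
            return resolver.call();
        } finally {
            lock.unlock();
        }
    }
}

class InterruptibleLockDemo {
    // The first caller holds the lock while its resolver waits. The second caller
    // queues on the lock and is then cancelled with cancel(true). Returns true if
    // the second caller's thread terminated instead of staying stuck.
    static boolean cancelledWaiterExits() {
        CountDownLatch slow = new CountDownLatch(1);
        InterruptibleSupplier supplier = new InterruptibleSupplier(() -> {
            slow.await(); // simulates a value that is not ready yet
            return "value";
        });
        FutureTask<String> first = new FutureTask<>(supplier::get);
        FutureTask<String> second = new FutureTask<>(supplier::get);
        Thread t1 = new Thread(first, "first");
        Thread t2 = new Thread(second, "second");
        try {
            t1.start();
            awaitState(t1, Thread.State.WAITING); // t1 holds the lock, parked in slow.await()
            t2.start();
            awaitState(t2, Thread.State.WAITING); // t2 is parked in lockInterruptibly()
            second.cancel(true);                  // interrupts t2
            t2.join(5_000);
            boolean exited = !t2.isAlive();
            slow.countDown();                     // let the first caller finish
            t1.join();
            return exited;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // Polls until the thread reaches the target state, or gives up after 5 seconds.
    static void awaitState(Thread t, Thread.State target) throws InterruptedException {
        long deadline = System.currentTimeMillis() + 5_000;
        while (t.getState() != target && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
    }
}
```

{{lockInterruptibly()}} is called before the {{try}} block. This ensures that a caller interrupted while waiting never reaches {{unlock()}} for a lock it does not hold. As the description notes, this only helps code paths that use the lock; anything still relying on {{synchronized}} remains uninterruptible.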
> Looking at where {{ValueResolver.timeout(Duration)}} is set could tell us where these roughly 977 calls came from. One place is the REST API, in {{RestValueResolver.getImmediateValue}}; if the web console were polling for the entity's config, that could explain it. Another place is the {{org.apache.brooklyn.enricher.stock.Transformer}} enricher.
> This was encountered with 0.9.0-SNAPSHOT.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
