flume-dev mailing list archives

From "Yan Jian (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLUME-2786) It will enter a deadlock state when modify the conf file before I stop flume-ng
Date Wed, 02 Nov 2016 09:50:58 GMT

     [ https://issues.apache.org/jira/browse/FLUME-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Jian updated FLUME-2786:
    Attachment: flume-2786-v1.6.0.patch

This bug also occurred in our production environment.
It can lead to a nested monitor lockout between the threads _agent-shutdown-hook_ and _conf-file-poller_,
as detailed below:
# _agent-shutdown-hook_ acquires the {{application}} lock and tries to stop the {{executeService}}
??a {{ThreadPoolExecutor}} instance??.
# _conf-file-poller_ is scheduled to run in the {{executeService}}'s pool, which prevents
the {{executeService}} from being stopped.
# _conf-file-poller_ waits for the {{application}} lock, which is held by _agent-shutdown-hook_.
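
The three steps above can be replayed in isolation. The following is a minimal sketch, not Flume code: {{LockoutDemo}}, {{APPLICATION_LOCK}} and {{poolStopsWhileLockHeld}} are hypothetical names, a plain {{Object}} monitor stands in for the {{application}} lock, and the wait is bounded so the demo ends instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LockoutDemo {
    // Hypothetical stand-in for the monitor of the Application instance.
    private static final Object APPLICATION_LOCK = new Object();

    /**
     * Replays the three steps with a bounded wait.
     * Returns true if the pool terminated while the lock was still held.
     */
    static boolean poolStopsWhileLockHeld(long waitMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch pollerRunning = new CountDownLatch(1);
        CountDownLatch lockTaken = new CountDownLatch(1);

        // Step 2: the poller task is running inside the pool.
        pool.execute(() -> {
            pollerRunning.countDown();
            try {
                lockTaken.await();
            } catch (InterruptedException e) {
                return;
            }
            // Step 3: blocks here until the hook releases the monitor.
            synchronized (APPLICATION_LOCK) {
                // a configuration reload would happen here
            }
        });

        try {
            pollerRunning.await();
            boolean terminated;
            // Step 1: the hook takes the monitor, then tries to stop the pool.
            synchronized (APPLICATION_LOCK) {
                lockTaken.countDown();
                pool.shutdown();
                terminated = pool.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
            }
            // Monitor released: the poller can finish and the pool drains.
            pool.awaitTermination(5, TimeUnit.SECONDS);
            return terminated;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling {{LockoutDemo.poolStopsWhileLockHeld(300)}} returns {{false}}: the pool cannot terminate while the caller holds the monitor that the pooled task needs. A shutdown path that keeps waiting for termination without releasing the lock would therefore never finish.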

In our solution, {{synchronized}} is replaced with a {{ReentrantLock}}, and _conf-file-poller_
checks the {{beingStopped}} condition at a 500ms interval while trying to acquire the {{application}} lock.
Our solution, based on 1.6.0, is shared as +flume-2786-v1.6.0.patch+.
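
For readers without the patch at hand, the idea can be sketched roughly as follows. This is an illustration under assumptions, not the content of the patch: the class {{GuardedReload}}, its method shapes, and the 5 second bound in {{stop}} are hypothetical; only the {{ReentrantLock}}, the {{beingStopped}} flag and the 500ms interval come from the description above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class GuardedReload {
    // Replaces the intrinsic (synchronized) application lock.
    private final ReentrantLock applicationLock = new ReentrantLock();
    private final AtomicBoolean beingStopped = new AtomicBoolean(false);

    /**
     * Poller side. Returns true if the reload ran, false if it was
     * abandoned because a shutdown is in progress.
     */
    boolean handleConfigurationEvent() {
        try {
            while (!beingStopped.get()) {
                // Wait at most 500 ms, then re-check the stop flag.
                if (applicationLock.tryLock(500, TimeUnit.MILLISECONDS)) {
                    try {
                        if (beingStopped.get()) {
                            return false;
                        }
                        // the new configuration would be applied here
                        return true;
                    } finally {
                        applicationLock.unlock();
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    /**
     * Shutdown-hook side. Returns true if the poller pool terminated
     * within the (assumed) 5 second bound.
     */
    boolean stop(ExecutorService pollerPool) {
        applicationLock.lock();
        try {
            beingStopped.set(true);
            pollerPool.shutdown();
            return pollerPool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            applicationLock.unlock();
        }
    }
}
```

Because the poller never blocks indefinitely on the lock, it notices {{beingStopped}} within one interval and returns, which lets the pool terminate while the shutdown hook still holds the lock.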

>  It will enter a deadlock state when modify the conf file before I stop flume-ng
> --------------------------------------------------------------------------------
>                 Key: FLUME-2786
>                 URL: https://issues.apache.org/jira/browse/FLUME-2786
>             Project: Flume
>          Issue Type: Bug
>          Components: Master
>    Affects Versions: v1.6.0
>            Reporter: godfrey he
>            Priority: Blocker
>         Attachments: flume-2786-v1.6.0.patch
> When I modify the conf file and then stop flume-ng, it enters a deadlock state.

> jstack result:
> "agent-shutdown-hook" prio=10 tid=0x00007f2e26419800 nid=0x333ae waiting on condition
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000eaff3df8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>         at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
>         at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:635)
>         at org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop(PollingPropertiesFileConfigurationProvider.java:87)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.stop(LifecycleSupervisor.java:106)
>         - locked <0x00000000eaf2daa0> (a org.apache.flume.lifecycle.LifecycleSupervisor)
>         at org.apache.flume.node.Application.stop(Application.java:93)
>         - locked <0x00000000eaf3c580> (a org.apache.flume.node.Application)
>         at org.apache.flume.node.Application$1.run(Application.java:348)
> "conf-file-poller-0" prio=10 tid=0x00007f2e2e8cd000 nid=0x21819 waiting for monitor entry
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.flume.node.Application.handleConfigurationEvent(Application.java:88)
>         - waiting to lock <0x00000000eaf3c580> (a org.apache.flume.node.Application)

This message was sent by Atlassian JIRA