nifi-commits mailing list archives

From "Mark Payne (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (NIFI-1333) FlowController fails to shut down gracefully even though there is nothing going on in the flow
Date Thu, 31 Dec 2015 21:36:39 GMT

    [ https://issues.apache.org/jira/browse/NIFI-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076143#comment-15076143 ]

Mark Payne edited comment on NIFI-1333 at 12/31/15 9:36 PM:
------------------------------------------------------------

I'm not finding any reason that we would really need a write lock there. We would need a read
lock, though, to obtain the root group (as 'rootGroup' is protected by the read/write lock).
But I cannot recall any particular reason that it would need a write lock to shut down.
This method was written quite a long time ago (I'd guess about 3-4 years ago), so it's quite
possible that some later refactoring made the write lock unnecessary.
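To make the suggestion concrete, here is a minimal Java sketch of a shutdown that takes only the read lock, and only long enough to read the root group, then releases it before waiting on the worker threads. It assumes a heavily simplified, hypothetical controller: the class ReadLockShutdownSketch, its workers pool and the submitWork helper are illustrative names, not NiFi's actual FlowController API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, heavily simplified stand-in for a controller; NOT NiFi's FlowController.
public class ReadLockShutdownSketch {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private String rootGroup = "root"; // guarded by rwLock, like 'rootGroup' in the comment

    // Callback used by worker threads; it needs the read lock (compare getRootGroupId()).
    String getRootGroupId() {
        rwLock.readLock().lock();
        try {
            return rootGroup;
        } finally {
            rwLock.readLock().unlock();
        }
    }

    // Hypothetical helper: queue n tasks that each call back into the controller.
    void submitWork(int n) {
        for (int i = 0; i < n; i++) {
            workers.submit(this::getRootGroupId);
        }
    }

    // Read the root group under the read lock, release it, and only then wait for
    // the workers, so tasks that still need the lock are able to finish.
    boolean shutdown(long timeoutMillis) throws InterruptedException {
        String group;
        rwLock.readLock().lock();
        try {
            group = rootGroup;
        } finally {
            rwLock.readLock().unlock();
        }
        System.out.println("Shutting down, root group = " + group);
        workers.shutdown();
        return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ReadLockShutdownSketch controller = new ReadLockShutdownSketch();
        controller.submitWork(10);
        System.out.println("Terminated cleanly: " + controller.shutdown(10_000));
    }
}
```

The design point is that no lock is held during awaitTermination; the lock protects only the read of the shared field.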


was (Author: markap14):
I'm not finding any reason that we would really need a write lock there. We would need a read
lock, though, to obtain the root group (as 'rootGroup' is protected by the read/write lock).
But I cannot recall any particular reason that it would need a write lock to shut down.

> FlowController fails to shut down gracefully even though there is nothing going on in the flow
> ----------------------------------------------------------------------------------------------
>
>                 Key: NIFI-1333
>                 URL: https://issues.apache.org/jira/browse/NIFI-1333
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 0.4.1
>            Reporter: Oleg Zhurakousky
>            Assignee: Oleg Zhurakousky
>            Priority: Trivial
>             Fix For: 0.5.0
>
>
> Basically, the following test fails: https://github.com/olegz/nifi/blob/int-test/nifi-integration-tests/src/test/java/org/apache/nifi/test/flowcontroll/FlowControllerTests.java#L50 even though there is no compelling reason for it to fail based on what's in the flow.
> Also, the message in the logs is confusing:
> {code}
> Initiated graceful shutdown of flow controller...waiting up to 10 seconds
> 2015-12-23 15:19:11,977 WARN [main] o.apache.nifi.controller.FlowController Controller hasn't terminated properly.  There exists an uninterruptable thread that will take an indeterminate amount of time to stop.  Might need to kill the program manually.
> {code}
> What actually happens is a deadlock during shutdown.
> Below are the relevant jstack excerpts:
> {code}
> java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007aeb20988> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> 	at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
> 	at org.apache.nifi.controller.FlowController.shutdown(FlowController.java:1124)
> 	at org.apache.nifi.test.s2s.SiteToSiteTests.bar(SiteToSiteTests.java:75)
> . . .
> "Framework Task Thread Thread-1" prio=5 tid=0x00007fc8a2064800 nid=0x6a03 waiting on condition [0x0000700001ded000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007aeb20288> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
> 	at org.apache.nifi.controller.FlowController.getRootGroupId(FlowController.java:1262)
> 	at org.apache.nifi.controller.tasks.ExpireFlowFiles.run(ExpireFlowFiles.java:54)
> . . .
> "Timer-Driven Process Thread-1" prio=5 tid=0x00007fc8a3146800 nid=0x6c03 waiting on condition [0x0000700001ef0000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007aeb20288> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
> 	at org.apache.nifi.controller.FlowController.isClustered(FlowController.java:2984)
> 	at org.apache.nifi.controller.FlowController.heartbeat(FlowController.java:3444)
> {code}
> The issue, as I see it, is that FlowController's _shutdown_ routine is guarded by the same lock as most of the FlowController callbacks made by other threads. While shutdown holds that lock and waits for the thread pools to terminate, those threads are blocked trying to acquire it, so they can never finish and shutdown can never complete: a deadlock.
> I don't think there is any reason to synchronize the shutdown routine, since all we are trying to do is shut down the very same threads that are blocking. Removing the synchronization resolves the issue.
> Will submit a patch shortly.
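
The deadlock described in the issue can be reproduced outside NiFi with plain java.util.concurrent. This is a hypothetical minimal sketch (the class ShutdownDeadlockDemo and its method names are illustrative, not NiFi code): one thread holds the write lock of a ReentrantReadWriteLock while calling awaitTermination on a pool whose task needs the read lock, mirroring the shutdown frame and the framework-task frames in the jstack. The same shutdown without the lock completes normally.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical minimal reproduction of the lock/shutdown pattern; NOT NiFi code.
public class ShutdownDeadlockDemo {

    // Shutdown "synchronized" on the write lock; returns whether the pool terminated in time.
    static boolean shutdownWhileHoldingWriteLock(long timeoutMillis) throws InterruptedException {
        ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        rwLock.writeLock().lock();
        try {
            // A framework task that needs the read lock (compare getRootGroupId()/isClustered()).
            pool.submit(() -> {
                rwLock.readLock().lock();
                rwLock.readLock().unlock();
            });
            pool.shutdown();
            // The task cannot acquire the read lock while this thread holds the write lock,
            // so the wait times out instead of completing.
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            rwLock.writeLock().unlock(); // release so the stuck task can finish and the demo exits
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    // Same shutdown with no lock held while waiting.
    static boolean shutdownWithoutLock(long timeoutMillis) throws InterruptedException {
        ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(() -> {
            rwLock.readLock().lock();
            rwLock.readLock().unlock();
        });
        pool.shutdown();
        return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Terminated with write lock held: " + shutdownWhileHoldingWriteLock(500));
        System.out.println("Terminated without the lock:     " + shutdownWithoutLock(5_000));
    }
}
```

Run as written, the first call should report that the pool did not terminate within its timeout and the second that it did, which is the difference between the synchronized and unsynchronized shutdown described above.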



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
