mina-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bill Kuker (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SSHD-721) deadlock: all nio workers wait to be woken up
Date Sat, 14 Apr 2018 14:11:00 GMT

    [ https://issues.apache.org/jira/browse/SSHD-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438353#comment-16438353
] 

Bill Kuker commented on SSHD-721:
---------------------------------

I am zeroing in on reproducing it...

(In the following steps "web browser" and "web server" are just tiny java functions playing
those roles)
 # Create an SSH Server, SSH Client and a port forward listening on the SSH Server.
 # Configure the SSH Server with *N* nio worker threads.
 # Start the "Web Server" that accepts a connection, reads some data from the client, writes
some data to the client, closes the connection.
 # Launch *W* "Web Browsers" that each open a connection to the Web Server via the port forward
listening on the SSH Server.
 # Use a cyclic barrier or similar to make all *W* Web Browsers connect and send their request
simultaneously. 

(/) When *W < N* everything works fine.

(x) When *W >= N* then the SSHD Server completely stops working. It will never recover
(Unless some outside force kills some of the processes or tcp connections).

My guess is that the SSH Server can handle N-1 port forwards connecting at once, and the 1
extra worker thread is used for communication to the SSH Client. Once all worker threads are
handling new connections to a port forward, no thread is available to make progress on communication
to the sshd client.

I will be providing a Junit test case once I have cleaned up my code.

Replacing the nio2 thread pool, which is created by ThreadUtils::newFixedThreadPool() with
a pool created by ThreadUtils::newCachedThreadPool() seems to fix the problem, although I
am not suggesting that as the final fix.

If my guess is correct, my suggestion for the fix would be different worker pools for SSH
Server / SSH Client communication and for handling the port forwards. This would allow SSHD
to operate when the number of simultaneous port forwarded connections being established is
larger than the number of threads in either pool, and also allow users to tune these two thread
pools separately depending on their usage patterns. 

 

> deadlock: all nio workers wait to be woken up
> ---------------------------------------------
>
>                 Key: SSHD-721
>                 URL: https://issues.apache.org/jira/browse/SSHD-721
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Markus Rathgeb
>            Priority: Major
>
> I am using sshd-core for a server machine (S) that accepts incoming connections and port
forwarding requests.
> There are client machines (C) that run servers that should be accessible by a tunnel
to the server.
> On the client machines (C) also an implementation using sshd-core is running that establish
the connection to the server (S) and initiate the port forwarding.
> Other clients are using the tunnelled connection to communicate with the servers that
are running on the client machines (C).
> Sometimes I realized that no data is transferred anymore (through the tunnels).
> All the worker reside in the waitFor function and no one wakes them up.
> {noformat}
> "sshd-SshServer[67de991c]-nio2-thread-3" - Thread t@125
>    java.lang.Thread.State: TIMED_WAITING
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <132c6b60> (a java.lang.Object)
> 	at org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
> 	at org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown
Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
> 	at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
> 	at sun.nio.ch.Invoker$2.run(Invoker.java:218)
> 	at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- locked <6d02d7ed> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "sshd-SshServer[67de991c]-nio2-thread-2" - Thread t@124
>    java.lang.Thread.State: TIMED_WAITING
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <7e9f4eff> (a java.lang.Object)
> 	at org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
> 	at org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown
Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
> 	at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
> 	at sun.nio.ch.Invoker$2.run(Invoker.java:218)
> 	at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- locked <35fbf3e8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "sshd-SshServer[67de991c]-nio2-thread-1" - Thread t@122
>    java.lang.Thread.State: TIMED_WAITING
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <49ce93a9> (a java.lang.Object)
> 	at org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
> 	at org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown
Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
> 	at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
> 	at sun.nio.ch.Invoker.invokeDirect(Invoker.java:157)
> 	at sun.nio.ch.UnixAsynchronousSocketChannelImpl.implRead(UnixAsynchronousSocketChannelImpl.java:553)
> 	at sun.nio.ch.AsynchronousSocketChannelImpl.read(AsynchronousSocketChannelImpl.java:276)
> 	at sun.nio.ch.AsynchronousSocketChannelImpl.read(AsynchronousSocketChannelImpl.java:297)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.doReadCycle(Nio2Session.java:304)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.doReadCycle(Nio2Session.java:249)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:243)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:239)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:235)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:231)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:227)
> 	at org.apache.sshd.common.io.nio2.Nio2Acceptor$AcceptCompletionHandler.onCompleted(Nio2Acceptor.java:178)
> 	at org.apache.sshd.common.io.nio2.Nio2Acceptor$AcceptCompletionHandler.onCompleted(Nio2Acceptor.java:156)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown
Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
> 	at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
> 	at sun.nio.ch.Invoker$2.run(Invoker.java:218)
> 	at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- locked <7d1c59e6> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "sshd-SshServer[67de991c]-timer-thread-1" - Thread t@105
>    java.lang.Thread.State: TIMED_WAITING
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for <655c080c> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> - None
> {noformat}
> To eliminate some other code that could trigger that error I created a "minimal" example
-- a simple test application -- that could be used to demonstrate the hang (for me it is reproducible
using that code).
> Please have a look at https://github.com/maggu2810/sshd-deadlock/tree/first-report where
you could also find a readme with a short description about the code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message