From: Filip Hanik - Dev Lists
Date: Tue, 16 Sep 2008 11:55:40 -0600
To: dev@activemq.apache.org
Subject: Re: tcp and nio transport considerations

hi Manuel,

I may not be understanding your theory completely, but if I do, I have to disagree with parts of your assessment: the problem you describe doesn't really have anything to do with blocking vs. non-blocking IO; instead it's the implementation on top of the socket API. In a simple Java program you can read from and write to a blocking socket simultaneously, so using NIO isn't really the answer. If something inside AMQ blocks on a socket, that has nothing to do with blocking vs. non-blocking IO. The lock that is taken when you call flush() shouldn't hinder the read, and if it does, that's a bug in AMQ rather than a reason to switch to NIO. (Nothing wrong with NIO, of course; I'm just pointing out that the deadlock you run into might be a bug in AMQ rather than a dependency on the socket API.)

The example below shows how simultaneous read and write is possible. It writes a 1 MB byte array at a time, which should fill up the send buffer, and I can still read data from the client when that happens.
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;

public static void main(String[] args) throws Exception {
    ServerSocket ss = new ServerSocket(9999);
    System.out.println("Listening on port 9999");
    final Socket s = ss.accept();
    // Reader thread: prints whatever the client sends, one byte at a time.
    Thread reader = new Thread() {
        public void run() {
            try {
                int i;
                while ((i = s.getInputStream().read()) != -1) {
                    System.out.print((char) i);
                }
            } catch (Exception x) {
                System.err.println("READER:" + x.getMessage());
            }
        }
    };
    // Writer thread: pushes a 1 MB array every second so the send buffer fills up.
    Thread writer = new Thread() {
        public void run() {
            try {
                int i = 65;
                while (true) {
                    byte[] b = new byte[1024 * 1000];
                    Arrays.fill(b, (byte) i++);
                    System.out.println("Writing:" + ((char) b[0]));
                    s.getOutputStream().write(b);
                    Thread.sleep(1000);
                }
            } catch (Exception x) {
                System.err.println("WRITER:" + x.getMessage());
            }
        }
    };
    reader.start();
    writer.start();
}

Manuel Teira Paz wrote:
> Hello. I would like to share some thoughts and adventures about the tcp
> and nio transports for your consideration, hopefully with some feedback.
>
> We are using an ActiveMQ 4.1 compiled from the 4.1 svn branch. For some
> time we didn't run into any important problems, but lately we have been
> suffering from an issue with the tcp transport.
>
> The problem arises when the tcp buffer gets full during a
> TcpBufferedOutputStream.flush(). When this happens, and probably when
> all the consumers/producers are sharing the same connection, we run
> into a deadlock situation, since the socket OutputStream writes in
> blocking mode. Meanwhile, no reader that could drain some data from
> the socket to ease the situation is allowed to do its work, since it
> shares the same connection, which is locked by the write attempt. Do
> you agree with this analysis, and that it could happen?
>
> As a solution, nio with its non-blocking socket management, selectors
> and friends seemed the way to go. Unfortunately, the nio transport is
> not available in the 4.1 branch, but it was easily backported from the
> trunk. Trying to use it, some issues arose:
>
> - Connection attempts timed out, and the whole system behaved
> erratically and unresponsively. There were no deadlocks, but one
> symptom was that transport.nio.SelectorSelection spent a lot of time
> in its constructor waiting for the socketChannel.register call to
> complete. I don't know the exact reason, but it seems that
> SelectorWorker.run() monopolizes access to the selector by doing:
>
> while (isRunning()) {
>     int count = selector.select(10);
>     if (count == 0) {
>         continue;
>     }
>
> I didn't have the chance to check whether this thread has a higher
> priority than the one running the SelectorSelection constructor.
> Anyway, as a workaround I changed the previous code to:
>
>     int count = selector.select(10);
>     if (count == 0) {
> +       Thread.yield();
>         continue;
>     }
>
> and mostly everything started to work as expected. I was able to
> connect consistently to the broker using a nio:// transport.
>
> - The remaining problem I found is that a java test client (connect,
> send a message, and close the connection) didn't shut down by itself
> correctly, although it did when using the tcp:// transport. I found
> two possible sources for this problem:
>
> a) NIOTransport doesn't close the selection on doStop. I think this
> is needed to allow the SelectorWorker thread to terminate.
> b) Even after doing that, and since SelectorManager.selectorExecutor
> is the result of calling Executors.newCachedThreadPool, the idle
> threads are not destroyed immediately, but after 60 seconds. Since
> these threads are created as non-daemon threads, the VM waits for
> them to finish.
> As a workaround, I changed the instantiation of
> SelectorManager.selectorExecutor to:
>
> private Executor selectorExecutor =
>     Executors.newCachedThreadPool(new ThreadFactory() {
>         public Thread newThread(Runnable r) {
>             Thread rc = new Thread(r);
>             rc.setName("NIO Transport Thread");
> +           rc.setDaemon(true);
>             return rc;
>         }
>     });
>
> thus avoiding creating them as non-daemon threads. However, I suppose
> this could be dangerous and something could be left in an inconsistent
> state. Another solution could be not to use a cached thread pool, but
> that could hurt performance. What would be the best way to avoid the
> client shutdown delay?
>
> Currently, changing to 5.1 or 5.2 is not an option for us, since we
> ran into problems in our previous attempts to switch. We need to stay
> (at least until we have enough time to run a complete validation of
> 5.1 or the upcoming 5.2) on 4.1 plus the patches needed to make it
> work properly.
>
> Also, if you want 4.1 to feature NIO support, I could open a JIRA
> issue and attach the patch. Anyway, any ideas, comments or proposals
> about the problems we ran into and the solutions described above will
> be very welcome.
>
> Best regards.
>
>
> Manuel.
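
To make the failure mode under discussion concrete, here is a minimal, self-contained sketch of the kind of stall Manuel describes. It is a deliberately simplified model, not the actual ActiveMQ transport code: the FlushDeadlockDemo class name, the single transportLock and the 64 KB chunk size are all made up for illustration. One side serializes every read and write on one lock; once its write() blocks on a full send buffer, the reader that would drain inbound data can never run, and the peer ends up blocked as well.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class FlushDeadlockDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket ss = new ServerSocket(0);
        final Socket peer = new Socket("localhost", ss.getLocalPort());
        final Socket transport = ss.accept();
        // One lock guarding every read and write on the connection, standing in
        // for the per-connection locking around flush() described above.
        final Object transportLock = new Object();

        // "Transport" writer: writes while holding the connection lock.
        new Thread("transport-writer") {
            public void run() {
                try {
                    OutputStream out = transport.getOutputStream();
                    byte[] chunk = new byte[64 * 1024];
                    while (true) {
                        synchronized (transportLock) {
                            // The peer never reads, so this soon blocks on a full
                            // send buffer while still holding transportLock.
                            out.write(chunk);
                        }
                    }
                } catch (Exception x) {
                    System.err.println("transport writer: " + x);
                }
            }
        }.start();

        // "Transport" reader: would drain the peer's data, but needs the same lock.
        new Thread("transport-reader") {
            public void run() {
                try {
                    InputStream in = transport.getInputStream();
                    byte[] buf = new byte[64 * 1024];
                    while (true) {
                        synchronized (transportLock) {
                            if (in.read(buf) < 0) {
                                return;
                            }
                        }
                    }
                } catch (Exception x) {
                    System.err.println("transport reader: " + x);
                }
            }
        }.start();

        // Peer: keeps writing and never reads. Once the transport writer is stuck,
        // nothing drains this direction either, so these writes block as well.
        OutputStream out = peer.getOutputStream();
        byte[] chunk = new byte[64 * 1024];
        for (int i = 0; i < 1000; i++) {
            out.write(chunk);
            System.out.println("peer wrote chunk " + i);
        }
    }
}

Running it, the "peer wrote chunk" output soon stops, and a thread dump shows the transport writer blocked inside write() with transportLock held while the transport reader waits for the same lock, which is the shape of the deadlock discussed above.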
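On the 60-second shutdown delay: the behaviour Manuel works around comes from Executors.newCachedThreadPool keeping idle worker threads alive for 60 seconds, and non-daemon threads keep the VM alive with them. The following standalone sketch (the DaemonPoolDemo class name and the dummy task are made up; only the thread factory mirrors the quoted patch) shows the effect of the setDaemon(true) change in isolation. The other way out would be to call shutdown() on the executor when the transport stops.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class DaemonPoolDemo {
    public static void main(String[] args) throws Exception {
        // Same kind of factory as the quoted patch: name the threads and mark
        // them as daemon so idle pool threads cannot keep the VM alive.
        ExecutorService pool = Executors.newCachedThreadPool(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread rc = new Thread(r);
                rc.setName("NIO Transport Thread");
                rc.setDaemon(true);   // remove this line and the VM lingers ~60s after main() ends
                return rc;
            }
        });
        // Run one dummy task to completion so an idle worker thread is left behind.
        pool.submit(new Runnable() {
            public void run() {
                System.out.println("task executed by " + Thread.currentThread().getName());
            }
        }).get();
        // No shutdown() on purpose: with daemon threads the VM still exits as soon
        // as main() returns; with the default factory it would wait for the idle
        // pool thread to time out after 60 seconds.
    }
}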