avro-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: Netty/Avro IPC problem: channel closed
Date Wed, 14 Sep 2011 19:06:35 GMT
Yeah, I found I was actually using 1.5.1.
I updated to 1.5.2, and so far it's been working fine after an hour.

Thanks a lot!
Yang

On Wed, Sep 14, 2011 at 10:56 AM, James Baldassari
<jbaldassari@gmail.com> wrote:
> It appears to be pre-1.5.2 from this part of the stack trace:
>
>        at java.util.concurrent.Semaphore.acquire(Semaphore.java:313)
>        at org.apache.avro.ipc.NettyTransceiver$CallFuture.get(NettyTransceiver.java:203)
>
> CallFuture was moved out of NettyTransceiver as part of AVRO-539 and is now
> a stand-alone class.  Also the Semaphore inside CallFuture was replaced with
> a CountDownLatch, so in 1.5.2 and later we should never see CallFuture
> waiting on a Semaphore.
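>
> To illustrate the change, here is a minimal sketch of the latch-based
> approach (simplified; not the actual CallFuture source):
>
>     import java.util.concurrent.CountDownLatch;
>
>     // Illustrative latch-based call future.
>     class SimpleCallFuture<T> {
>         private final CountDownLatch latch = new CountDownLatch(1);
>         private volatile T result;
>         private volatile Throwable error;
>
>         // Called by the I/O thread when a response or error arrives.
>         void handleResult(T value) { result = value; latch.countDown(); }
>         void handleError(Throwable t) { error = t; latch.countDown(); }
>
>         // Called by the client thread; wakes up as soon as countDown()
>         // runs, whether the call succeeded or the channel died.
>         T get() throws Throwable {
>             latch.await();
>             if (error != null) throw error;
>             return result;
>         }
>     }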
>
> From your initial description it appears that some temporary network
> disruption might have caused the connection between the client and the
> server to close, and the client never recovered.  This doesn't surprise
> me, because I don't think the pre-1.5.2 NettyTransceiver had any way to
> recover from a connection failure.  While working on AVRO-539 I modified
> the transceiver code to attempt to re-establish the connection if it was
> lost, which is why I think upgrading may help you.  That's just a guess,
> though.  Like I said, since the code has changed so much in 1.5.2 and
> later, it will be much easier to figure out what's wrong (and fix it if
> necessary) if you can reproduce the problem using 1.5.2 or later.
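>
> Roughly, the reconnect-on-write idea looks like this (a simplified
> sketch, not the actual NettyTransceiver code from AVRO-539):
>
>     import java.net.InetSocketAddress;
>     import org.jboss.netty.channel.Channel;
>
>     // Illustrative only: reconnect lazily before the next write.
>     abstract class ReconnectingClient {
>         private final InetSocketAddress remote;
>         private Channel channel;  // guarded by "this"
>
>         ReconnectingClient(InetSocketAddress remote) {
>             this.remote = remote;
>         }
>
>         // Subclass supplies the actual Netty bootstrap/connect call.
>         protected abstract Channel connect(InetSocketAddress remote);
>
>         // Re-establish the connection if it was lost, then write.
>         synchronized void send(Object request) {
>             if (channel == null || !channel.isConnected()) {
>                 channel = connect(remote);
>             }
>             channel.write(request);
>         }
>     }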
>
> -James
>
>
> On Wed, Sep 14, 2011 at 1:39 PM, Yang <teddyyyy123@gmail.com> wrote:
>>
>> Thanks, James.
>>
>> I *think* I'm using 1.5.2, but I'll check to be sure.
>> How did you determine that it's a pre-1.5.2 version?
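>>
>> One quick way I can verify from code which avro jar is actually on the
>> classpath (just a sketch; whether the manifest carries a version string
>> depends on how the jar was built):
>>
>>     // Print where the NettyTransceiver class was loaded from; the jar
>>     // path usually includes the Avro version number.
>>     Class<?> c = org.apache.avro.ipc.NettyTransceiver.class;
>>     System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
>>     // If the jar manifest sets Implementation-Version, this prints it
>>     // (it may be null depending on the build).
>>     System.out.println(c.getPackage().getImplementationVersion());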
>>
>> Yang
>>
>> On Wed, Sep 14, 2011 at 10:25 AM, James Baldassari
>> <jbaldassari@gmail.com> wrote:
>> > Hi Yang,
>> >
>> > From the stack trace you posted it appears that you are using a
>> > version of Avro prior to 1.5.2.  Which version are you using?  There
>> > have been a number of significant changes recently to the RPC
>> > framework, and the Netty implementation in particular.  Could you
>> > please try to reproduce the problem using Avro 1.5.2 or newer?  The
>> > problem may be resolved with an upgrade.  If the problem still exists
>> > in the newer versions, it will be a lot easier to diagnose/fix it if
>> > we can see stack traces from a post-1.5.2 version.
>> >
>> > Thanks,
>> > James
>> >
>> >
>> > On Wed, Sep 14, 2011 at 1:08 PM, Yang <teddyyyy123@gmail.com> wrote:
>> >>
>> >> I keep seeing these "channel closed" exceptions, with low
>> >> probability, i.e. about once every 10 hours under heavy load.
>> >>
>> >> I'm not sure whether it was the server or the client that had its
>> >> channel closed, so I've included the exception stacks from both
>> >> sides. Does anybody have an idea how to debug this?
>> >>
>> >> Also, suppose there is a valid reason for closing the channel: what
>> >> should my strategy be for coping with it? I originally had many
>> >> senders. After the channel-close exception, many of them died; only
>> >> 2 application threads remain, and they all seem blocked trying to
>> >> grab a connection from Netty's pool, so even if I create new sender
>> >> threads, it seems they would still block. How can I tell Netty to
>> >> "reset/replenish" its connections? (One coping idea I have in mind
>> >> is sketched below.)
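>> >>
>> >> The kind of thing I have in mind is a hypothetical sketch like this
>> >> (EventsCollector and AdClick stand in for my generated protocol
>> >> interface and record; those names are from my app, not Avro):
>> >>
>> >>     import java.io.IOException;
>> >>     import java.net.InetSocketAddress;
>> >>     import org.apache.avro.ipc.NettyTransceiver;
>> >>     import org.apache.avro.ipc.specific.SpecificRequestor;
>> >>
>> >>     class ReconnectingSender {
>> >>         private final InetSocketAddress addr;
>> >>         private NettyTransceiver transceiver;
>> >>         private EventsCollector proxy;
>> >>
>> >>         ReconnectingSender(InetSocketAddress addr) throws IOException {
>> >>             this.addr = addr;
>> >>             reconnect();
>> >>         }
>> >>
>> >>         // Drop the broken transceiver and build a fresh one.
>> >>         private void reconnect() throws IOException {
>> >>             if (transceiver != null) transceiver.close();
>> >>             transceiver = new NettyTransceiver(addr);
>> >>             proxy = SpecificRequestor.getClient(EventsCollector.class,
>> >>                                                 transceiver);
>> >>         }
>> >>
>> >>         synchronized void send(AdClick event) throws IOException {
>> >>             try {
>> >>                 proxy.collect_ad_click(event);
>> >>             } catch (IOException e) {
>> >>                 reconnect();                    // replace dead connection
>> >>                 proxy.collect_ad_click(event);  // one retry, then give up
>> >>             }
>> >>         }
>> >>     }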
>> >>
>> >>
>> >> Thanks a lot
>> >> Yang
>> >>
>> >>
>> >> client side:
>> >>
>> >>
>> >>
>> >>  WARN 16:51:02,079 Unexpected exception from downstream.
>> >> java.nio.channels.ClosedChannelException
>> >>        at org.jboss.netty.channel.socket.nio.NioWorker.cleanUpWriteBuffer(NioWorker.java:636)
>> >>        at org.jboss.netty.channel.socket.nio.NioWorker.writeFromUserCode(NioWorker.java:369)
>> >>        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:117)
>> >>        at org.jboss.netty.channel.Channels.write(Channels.java:632)
>> >>        at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:70)
>> >>        at org.jboss.netty.channel.Channels.write(Channels.java:611)
>> >>        at org.jboss.netty.channel.Channels.write(Channels.java:578)
>> >>        at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:259)
>> >>        at org.apache.avro.ipc.NettyTransceiver.transceive(NettyTransceiver.java:131)
>> >>        at org.apache.avro.ipc.Requestor.request(Requestor.java:134)
>> >>        at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:55)
>> >>        at $Proxy0.collect_ad_click(Unknown Source)
>> >>
>> >>
>> >>
>> >> server side:
>> >>
>> >>
>> >>  WARN 16:51:01,939 Unexpected exception from downstream.
>> >> java.io.IOException: Broken pipe
>> >>        at sun.nio.ch.FileDispatcher.write0(Native Method)
>> >>        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>> >>        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
>> >>        at sun.nio.ch.IOUtil.write(IOUtil.java:78)
>> >>        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
>> >>        at org.jboss.netty.channel.socket.nio.SocketSendBufferPool$PooledSendBuffer.transferTo(SocketSendBufferPool.java:239)
>> >>        at org.jboss.netty.channel.socket.nio.NioWorker.write0(NioWorker.java:469)
>> >>        at org.jboss.netty.channel.socket.nio.NioWorker.writeFromUserCode(NioWorker.java:387)
>> >>        at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:137)
>> >>        at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76)
>> >>        at org.jboss.netty.channel.Channels.write(Channels.java:632)
>> >>        at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:70)
>> >>        at org.jboss.netty.channel.Channels.write(Channels.java:611)
>> >>        at org.jboss.netty.channel.Channels.write(Channels.java:578)
>> >>        at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:259)
>> >>        at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:137)
>> >>        at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:120)
>> >>        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
>> >>        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
>> >>        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
>> >>        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
>> >>        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>> >>        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
>> >>        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
>> >>        at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
>> >>        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
>> >>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>        at java.lang.Thread.run(Thread.java:679)
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> After the exception, here is the client sender's state (blocked):
>> >>
>> >>
>> >> "EventQueue-Sender5" prio=10 tid=0x00007f5a0c2f5000 nid=0x6e3b waiting
>> >> on condition [0x00007f5a19519000]
>> >>   java.lang.Thread.State: WAITING (parking)
>> >>        at sun.misc.Unsafe.park(Native Method)
>> >>        - parking to wait for  <0x00000007510e00c0> (a
>> >> java.util.concurrent.Semaphore$NonfairSync)
>> >>        at
>> >> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >>        at
>> >>
>> >> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>> >>        at
>> >>
>> >> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
>> >>        at
>> >>
>> >> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>> >>        at java.util.concurrent.Semaphore.acquire(Semaphore.java:313)
>> >>        at
>> >>
>> >> org.apache.avro.ipc.NettyTransceiver$CallFuture.get(NettyTransceiver.java:203)
>> >>        at
>> >>
>> >> org.apache.avro.ipc.NettyTransceiver.transceive(NettyTransceiver.java:133)
>> >>        at org.apache.avro.ipc.Requestor.request(Requestor.java:134)
>> >>        - locked <0x0000000757144220> (a
>> >> org.apache.avro.ipc.specific.SpecificRequestor)
>> >>        at
>> >>
>> >> org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:55)
>> >>        at $Proxy0.collect_ad_click(Unknown Source)
>> >>        at
>> >>
>> >> com.cgm.whisky.emitter.ConnectionPool$EventsCollectorWithSerial.collect_ad_click(ConnectionPool.java:60)
>> >
>> >
>
>
