hbase-user mailing list archives

From Qiang Tian <tian...@gmail.com>
Subject Re: IPC Queue Size
Date Fri, 08 Aug 2014 09:06:26 GMT
PS: there are so many "Connection reset by peer" errors. Why does your client
reset the connection? :-)


On Fri, Aug 8, 2014 at 4:56 PM, Qiang Tian <tianq01@gmail.com> wrote:

> Good point, that is a big suspect.
>
> I checked your log. The ClosedChannelException should be triggered by
> call.sendResponseIfReady() (it is the only request in the queue, so the
> handler sends the response directly), but at that point the callQueueSize
> has already been decremented.
>
> 2014-08-05 00:50:06,727 WARN  [RpcServer.handler=57,port=60020]
> ipc.RpcServer (RpcServer.java:processResponse(1041)) -
> RpcServer.respondercallId: 118504 service: ClientService methodName: Multi
> size: 141.9 K connection: 10.248.134.67:55347: output error
> 2014-08-05 00:50:06,727 WARN  [RpcServer.handler=57,port=60020]
> ipc.RpcServer (CallRunner.java:run(135)) - RpcServer.handler=57,port=60020:
> caught a ClosedChannelException, this means that the server was processing
> a request but the client went away. The error message was: null
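>
> A rough, self-contained sketch of that path (hypothetical names and
> simplified logic, not the actual HBase source), showing why the counter
> stays correct even when the response write fails:
>
>     import java.nio.channels.ClosedChannelException;
>     import java.util.concurrent.atomic.AtomicLong;
>
>     // Toy model: the handler releases the call's size from callQueueSize
>     // before it tries to write the response, so a ClosedChannelException
>     // during the write cannot leave the counter inflated.
>     public class DirectResponseSketch {
>       static final AtomicLong callQueueSize = new AtomicLong();
>
>       static void handle(long callSize, boolean clientStillConnected) {
>         callQueueSize.addAndGet(callSize);    // accounted when the call is queued
>         callQueueSize.addAndGet(-callSize);   // released once the handler picks it up
>         try {
>           if (!clientStillConnected) {
>             // stands in for the failed channel write to a closed socket
>             throw new ClosedChannelException();
>           }
>           // ... write the response to the client ...
>         } catch (ClosedChannelException e) {
>           // logged as "output error" / ClosedChannelException in the excerpt
>           // above; the counter is already back to its previous value
>         }
>       }
>
>       public static void main(String[] args) {
>         handle(145_000, false);               // ~141.9 K call, client already gone
>         System.out.println("callQueueSize = " + callQueueSize.get());  // prints 0
>       }
>     }
>
> So this logged path does not explain a callQueueSize that never comes back
> down.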
>
> It looks like you have found the fix. Would you file a JIRA?
> Thanks.
>
>
> On Fri, Aug 8, 2014 at 2:41 PM, Walter King <walter@adroll.com> wrote:
>
>> I've only looked at the code a little and have likely missed something, but
>> does this if block decrement the call queue size if the client has already
>> closed the connection?
>>
>>
>> https://github.com/apache/hbase/blob/07a771866f18e8ec532c14f624fa908815bd88c7/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java#L74
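>>
>> To make the suspicion concrete, here is a toy, runnable sketch of the
>> pattern in question (hypothetical names, not the actual CallRunner code):
>> if the connection is already closed, an early return skips the decrement
>> that the normal path performs, so the shared counter only ever grows.
>>
>>     import java.util.concurrent.atomic.AtomicLong;
>>
>>     public class CallQueueLeakDemo {
>>       static final AtomicLong callQueueSize = new AtomicLong();
>>
>>       static void runCall(long callSize, boolean connectionOpen) {
>>         callQueueSize.addAndGet(callSize);   // incremented when the call is queued
>>         if (!connectionOpen) {
>>           // fail-fast path: the client already went away, the call is dropped...
>>           return;                            // ...and the decrement below never runs
>>         }
>>         // normal path: process the call, then release its size
>>         callQueueSize.addAndGet(-callSize);
>>       }
>>
>>       public static void main(String[] args) {
>>         runCall(100, true);    // normal call: counter returns to 0
>>         runCall(100, false);   // dropped call: 100 bytes stay accounted
>>         runCall(100, false);   // another dropped call: 200 bytes leaked
>>         System.out.println("callQueueSize after all calls: " + callQueueSize.get());
>>         // prints 200 and never returns to zero, the symptom described in this thread
>>       }
>>     }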
>>
>>
>>
>> On Thu, Aug 7, 2014 at 11:32 PM, Walter King <walter@adroll.com> wrote:
>>
>> > Yes, sorry, CallQueueTooBigException. But that value never returns to
>> > zero, even when the number of requests goes to zero. The "call queue too
>> > big" errors happen once any regionserver has been up long enough, so I
>> > have to restart them periodically. Also, at that 15:30 time I wasn't
>> > seeing that exception, but it seems like that is one time at which a call
>> > didn't properly decrement the callQueueSize, because it was at zero before
>> > and has never hit zero again - today the minimum is even higher.
>> >
>> >
>> > On Thu, Aug 7, 2014 at 9:14 PM, Qiang Tian <tianq01@gmail.com> wrote:
>> >
>> >> bq. "Eventually we ran into ipc queue size full messages being returned
>> >> to clients trying large batch puts, as it approaches a gigabyte."
>> >>
>> >> Do you mean CallQueueTooBigException? That limit is not the number of
>> >> queued calls but the total size of the data the clients send, configured
>> >> by "hbase.ipc.server.max.callqueue.size".
>> >>
>> >> I guess that when your client got the exception, it closed the
>> >> connection, causing other RPCs on the shared connection to fail.
>> >>
>> >>
>> >> 2014-08-06 22:27:57,253 WARN  [RpcServer.reader=9,port=60020] ipc.RpcServer
>> >> (RpcServer.java:doRead(794)) - RpcServer.listener,port=60020: count of
>> >> bytes read: 0
>> >> java.io.IOException: Connection reset by peer
>> >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> >> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> >> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> >> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> >> at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2229)
>> >> at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1415)
>> >> at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:790)
>> >> at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:581)
>> >> at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:556)
>> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> at java.lang.Thread.run(Thread.java:744)
>> >> 2014-08-06 22:27:57,257 WARN  [RpcServer.handler=18,port=60020] ipc.RpcServer
>> >> (RpcServer.java:processResponse(1041)) - RpcServer.respondercallId: 84968
>> >> service: ClientService methodName: Multi size: 17.7 K connection:
>> >> 10.248.130.152:49780: output error
>> >> 2014-08-06 22:27:57,258 WARN  [RpcServer.handler=18,port=60020] ipc.RpcServer
>> >> (CallRunner.java:run(135)) - RpcServer.handler=18,port=60020: caught a
>> >> ClosedChannelException, this means that the server was processing a request
>> >> but the client went away. The error message was: null
>> >> 2014-08-06 22:27:57,260 WARN  [RpcServer.handler=61,port=60020] ipc.RpcServer
>> >> (RpcServer.java:processResponse(1041)) - RpcServer.respondercallId: 83907
>> >> service: ClientService methodName: Multi size: 17.1 K connection:
>> >> 10.248.1.56:53615: output error
>> >> 2014-08-06 22:27:57,263 WARN  [RpcServer.handler=61,port=60020] ipc.RpcServer
>> >> (CallRunner.java:run(135)) - RpcServer.handler=61,port=60020: caught a
>> >> ClosedChannelException, this means that the server was processing a request
>> >> but the client went away. The error message was: null
>> >>
>> >>
>> >>
>> >> On Fri, Aug 8, 2014 at 2:57 AM, Walter King <walter@adroll.com> wrote:
>> >>
>> >> > https://gist.github.com/walterking/4c5c6f5e5e4a4946a656#file-gistfile1-txt
>> >> >
>> >> > http://adroll-test-sandbox.s3.amazonaws.com/regionserver.stdout.log.gz
>> >> >
>> >> > These are the logs from that particular server, and the debug dump from
>> >> > just now (no restart in between). The times in the graph are Pacific, so
>> >> > it should be around 2014-08-06 22:25:00. I do see some exceptions around
>> >> > there.
>> >>
>> >>
>> >
>> >
>>
>
>
