hbase-user mailing list archives

From Qiang Tian <tian...@gmail.com>
Subject Re: IPC Queue Size
Date Fri, 08 Aug 2014 08:56:12 GMT
Good point. That is a strong suspect.

I checked your log. The ClosedChannelException should have been triggered by
call.sendResponseIfReady() (it was the only request in the queue, so the handler
sent the response directly), but at that point the callqueueSize had already
been decremented.

2014-08-05 00:50:06,727 WARN  [RpcServer.handler=57,port=60020]
ipc.RpcServer (RpcServer.java:processResponse(1041)) -
RpcServer.respondercallId: 118504 service: ClientService methodName: Multi
size: 141.9 K connection: 10.248.134.67:55347: output error
2014-08-05 00:50:06,727 WARN  [RpcServer.handler=57,port=60020]
ipc.RpcServer (CallRunner.java:run(135)) - RpcServer.handler=57,port=60020:
caught a ClosedChannelException, this means that the server was processing
a request but the client went away. The error message was: null
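
To make the suspected leak concrete, here is a minimal sketch of how a byte counter can drift upward when a skipped call returns before the counter is decremented. It is a simplified model, not the HBase code: the class, the Call type, and the methods enqueue, runLeaky, runBalanced and simulate are all hypothetical names invented for illustration, and the assumption that the early-return path skips the decrement is exactly the open question in this thread.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CallQueueAccountingSketch {
    // Hypothetical stand-in for the server-wide count of queued request bytes.
    static final AtomicLong callQueueSize = new AtomicLong();

    /** Minimal stand-in for a queued RPC call (not the real HBase class). */
    static final class Call {
        final long size;           // request size in bytes
        final boolean channelOpen; // false if the client already went away

        Call(long size, boolean channelOpen) {
            this.size = size;
            this.channelOpen = channelOpen;
        }
    }

    /** Queuing a call adds its size to the counter. */
    static void enqueue(Call call) {
        callQueueSize.addAndGet(call.size);
    }

    /** Assumed buggy shape: the early return skips the decrement. */
    static void runLeaky(Call call) {
        if (!call.channelOpen) {
            return; // call skipped, its bytes stay counted forever
        }
        callQueueSize.addAndGet(-call.size);
        // ... the request would be processed here ...
    }

    /** Balanced shape: decrement first, then decide whether to skip. */
    static void runBalanced(Call call) {
        callQueueSize.addAndGet(-call.size);
        if (!call.channelOpen) {
            return; // call skipped, but the accounting is already settled
        }
        // ... the request would be processed here ...
    }

    /** Queues three calls (one with a closed channel) and returns the residue. */
    static long simulate(boolean leaky) {
        callQueueSize.set(0);
        Call[] calls = {
            new Call(100, true), new Call(250, false), new Call(50, true)
        };
        for (Call c : calls) {
            enqueue(c);
        }
        for (Call c : calls) {
            if (leaky) {
                runLeaky(c);
            } else {
                runBalanced(c);
            }
        }
        return callQueueSize.get();
    }

    public static void main(String[] args) {
        System.out.println("leaky residual bytes:    " + simulate(true));
        System.out.println("balanced residual bytes: " + simulate(false));
    }
}
```

In this model the leaky variant leaves the 250 bytes of the skipped call counted after the queue has drained, which would match a minimum that never returns to zero and keeps climbing until the server is restarted.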

It looks like you have found the fix. Would you file a JIRA?
Thanks.


On Fri, Aug 8, 2014 at 2:41 PM, Walter King <walter@adroll.com> wrote:

> I've only looked at the code a little, and likely missed something, but
> does this if block decrement the call queue size if the client has already
> closed the connection?
>
>
> https://github.com/apache/hbase/blob/07a771866f18e8ec532c14f624fa908815bd88c7/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java#L74
>
>
>
> On Thu, Aug 7, 2014 at 11:32 PM, Walter King <walter@adroll.com> wrote:
>
> > Yes, sorry, CallQueueTooBigException. But that value never returns to
> > zero, even when the number of requests goes to zero.  The call queue too big
> > exception happens if any regionserver is up for a long enough period of
> > time, so I have to restart them periodically.  Also, at that 15:30 time I
> > wasn't seeing that exception, but it seems to be one time at which a call
> > didn't properly decrement the callqueuesize, because it was at zero before
> > and has never hit zero again. Today the minimum is even higher.
> >
> >
> > On Thu, Aug 7, 2014 at 9:14 PM, Qiang Tian <tianq01@gmail.com> wrote:
> >
> >> bq. "Eventually we ran into ipc queue size full messages being returned to
> >> clients trying large batch puts, as it approaches a gigabyte."
> >>
> >> Do you mean CallQueueTooBigException? It looks like the limit is not on the
> >> queue size but on the size of the data that clients send, configured by
> >> "hbase.ipc.server.max.callqueue.size".
> >>
> >> I guess that when your client got the exception, it closed the connection,
> >> causing other RPCs sharing that connection to fail.
> >>
> >>
> >> 2014-08-06 22:27:57,253 WARN  [RpcServer.reader=9,port=60020] ipc.RpcServer
> >> (RpcServer.java:doRead(794)) - RpcServer.listener,port=60020: count of
> >> bytes read: 0
> >> java.io.IOException: Connection reset by peer
> >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >> at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2229)
> >> at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1415)
> >> at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:790)
> >> at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:581)
> >> at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:556)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:744)
> >> 2014-08-06 22:27:57,257 WARN  [RpcServer.handler=18,port=60020]
> >> ipc.RpcServer (RpcServer.java:processResponse(1041)) -
> >> RpcServer.respondercallId: 84968 service: ClientService methodName: Multi
> >> size: 17.7 K connection: 10.248.130.152:49780: output error
> >> 2014-08-06 22:27:57,258 WARN  [RpcServer.handler=18,port=60020]
> >> ipc.RpcServer (CallRunner.java:run(135)) - RpcServer.handler=18,port=60020:
> >> caught a ClosedChannelException, this means that the server was processing
> >> a request but the client went away. The error message was: null
> >> 2014-08-06 22:27:57,260 WARN  [RpcServer.handler=61,port=60020]
> >> ipc.RpcServer (RpcServer.java:processResponse(1041)) -
> >> RpcServer.respondercallId: 83907 service: ClientService methodName: Multi
> >> size: 17.1 K connection: 10.248.1.56:53615: output error
> >> 2014-08-06 22:27:57,263 WARN  [RpcServer.handler=61,port=60020]
> >> ipc.RpcServer (CallRunner.java:run(135)) - RpcServer.handler=61,port=60020:
> >> caught a ClosedChannelException, this means that the server was processing
> >> a request but the client went away. The error message was: null
> >>
> >>
> >>
> >> On Fri, Aug 8, 2014 at 2:57 AM, Walter King <walter@adroll.com> wrote:
> >>
> >> > https://gist.github.com/walterking/4c5c6f5e5e4a4946a656#file-gistfile1-txt
> >> >
> >> > http://adroll-test-sandbox.s3.amazonaws.com/regionserver.stdout.log.gz
> >> >
> >> > These are logs from that particular server, and the debug dump from now
> >> > (no restart in between).  The times in the graph are Pacific, so it
> >> > should be around 2014-08-06 22:25:00.  I do see some exceptions around
> >> > there.
> >> >
> >>
> >
> >
>
