hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "聪聪" <175998...@qq.com>
Subject 回复: 回复: Hbase cluster is suddenly unable to respond
Date Fri, 30 Oct 2015 02:24:27 GMT
The client code is http://paste2.org/p3BXkKtV


Is the client version compatible with it?
I see  that the client version is hbase0.96.1.1-hadoop2




------------------ 原始邮件 ------------------
发件人: "Ted Yu";<yuzhihong@gmail.com>;
发送时间: 2015年10月30日(星期五) 凌晨0:08
收件人: "user@hbase.apache.org"<user@hbase.apache.org>; 

主题: Re: 回复: Hbase cluster is suddenly unable to respond



Client side, have they tried increasing direct memory size ?
-XX:MaxDirectMemorySize=

Do you know how wide the rows returned may get ?

Cheers

On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <175998806@qq.com> wrote:

> Developers feedback their client has the following error:
>
> [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969)
> connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from
> tomcat: unexpected exception receiving call responses
>
> java.lang.OutOfMemoryError: Direct buffer memory
>
>     at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
>
>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
> ~[na:1.6.0_20]
>
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> ~[na:1.6.0_20]
>
>     at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57)
> ~[na:1.6.0_20]
>
>     at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
>
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> ~[na:1.6.0_20]
>
>     at
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
>     at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> ~[na:1.6.0_20]
>
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> ~[na:1.6.0_20]
>
>     at java.io.DataInputStream.read(DataInputStream.java:132)
> ~[na:1.6.0_20]
>
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>     at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>
>
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "Ted Yu";<yuzhihong@gmail.com>;
> 发送时间: 2015年10月29日(星期四) 晚上10:48
> 收件人: "user@hbase.apache.org"<user@hbase.apache.org>;
>
> 主题: Re: 回复: Hbase cluster is suddenly unable to respond
>
>
>
> I took a look at the jstack.
> The threads involving RpcServer$Connection.readAndProcess() were in
> RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in
> HBASE-11277
> .
>
> The protobuf exception shown in your earlier email corresponded to the
> following in hbase-protocol/src/main/protobuf/Client.proto :
>
> message GetRequest {
>   required RegionSpecifier region = 1;
>   required Get get = 2;
> }
>
> Are all your hbase clients running in the same version ?
>
> Cheers
>
> On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <175998806@qq.com> wrote:
>
> > the regionserver jstack log is    http://paste2.org/yLDJeXgL
> >
> >
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "蒲聪-北京";<175998806@qq.com>;
> > 发送时间: 2015年10月29日(星期四) 晚上9:10
> > 收件人: "user"<user@hbase.apache.org>;
> >
> > 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > hi Ted:
> >
> >
> > Yesterday around 14:40,one of regionservers hang once against.At that
> time
> > I saw web ui can not open.Hbase cluster is  unable to respond.Very
> anxious,
> > hoping to get help!
> >
> >
> > jstack log is as follows:
> > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
> > nid=0x12d3 runnable [0x00007f3bebe58000]
> >    java.lang.Thread.State: RUNNABLE
> >     at sun.nio.ch.NativeThread.current(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> >     - locked <0x00007f3d27360fb0> (a java.lang.Object)
> >     - locked <0x00007f3d27360f90> (a java.lang.Object)
> >     at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     - locked <0x00007f3c584ce990> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
> > nid=0x12d2 runnable [0x00007f3bebf59000]
> >    java.lang.Thread.State: RUNNABLE
> >     at sun.nio.ch.NativeThread.current(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> >     - locked <0x00007f3d27360530> (a java.lang.Object)
> >     - locked <0x00007f3d27360510> (a java.lang.Object)
> >     at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     - locked <0x00007f3c584cf7d8> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> >
> >
> > region server log :
> > 2015-10-28 14:38:19,801 INFO  [Thread-15] regionserver.HRegion: Finished
> > memstore flush of ~3.6 M/3820648, currentsize=536/536 for region
> >
> order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0.
> > in 45ms, sequenceid=9599960557, compaction requested=true
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> >     at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> >     at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> >
> > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=2,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> >     at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=1,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> >     at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "蒲聪-北京";<175998806@qq.com>;
> > 发送时间: 2015年10月26日(星期一) 晚上9:28
> > 收件人: "user"<user@hbase.apache.org>;
> >
> > 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
> >
> >
> >
> >
> >
> > Thank you very much!
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "Ted Yu";<yuzhihong@gmail.com>;
> > 发送时间: 2015年10月26日(星期一) 晚上8:28
> > 收件人: "user"<user@hbase.apache.org>;
> >
> > 主题: Re: 回复: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > The fix from HBASE-11277 may solve your problem - if you collect stack
> > trace during the hang, we would have more clue.
> >
> > I suggest upgrading to newer release such as 1.1.2 or 0.98.15
> >
> > Cheers
> >
> > > On Oct 26, 2015, at 12:42 AM, 聪聪 <175998806@qq.com> wrote:
> > >
> > > hi,Ted:
> > >
> > >
> > > I use the HBase version is hbase-0.96.0.
> > > Around 17:33,other region servers also appeared in this warn log.I
> don't
> > know if it's normal or not.At that time I saw web ui can not open.I
> restart
> > the regionserver  then hbase back to normal. Is it possible  bug
> > HBASE-11277?
> > >
> > >
> > > Regionserver on the log basically almost  this warn log
> > > mater on the log  is as follows:
> > > 2015-10-21 22:15:43,575 INFO  [CatalogJanitor-l-namenode2:60000]
> > master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced
> > merged region(s) and 1 unreferenced parent region(s)
> > > 2015-10-23 17:47:25,617 INFO  [RpcServer.handler=28,port=60000]
> > master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000]
> > ipc.RpcServer: (responseTooSlow):
> >
> {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> > 192.168.39.22:60292
> >
> ","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: (responseTooSlow):
> >
> {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> > 192.168.39.22:60286
> >
> ","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000]
> > ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService
> > methodName: ListTableDescriptorsByNamespace size: 48 connection:
> > 192.168.39.22:60292: output error
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService
> > methodName: ListTableDescriptorsByNamespace size: 48 connection:
> > 192.168.39.22:60286: output error
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: RpcServer.handler=6,port=60000: caught a
> > ClosedChannelException, this means that the server was processing a
> request
> > but the client went away. The error message was: null
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > ------------------ 原始邮件 ------------------
> > > 发件人: "Ted Yu";<yuzhihong@gmail.com>;
> > > 发送时间: 2015年10月23日(星期五) 晚上11:39
> > > 收件人: "user@hbase.apache.org"<user@hbase.apache.org>;
> > >
> > > 主题: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > > Were other region servers functioning normally around 17:33 ?
> > >
> > > Which hbase release are you using ?
> > >
> > > Can you pastebin more of the region server log ?
> > >
> > > Thanks
> > >
> > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <175998806@qq.com> wrote:
> > >>
> > >> hi,all:
> > >>
> > >>
> > >> This afternoon,The whole Hbase cluster is suddenly unable to
> > respond.after
> > >> I restart a regionserver after,the cluster has recovered.I don't know
> > the
> > >> cause of the trouble.I hope I can get help from you.
> > >>
> > >>
> > >> Regionserver on the log is as follows:
> > >> 2015-10-23 17:28:49,335 INFO  [regionserver60020.logRoller]
> wal.FSHLog:
> > >> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
> > >> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com
> > %2C60020%2C1442810406218.1445580462689
> > >> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
> > >> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> > >> 2015-10-23 17:33:31,375 WARN  [RpcServer.reader=8,port=60020]
> > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > >> java.io.IOException: Connection reset by peer
> > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > >>        at
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > >>        at
> > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > >>        at
> > >>
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>        at java.lang.Thread.run(Thread.java:744)
> > >> 2015-10-23 17:33:31,779 WARN  [RpcServer.reader=2,port=60020]
> > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > >> java.io.IOException: Connection reset by peer
> > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > >>        at
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > >>        at
> > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > >>        at
> > >>
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>        at java.lang.Thread.run(Thread.java:744)
> >
>
Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message