hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Qiang Tian <tian...@gmail.com>
Subject Re: 答复: regionserver 60020 timeout
Date Thu, 12 Jun 2014 08:30:31 GMT
another occurrence of this problem:
http://mail-archives.apache.org/mod_mbox/hbase-dev/201402.mbox/%3CCAHkeaHXAaJFyC6XNV+cBwupqAKoTmkJ349CPQuk02FGS8qcF1w@mail.gmail.com%3E


According to the java doc, the read call should returned immediately after
reading the available data in socket buffer. there is only 1 case that it
can be blocked:
If another thread has already initiated a read operation upon this channel,
however, then an invocation of this method will block until the first
operation is complete.

from the regionserver jstack output, only 1 reader is in read, so not the
case.


there is a slight code path difference for the 2 stacks:

firstly:
org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)


and later:
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)

i.e. firslty in channelIO call, then read directly. there is a while loop
in channelIO. perhaps it could take time when there is much data to read?

questions for WeiWei:
1)what is the hbase version?
2)How the problem happened? what operation you did?

thanks.


On Sat, Jun 7, 2014 at 9:47 PM, Qiang Tian <tianq01@gmail.com> wrote:

> it looks not a busy system behavior:
> 1)From regionserver stacktraces, only reader 4 is in reading, all other
> readers are in waiting..
> 2)Weiwei mentioned "ipc.RpcServer: RpcServer.listener,port=60020: count
> of bytes read: 0 java.nio.channels.ClosedByInterruptException",  --the
> reader read nothing before it is interrupted
>
>
> we set the socket to be non-blocking, and the code looks fine to do the
> read:
>               if (key.isValid()) {
>                 if (key.isReadable()) {
>                   doRead(key);
>                 }
>               }
>
> googled the issue,  only find cases that non-blocking socket blocks in
> read at client side(due to finishConnect not called).
>
> for our case, maybe there is something wrong with the corresponding client
> connection?
> need to dig more.
>
>
>
>
> On Sat, Jun 7, 2014 at 7:38 AM, sunweiwei <sunww@asiainfo-linkage.com>
> wrote:
>
>> Hi
>>
>> the regionserver jstack log is   http://paste2.org/2wJc5CJx
>> the master jstack log is   http://paste2.org/t4KLpbXw.   Maybe the
>> regionserver is hbase:namespace server.
>> the master dump log (from the 60010 web) is http://paste2.org/vB2UVPyO
>>
>>
>> I saw the regionserver's Requests Per Second is 0 at that time ,
>> other regionserver's Requests Per Second is about 10000~50000,  and some
>> clients failed.
>> I'm not sure about loading at that time.  regionserver's loading usually
>> is not high.
>>
>> I restart the regionserver and master then hbase back to normal.
>>
>> Thanks
>> -----邮件原件-----
>> 发件人: saint.ack@gmail.com [mailto:saint.ack@gmail.com] 代表 Stack
>> 发送时间: 2014年6月7日 0:16
>> 收件人: Hbase-User
>> 主题: Re: regionserver 60020 timeout
>>
>> On Thu, Jun 5, 2014 at 11:35 PM, sunweiwei <sunww@asiainfo-linkage.com>
>> wrote:
>>
>> > Hi
>> >
>> > I find some 60000 millis timeout in master log, when master connect
>> > regionserver's 60020 port. Log like this:
>> >
>> >
>> Could we see the full stack trace?  It would help confirm Qiang's
>> prognosis.  Do you have monitoring of this server setup?  What did it show
>> for loading, handler-count, at this time?
>>
>> Thanks,
>> St.Ack
>>
>>
>>
>> > because java.net.SocketTimeoutException: 60000 millis timeout while
>> waiting
>> > for channel to be ready for read. ch :
>> > java.nio.channels.SocketChannel[connected local=/132.228.248.61:34835
>> > remote=dsjhd074/132.228.248.84:60020]
>> >
>> > at
>> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>> >
>> >          at
>> org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>> >
>> >          at
>> >
>> >
>> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650
>> > )
>> >
>> >          at
>> >
>> >
>> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callB
>> > lockingMethod(RpcClient.java:1708)
>> >
>> >          at
>> >
>> >
>> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$Blocking
>> > Stub.openRegion(AdminProtos.java:20595)
>> >
>> >          at
>> >
>> >
>> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.ja
>> > va:628)
>> >
>> >          at
>> >
>> >
>> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.ja
>> > va:1901)
>> >
>> >          at
>> >
>> >
>> org.apache.hadoop.hbase.master.AssignmentManager.access$300(AssignmentManage
>> > r.java:106)
>> >
>> >          at
>> >
>> >
>> org.apache.hadoop.hbase.master.AssignmentManager$2.process(AssignmentManager
>> > .java:660)
>> >
>> >          at
>> > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>> >
>> >          at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
>> > va:886)
>> >
>> >          at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
>> > 08)
>> >
>> >          at java.lang.Thread.run(Thread.java:662)
>> >
>> >
>> >
>> > And I find listener,port 60020 is waiting to lock <0x000000041ae290d8>
>>  in
>> > regionserver's jstack log. Is that ok? Thanks
>> >
>> > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007fade53c0000
>> > nid=0x22d8c runnable [0x00007fadb8b32000]
>> >
>> >    java.lang.Thread.State: RUNNABLE
>> >
>> >         at sun.nio.ch.FileDispatcher.read0(Native Method)
>> >
>> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>> >
>> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>> >
>> >         at sun.nio.ch.IOUtil.read(IOUtil.java:171)
>> >
>> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>> >
>> >         - locked <0x000000041b1c78c0> (a java.lang.Object)
>> >
>> >         at
>> > org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
>> >
>> >         at
>> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> >
>> >         at
>> >
>> >
>> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.ja
>> > va:1476)
>> >
>> >         at
>> >
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> >
>> >         at
>> >
>> >
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.ja
>> > va:563)
>> >
>> >         - locked <0x000000041ae290d8> (a
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>> >
>> >         at
>> >
>> >
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538
>> > )
>> >
>> >         at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
>> > va:886)
>> >
>> >         at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
>> > 08)
>> >
>> >         at java.lang.Thread.run(Thread.java:662)
>> >
>> >
>> >
>> > "RpcServer.listener,port=60020" daemon prio=10 tid=0x00007fadd4426800
>> > nid=0x22db3 waiting for monitor entry [0x00007fadb658a000]
>> >
>> >    java.lang.Thread.State: BLOCKED (on object monitor)
>> >
>> >         at
>> >
>> >
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.registerChannel(RpcSer
>> > ver.java:593)
>> >
>> >         - waiting to lock <0x000000041ae290d8> (a
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>> >
>> >         at
>> >
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:745)
>> >
>> >         at
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:668)
>> >
>> >
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message