From: "sunweiwei" <sunww@asiainfo-linkage.com>
To: user@hbase.apache.org
Subject: Re: Re: regionserver 60020 timeout
Date: Thu, 12 Jun 2014 21:04:01 +0800

I checked again. The timed-out regionserver is only sometimes the namespace server, not always.

-----Original Message-----
From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
Sent: 12 June 2014 20:35
To: user@hbase.apache.org
Subject: Re: Re: regionserver 60020 timeout

Thanks, Qiang Tian. I'm using HBase 0.96.0. It's a heavy-write cluster and read requests are rare.
I noticed that when the timeout occurs, the timed-out regionserver always seems to be the namespace server.
This is from the master jstack log. Maybe this connection is causing the regionserver to block?

"RpcServer.handler=57,port=60000" daemon prio=10 tid=0x00007f12994cf800 nid=0x7991 in Object.wait() [0x00007f12714f2000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1432)
        - locked <0x000000071ae56260> (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1281)
        at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:752)
        at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:750)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
        - locked <0x000000071d0f6cd8> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:755)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:134)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:118)
        - locked <0x000000071e1dd7c8> (a org.apache.hadoop.hbase.master.TableNamespaceManager)
        at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3103)
        at org.apache.hadoop.hbase.master.HMaster.listTableDescriptorsByNamespace(HMaster.java:3123)
        at org.apache.hadoop.hbase.master.HMaster.listTableDescriptorsByNamespace(HMaster.java:3024)
        at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38261)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
        at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)

-----Original Message-----
From: Qiang Tian [mailto:tianq01@gmail.com]
Sent: 12 June 2014 16:39
To: user@hbase.apache.org
Subject: Re: Re: regionserver 60020 timeout

Hi Ted, Stack,
Not sure if we could add a guide for the information that needs to be provided when describing a problem...? Many times I see Ted asking for the HBase version etc. :-P Maybe it could help save some time.

On Thu, Jun 12, 2014 at 4:30 PM, Qiang Tian wrote:

> another occurrence of this problem:
>
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201402.mbox/%3CCAHkeaHXAaJFyC6XNV+cBwupqAKoTmkJ349CPQuk02FGS8qcF1w@mail.gmail.com%3E
>
> According to the javadoc, the read call should return immediately after reading the available data in the socket buffer. There is only one case in which it can block:
> "If another thread has already initiated a read operation upon this channel, however, then an invocation of this method will block until the first operation is complete."
>
> From the regionserver jstack output, only one reader is in read, so that is not the case.
>
> there is a slight code path difference between the two stacks:
>
> firstly:
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>
> and later:
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>
> i.e. firstly in a channelIO call, then a direct read. There is a while loop in channelIO. Perhaps it could take time when there is much data to read?
>
> Questions for WeiWei:
> 1) What is the HBase version?
> 2) How did the problem happen? What operation did you do?
>
> thanks.
>
>
> On Sat, Jun 7, 2014 at 9:47 PM, Qiang Tian wrote:
>
>> It does not look like busy-system behavior:
>> 1) From the regionserver stack traces, only reader 4 is in a read; all other readers are waiting.
>> 2) Weiwei mentioned "ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0 java.nio.channels.ClosedByInterruptException" -- the reader read nothing before it was interrupted.
>>
>> We set the socket to be non-blocking, and the code looks fine to do the read:
>> if (key.isValid()) {
>>   if (key.isReadable()) {
>>     doRead(key);
>>   }
>> }
>>
>> Googling the issue, I only found cases where a non-blocking socket blocks in read on the client side (due to finishConnect not being called).
>>
>> For our case, maybe there is something wrong with the corresponding client connection?
>> Need to dig more.
>>
>>
>> On Sat, Jun 7, 2014 at 7:38 AM, sunweiwei wrote:
>>
>>> Hi
>>>
>>> The regionserver jstack log is http://paste2.org/2wJc5CJx
>>> The master jstack log is http://paste2.org/t4KLpbXw. Maybe this regionserver is the hbase:namespace server.
>>> The master dump log (from the 60010 web UI) is http://paste2.org/vB2UVPyO
>>>
>>> I saw that this regionserver's Requests Per Second was 0 at that time, other regionservers' Requests Per Second was about 10000~50000, and some clients failed.
>>> I'm not sure about the load at that time. The regionserver's load is usually not high.
>>>
>>> I restarted the regionserver and the master, then HBase went back to normal.
>>>
>>> Thanks
>>>
>>> -----Original Message-----
>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] on behalf of Stack
>>> Sent: 7 June 2014 0:16
>>> To: Hbase-User
>>> Subject: Re: regionserver 60020 timeout
>>>
>>> On Thu, Jun 5, 2014 at 11:35 PM, sunweiwei wrote:
>>>
>>> > Hi
>>> >
>>> > I found some 60000 millis timeouts in the master log when the master connects to the regionserver's 60020 port. The log looks like this:
>>> >
>>>
>>> Could we see the full stack trace? It would help confirm Qiang's prognosis. Do you have monitoring of this server setup? What did it show for loading, handler count, at this time?
>>>
>>> Thanks,
>>> St.Ack
>>>
>>>
>>> > because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/132.228.248.61:34835 remote=dsjhd074/132.228.248.84:60020]
>>> >         at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>>> >         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>>> >         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>>> >         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>>> >         at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20595)
>>> >         at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:628)
>>> >         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1901)
>>> >         at org.apache.hadoop.hbase.master.AssignmentManager.access$300(AssignmentManager.java:106)
>>> >         at org.apache.hadoop.hbase.master.AssignmentManager$2.process(AssignmentManager.java:660)
>>> >         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >         at java.lang.Thread.run(Thread.java:662)
>>> >
>>> > And I find that listener,port=60020 is waiting to lock <0x000000041ae290d8> in the regionserver's jstack log. Is that ok?
>>> > Thanks
>>> >
>>> > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007fade53c0000 nid=0x22d8c runnable [0x00007fadb8b32000]
>>> >    java.lang.Thread.State: RUNNABLE
>>> >         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>>> >         at sun.nio.ch.IOUtil.read(IOUtil.java:171)
>>> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>>> >         - locked <0x000000041b1c78c0> (a java.lang.Object)
>>> >         at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
>>> >         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
>>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>>> >         - locked <0x000000041ae290d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >         at java.lang.Thread.run(Thread.java:662)
>>> >
>>> > "RpcServer.listener,port=60020" daemon prio=10 tid=0x00007fadd4426800 nid=0x22db3 waiting for monitor entry [0x00007fadb658a000]
>>> >    java.lang.Thread.State: BLOCKED (on object monitor)
>>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.registerChannel(RpcServer.java:593)
>>> >         - waiting to lock <0x000000041ae290d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:745)
>>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:668)
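The pair of stacks above is the key detail: the reader thread holds the Reader's monitor (<0x000000041ae290d8>) while it is inside the socket read, and the listener's doAccept path is blocked trying to enter registerChannel on that same monitor, so new connections cannot be handed off until that read returns. Below is a simplified, self-contained sketch of that locking pattern; the class, method bodies, and buffer size are illustrative assumptions, not the actual RpcServer source.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;

    // Simplified illustration (not the real RpcServer code) of the monitor
    // interaction visible in the two stacks above.
    class ReaderSketch implements Runnable {
      private final Selector readSelector;
      private final ByteBuffer buf = ByteBuffer.allocate(8192);

      ReaderSketch() throws IOException {
        this.readSelector = Selector.open();
      }

      // Called from the listener/accept thread. It needs the Reader's monitor,
      // so it waits ("waiting to lock <...>") while the run loop below is busy
      // reading under the same monitor.
      public synchronized SelectionKey registerChannel(SocketChannel channel) throws IOException {
        channel.configureBlocking(false);
        return channel.register(readSelector, SelectionKey.OP_READ);
      }

      @Override
      public void run() {
        while (!Thread.currentThread().isInterrupted()) {
          // The monitor is held for one whole select-and-read pass, which
          // corresponds to the "- locked <...> (a ...Reader)" frame above.
          synchronized (this) {
            try {
              readSelector.select(1000);   // timeout so pending registrations eventually get a turn
              for (SelectionKey key : readSelector.selectedKeys()) {
                if (key.isValid() && key.isReadable()) {
                  // If this read is slow (lots of data, or a misbehaving client
                  // connection), the monitor stays held the whole time and
                  // registerChannel() above cannot run, leaving the listener
                  // thread BLOCKED exactly as in the second stack.
                  ((SocketChannel) key.channel()).read(buf);
                  buf.clear();
                }
              }
              readSelector.selectedKeys().clear();
            } catch (IOException e) {
              // a real server would close the offending channel; ignored here
            }
          }
          // Monitor released briefly between passes so the listener can register
          // newly accepted channels.
        }
      }
    }

In code shaped like this, a single slow or stuck read on one connection is enough to stall the accept path, which would match the symptom reported earlier in the thread: a regionserver that stops answering on port 60020 while showing almost no request load.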