Subject: Re: HMaster restart with error
From: Louis Hust
To: user@hbase.apache.org
Date: Sun, 17 May 2015 09:44:44 +0800

Hi Ted,

Thanks very much! The Namenode process was not running on l-namenode2.dba.cn8 (192.168.39.22); only the HMaster was running on it.

So you mean that at 2015-05-15 12:15:04 the HMaster was still handling the expiration of the two region servers, and was not yet ready to handle client requests? And how can I tell from the logs when the HMaster is ready to handle client requests?

I stopped the HMaster at 12:15:58 because it could not handle requests; I wanted to stop it and let the backup HMaster take over.

2015-05-17 0:29 GMT+08:00 Ted Yu:

> In the period you identified, the master was assigning regions, e.g.:
>
> 2015-05-15 12:13:09,683 INFO
> [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-0]
> master.RegionStates: Transitioned {c634280ce287b2d6cebd88b61accf685
> state=OFFLINE, ts=1431663189621, server=null} to
> {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN, ts=1431663189683,
> server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> 2015-05-15 12:13:09,683 INFO
> [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-2]
> master.RegionStates: Transitioned {2f60b1b4e51d32ef98ad19690f13a565
> state=OFFLINE, ts=1431663189621, server=null} to
> {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN, ts=1431663189683,
> server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
>
> Then two region servers went down:
>
> 2015-05-15 12:14:40,699 INFO [main-EventThread]
> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> 2015-05-15 12:15:04,899 INFO [main-EventThread]
> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
>
> Master was stopped afterwards:
>
> Fri May 15 12:15:58 CST 2015 Terminating master
>
> Namenode process was running on l-namenode2.dba.cn8, right?
>
> Cheers
>
> On Sat, May 16, 2015 at 7:50 AM, Louis Hust wrote:
>
> > Hi Ted,
> > Any idea?
> > When the HMaster restarts, how can I know when it can really handle
> > requests from the application? Is there any marker in the logs?
> >
> > 2015-05-16 14:05 GMT+08:00 Louis Hust:
> >
> > > @Ted,
> > > Please see the log from 12:11:29 to 12:15:28. In this time range the
> > > HMaster is in the restarting stage but cannot handle requests from
> > > the client. Is the HMaster recovering, or doing something else?
> > >
> > > 2015-05-16 13:59 GMT+08:00 Louis Hust:
> > >
> > >> OK, you can get the log from
> > >> http://pan.baidu.com/s/1pqS6E
> > >>
> > >> 2015-05-16 13:26 GMT+08:00 Ted Yu:
> > >>
> > >>> Can you check the server log on 192.168.39.22?
> > >>>
> > >>> That should give you some clue.
> > >>>
> > >>> Cheers
> > >>>
> > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust wrote:
> > >>>
> > >>> > Hi all,
> > >>> >
> > >>> > I use HBase 0.96.0 with Hadoop 2.2.0,
> > >>> > and the customer said they could not write into the HBase cluster,
> > >>> > so I stopped the HMaster and started it again soon after.
> > >>> >
> > >>> > But it seems that the HMaster did not respond to requests; the
> > >>> > following is the HMaster log:
> > >>> >
> > >>> > {logs}
> > >>> > 2015-05-15 12:13:33,136 INFO [AM.ZK.Worker-pool2-t16] master.RegionStates:
> > >>> > Transitioned {9036a3befee90eeffb9082f90a4a9afa state=OPENING,
> > >>> > ts=1431663212637, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > >>> > to {9036a3befee90eeffb9082f90a4a9afa state=OPEN, ts=1431663213136,
> > >>> > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > >>> > 2015-05-15 12:13:33,139 INFO [AM.ZK.Worker-pool2-t4] master.RegionStates:
> > >>> > Onlined 9036a3befee90eeffb9082f90a4a9afa on
> > >>> > l-hbase26.data.cn8.qunar.com,60020,1431462615651
> > >>> > 2015-05-15 12:14:40,699 INFO [main-EventThread]
> > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > >>> > processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> > >>> > 2015-05-15 12:15:04,899 INFO [main-EventThread]
> > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > >>> > processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
> > >>> > 2015-05-15 12:15:24,465 WARN [249240421@qtp-591022857-33]
> > >>> > client.HConnectionManager$HConnectionImplementation: Checking master
> > >>> > connection
> > >>> > com.google.protobuf.ServiceException: java.net.SocketTimeoutException:
> > >>> > Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because
> > >>> > java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> > >>> > channel to be ready for read. ch :
> > >>> > java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700
> > >>> > remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> > >>> > at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
> > >>> > at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
> > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
> > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
> > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
> > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
> > >>> > at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
> > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> > >>> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > >>> > at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >>> > at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
> > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >>> > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >>> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > >>> > at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > >>> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > >>> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > >>> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > >>> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > >>> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > >>> > at org.mortbay.jetty.Server.handle(Server.java:326)
> > >>> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > >>> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > >>> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > >>> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > >>> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > >>> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > >>> > at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > >>> > Caused by: java.net.SocketTimeoutException: Call to
> > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because
> > >>> > java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> > >>> > channel to be ready for read. ch :
> > >>> > java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700
> > >>> > remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> > >>> > ... 37 more
> > >>> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
> > >>> > waiting for channel to be ready for read. ch :
> > >>> > java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700
> > >>> > remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > >>> > at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > >>> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> > >>> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
> > >>> > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > >>> > at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > >>> > at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
> > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
> > >>> > Fri May 15 12:15:58 CST 2015 Terminating master
> > >>> > {/logs}
> > >>> > So what does the exception mean? Why does it happen? And how can
> > >>> > the problem be solved?
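The question raised above — how to tell from the logs when a restarted HMaster is ready to serve client requests — can be approximated with a quick log check. This is a sketch only: it assumes the master writes a "Master has completed initialization" line once it becomes ready (the exact wording may differ between HBase versions), and the log path shown is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a restarted HMaster appears to have finished
# initializing, by searching its log for the initialization-complete line.
# Assumptions: the message text "Master has completed initialization"
# (wording may vary by HBase version) and a hypothetical default log path.
MASTER_LOG="${MASTER_LOG:-/var/log/hbase/hbase-master.log}"

if grep -q "Master has completed initialization" "$MASTER_LOG" 2>/dev/null; then
    echo "master ready"
else
    echo "master still initializing"
fi
```

In a scenario like the one in this thread, a loop around this check (with a sleep) would distinguish a master that is still replaying region assignments from one that is actually serving requests, instead of relying on whether client calls time out.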