Subject: Re: HMaster restart with error
From: Ted Yu
To: user@hbase.apache.org
Date: Sun, 17 May 2015 20:45:17 -0700

After l-namenode1 became active master, it assigned regions:

2015-05-15 12:16:40,432 INFO [master:l-namenode1:60000]
master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff
state=OFFLINE, ts=1431663400432, server=null} to
{6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432, server=
l-hbase31.data.cn8.qunar.com,60020,1431462584879}

However, l-hbase31 went down:

2015-05-15 12:16:40,508 INFO
[MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
handler.ServerShutdownHandler: Splitting logs for
l-hbase31.data.cn8.qunar.com,60020,1427789773001 before assignment.

l-namenode1 was restarted:

2015-05-15 12:20:25,322 INFO [main] util.VersionInfo: HBase 0.96.0-hadoop2
2015-05-15 12:20:25,323 INFO [main] util.VersionInfo: Subversion
https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434

However, it went down due to zookeeper session expiration:

2015-05-15 12:20:25,580 WARN [main] zookeeper.ZooKeeperNodeTracker: Can't
get or delete the master znode
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase/master

It started again after that and AssignmentManager did a lot of
assignments. Looks like the cluster was operational this time.

Cheers

On Sun, May 17, 2015 at 8:24 AM, Ted Yu wrote:

> bq. the backup master take over at 2015-05-15 12:16:40,024 ?
>
> The switch of active master should be earlier than 12:16:40,024 - shortly
> after 12:15:58
>
> l-namenode1 would do some initialization (such as waiting for region
> servers count to settle) after it became active master.
>
> I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home) but
> the download progress was very slow.
>
> Will try downloading later in the day.
>
> Do you have access to pastebin?
>
> Cheers
>
> On Sun, May 17, 2015 at 2:07 AM, Louis Hust wrote:
>
>> Hi Ted,
>>
>> Thanks for your reply!
>>
>> I found this log on l-namenode2.dba.cn8 during the restart:
>>
>> 2015-05-15 12:11:36,540 INFO [master:l-namenode2:60000]
>> master.ServerManager: Finished waiting for region servers count to settle;
>> checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of
>> 2147483647, master is running.
>>
>> So does this mean the HMaster was ready to handle requests at 12:11:36?
>>
>> The backup master is l-namenode1.dba.cn8 and you can get its log at:
>>
>> http://pan.baidu.com/s/1eQlKXj0
>>
>> After I stopped l-namenode2.dba.cn8 at 12:15:58,
>> the backup master l-namenode1 took over, and I found this log:
>>
>> 2015-05-15 12:16:40,024 INFO [master:l-namenode1:60000]
>> master.ServerManager: Finished waiting for region servers count to settle;
>> checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of
>> 2147483647, master is running.
>>
>> So the backup master took over at 2015-05-15 12:16:40,024?
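For both questions above, rather than inferring readiness from log
timestamps, you can poll the cluster from a client. A minimal sketch
against the 0.96 client API, using HBaseAdmin.checkHBaseAvailable (which
keeps throwing until an active master is answering RPCs); the class name
and the 60-attempt / 5-second polling budget are made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class WaitForActiveMaster {
      public static void main(String[] args) throws InterruptedException {
        // Reads hbase-site.xml from the classpath for the zookeeper quorum.
        Configuration conf = HBaseConfiguration.create();
        // Poll for up to ~5 minutes (60 attempts x 5 s); numbers are arbitrary.
        for (int attempt = 1; attempt <= 60; attempt++) {
          try {
            // Throws (e.g. MasterNotRunningException) until an active
            // master is up and answering RPCs.
            HBaseAdmin.checkHBaseAvailable(conf);
            System.out.println("Active master is serving requests");
            return;
          } catch (Exception e) {
            System.out.println("Attempt " + attempt + ": master not ready ("
                + e.getClass().getSimpleName() + ")");
            Thread.sleep(5000);
          }
        }
        System.err.println("Gave up waiting for an active master");
      }
    }

Note that "accepting master RPCs" is necessary but not sufficient: as the
logs above show, region assignment can still be in progress after the
master comes up.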
>>
>> But it seems l-namenode2 was not working normally, given this exception
>> in the log:
>>
>> 2015-05-15 12:16:40,522 INFO
>> [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
>> handler.ServerShutdownHandler: Finished processing of shutdown of
>> l-hbase31.data.cn8.qunar.com,60020,1427789773001
>> 2015-05-15 12:17:11,301 WARN [686544788@qtp-660252776-212]
>> client.HConnectionManager$HConnectionImplementation: Checking master
>> connection
>> com.google.protobuf.ServiceException: java.net.ConnectException:
>> Connection refused
>>   at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
>>   at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>>   at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
>>
>> Does the exception mean the HMaster was not working normally, or
>> something else?
>>
>> 2015-05-17 11:06 GMT+08:00 Ted Yu:
>>
>>> bq. the HMaster is handling two region server down, and not ready to
>>> handle client request?
>>>
>>> I didn't mean that - for a functioning master, handling region server
>>> shutdown is part of the master's job.
>>>
>>> You should see something similar to the following in a (functioning)
>>> master log:
>>>
>>> 2015-05-13 04:06:36,266 INFO [master:c6401:60000] master.ServerManager:
>>> Finished waiting for region servers count to settle; checked in 1, slept
>>> for 71582 ms, expecting minimum of 1, maximum of 2147483647, master is
>>> running.
>>>
>>> bq. wait the backend HMaster to take over
>>>
>>> Was there any exception in the backup master log after it took over?
>>>
>>> On Sat, May 16, 2015 at 6:44 PM, Louis Hust wrote:
>>>
>>>> Hi Ted,
>>>>
>>>> Thanks very much!
>>>>
>>>> The Namenode process was not running on l-namenode2.dba.cn8
>>>> (192.168.39.22); only the HMaster ran on it.
>>>> So you mean that at 2015-05-15 12:15:04 the HMaster was handling two
>>>> region servers going down, and was not ready to handle client requests?
>>>> And how can I tell from the logs when the HMaster is ready to handle
>>>> client requests?
>>>>
>>>> I stopped the HMaster at 12:15:58 because it could not handle requests;
>>>> I wanted to stop it and wait for the backup HMaster to take over.
>>>>
>>>> 2015-05-17 0:29 GMT+08:00 Ted Yu:
>>>>
>>>>> In the period you identified, the master was assigning regions, e.g.:
>>>>>
>>>>> 2015-05-15 12:13:09,683 INFO
>>>>> [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-0]
>>>>> master.RegionStates: Transitioned {c634280ce287b2d6cebd88b61accf685
>>>>> state=OFFLINE, ts=1431663189621, server=null} to
>>>>> {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN, ts=1431663189683,
>>>>> server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
>>>>> 2015-05-15 12:13:09,683 INFO
>>>>> [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-2]
>>>>> master.RegionStates: Transitioned {2f60b1b4e51d32ef98ad19690f13a565
>>>>> state=OFFLINE, ts=1431663189621, server=null} to
>>>>> {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN, ts=1431663189683,
>>>>> server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
>>>>>
>>>>> Then two region servers went down:
>>>>>
>>>>> 2015-05-15 12:14:40,699 INFO [main-EventThread]
>>>>> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
>>>>> processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
>>>>> 2015-05-15 12:15:04,899 INFO [main-EventThread]
>>>>> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
>>>>> processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
>>>>>
>>>>> The master was stopped afterwards:
>>>>>
>>>>> Fri May 15 12:15:58 CST 2015 Terminating master
>>>>>
>>>>> The Namenode process was running on l-namenode2.dba.cn8, right?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Sat, May 16, 2015 at 7:50 AM, Louis Hust wrote:
>>>>>
>>>>>> Hi Ted,
>>>>>> Any idea?
>>>>>> When the HMaster restarts, how can I know when it can really handle
>>>>>> requests from the application? Is there any marker in the logs?
>>>>>>
>>>>>> 2015-05-16 14:05 GMT+08:00 Louis Hust:
>>>>>>
>>>>>>> @Ted,
>>>>>>> Please see the log from 12:11:29 to 12:15:28. In this time range the
>>>>>>> HMaster was in its restarting stage but could not handle client
>>>>>>> requests. Was the HMaster recovering, or doing something else?
>>>>>>>
>>>>>>> 2015-05-16 13:59 GMT+08:00 Louis Hust:
>>>>>>>
>>>>>>>> OK, you can get the log from
>>>>>>>> http://pan.baidu.com/s/1pqS6E
>>>>>>>>
>>>>>>>> 2015-05-16 13:26 GMT+08:00 Ted Yu:
>>>>>>>>
>>>>>>>>> Can you check the server log on 192.168.39.22?
>>>>>>>>>
>>>>>>>>> That should give you some clue.
>>>>>>>>>
>>>>>>>>> Cheers
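On the "any marker in the logs" question: the line quoted several times in
this thread,

  Finished waiting for region servers count to settle

is the usual sign that master initialization got past waiting for region
servers to check in, though the logs above show region assignment still
continues after it. A minimal sketch that pulls that marker out of a
master log file; the class name is made up and the log path argument is
whatever your deployment uses:

    import java.io.BufferedReader;
    import java.io.FileReader;

    public class FindSettleMarker {
      // Printed by master.ServerManager once enough region servers have
      // checked in; initialization proceeds past this point.
      private static final String MARKER =
          "Finished waiting for region servers count to settle";

      public static void main(String[] args) throws Exception {
        // args[0]: path to the HMaster log file (deployment-specific).
        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        try {
          String line;
          while ((line = reader.readLine()) != null) {
            if (line.contains(MARKER)) {
              // The leading timestamp shows when the master settled.
              System.out.println(line);
            }
          }
        } finally {
          reader.close();
        }
      }
    }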
>> > > > > >>> >> > > > > >>> Cheers >> > > > > >>> >> > > > > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust < >> > louis.hust@gmail.com> >> > > > > >>> wrote: >> > > > > >>> >> > > > > >>> > Hi all, >> > > > > >>> > >> > > > > >>> > I use hbase0.96.0 with hadoop 2.2.0, >> > > > > >>> > and the custom said they can not write into hbase cluster, >> > > > > >>> > So i stop the HMaster and start it soon, >> > > > > >>> > >> > > > > >>> > But it seems that the HMaster not response to request, >> > following >> > > is >> > > > > the >> > > > > >>> > HMaster log: >> > > > > >>> > >> > > > > >>> > {logs} >> > > > > >>> > 2015-05-15 12:13:33,136 INFO [AM.ZK.Worker-pool2-t16] >> > > > > >>> master.RegionStates: >> > > > > >>> > Transitioned {9036a3befee90eeffb9082f90a4a9afa >> state=3DOPENING, >> > > > > >>> > ts=3D1431663212637, server=3Dl-hbase26.data.cn8.qunar.com >> > > > > >>> ,60020,1431462615651} >> > > > > >>> > to {9036a3befee90eeffb9082f90a4a9afa state=3DOPEN, >> > > ts=3D1431663213136, >> > > > > >>> server=3D >> > > > > >>> > l-hbase26.data.cn8.qunar.com,60020,1431462615651} >> > > > > >>> > 2015-05-15 12:13:33,139 INFO [AM.ZK.Worker-pool2-t4] >> > > > > >>> master.RegionStates: >> > > > > >>> > Onlined 9036a3befee90eeffb9082f90a4a9afa on >> > > > > >>> l-hbase26.data.cn8.qunar.com >> > > > > >>> > ,60020,1431462615651 >> > > > > >>> > 2015-05-15 12:14:40,699 INFO [main-EventThread] >> > > > > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral node >> > > deleted, >> > > > > >>> > processing expiration [l-hbase27.data.cn8.qunar.com >> > > > > >>> ,60020,1431663208899] >> > > > > >>> > 2015-05-15 12:15:04,899 INFO [main-EventThread] >> > > > > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral node >> > > deleted, >> > > > > >>> > processing expiration [l-hbase25.data.cn8.qunar.com >> > > > > >>> ,60020,1431663193865] >> > > > > >>> > 2015-05-15 12:15:24,465 WARN [249240421@qtp-591022857-33] >> > > > > >>> > client.HConnectionManager$HConnectionImplementation: >> Checking >> > > > master >> > > > > >>> > connection >> > > > > >>> > com.google.protobuf.ServiceException: >> > > > > java.net.SocketTimeoutException: >> > > > > >>> Call >> > > > > >>> > to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 faile= d >> > > > because >> > > > > >>> > java.net.SocketTimeoutException: 60000 millis timeout whil= e >> > > waiting >> > > > > for >> > > > > >>> > channel to be ready for read. 
>>>>>>>>>> ch : java.nio.channels.SocketChannel[connected
>>>>>>>>>> local=/192.168.39.22:47700
>>>>>>>>>> remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>>>>>>>>>>   at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
>>>>>>>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
>>>>>>>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
>>>>>>>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
>>>>>>>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
>>>>>>>>>>   at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
>>>>>>>>>>   at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
>>>>>>>>>>   at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
>>>>>>>>>>   at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
>>>>>>>>>>   at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
>>>>>>>>>>   at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
>>>>>>>>>>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
>>>>>>>>>>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
>>>>>>>>>>   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>>>>>>>>>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>>>>>>>>>   at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>>>>>>>>>>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>>>>>>>>   at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
>>>>>>>>>>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>>>>>>>>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>>>>>>>>>>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>>>>>>>>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>>>>>>>>>   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>>>>>>>>>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>>>>>>>>>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>>>>>>>>>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>>>>>>>>>   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>>>>>>>>>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>>>>>>>>>   at org.mortbay.jetty.Server.handle(Server.java:326)
>>>>>>>>>>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>>>>>>>>>   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>>>>>>>>>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>>>>>>>>>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>>>>>>>>>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>>>>>>>>>   at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>>>>>>>>>>   at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>>>>>>>>>> Caused by: java.net.SocketTimeoutException: Call to
>>>>>>>>>> l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because
>>>>>>>>>> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>>>>>>>>>> channel to be ready for read.
>>>>>>>>>> ch : java.nio.channels.SocketChannel[connected
>>>>>>>>>> local=/192.168.39.22:47700
>>>>>>>>>> remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>>>>>>>>>>   ... 37 more
>>>>>>>>>> Caused by: java.net.SocketTimeoutException: 60000 millis timeout
>>>>>>>>>> while waiting for channel to be ready for read.
>>>>>>>>>> ch : java.nio.channels.SocketChannel[connected
>>>>>>>>>> local=/192.168.39.22:47700
>>>>>>>>>> remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
>>>>>>>>>>   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>>>>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>>>>>>>>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>>>>>>>>>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>>>>>>>>>>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
>>>>>>>>>>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>>>>>>>>>   at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>>>>>>>>>>   at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
>>>>>>>>>> Fri May 15 12:15:58 CST 2015 Terminating master
>>>>>>>>>> {/logs}
>>>>>>>>>>
>>>>>>>>>> So what does the exception mean, why does it happen, and how can
>>>>>>>>>> the problem be solved?
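As for working around the SocketTimeoutException while a master restart or
failover is in flight: the trace shows the default 60000 ms RPC timeout
expiring, so one client-side mitigation is a larger timeout and retry
budget. This only papers over a transient outage - the master still has to
come back. A minimal sketch against the 0.96 client API; the configuration
keys are standard HBase client settings, but the values and the class name
are illustrative only:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PatientListTables {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Widen the per-RPC timeout and retry budget beyond the 60000 ms
        // default seen in the trace above. Illustrative values.
        conf.setInt("hbase.rpc.timeout", 120000);        // per-RPC timeout, ms
        conf.setInt("hbase.client.retries.number", 10);  // client retry attempts
        conf.setInt("hbase.client.pause", 2000);         // base pause between retries, ms
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          // The same call the master status page was making when it timed out.
          for (HTableDescriptor table : admin.listTables()) {
            System.out.println(table.getNameAsString());
          }
        } finally {
          admin.close();
        }
      }
    }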