hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HMaster restart with error
Date Sun, 17 May 2015 03:06:50 GMT
bq. the HMaster is handling two region servers going down, and not ready to handle
client requests?

I didn't mean that - for a functioning master, handling region server
shutdown is part of the master's job.

You should see something similar to the following in a (functioning) master
log:

2015-05-13 04:06:36,266 INFO  [master:c6401:60000] master.ServerManager:
Finished waiting for region servers count to settle; checked in 1, slept
for 71582 ms, expecting minimum of 1, maximum of 2147483647, master is
running.
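
If you also want a quick programmatic check, here is a rough sketch against
the 0.96 client API (the class name and timeout values are just illustrative);
it simply asks whether the active master is up and answering RPCs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MasterReadyCheck {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Fail fast instead of waiting out the default 60000 ms RPC timeout.
    conf.setInt("hbase.rpc.timeout", 10000);
    conf.setInt("hbase.client.retries.number", 1);
    try {
      // Throws (e.g. MasterNotRunningException) or times out while the
      // master is unreachable or still coming up.
      HBaseAdmin.checkHBaseAvailable(conf);
      System.out.println("master is up and answering requests");
    } catch (Exception e) {
      System.out.println("master not ready yet: " + e.getMessage());
    }
  }
}

Keep in mind this only tells you the master process is answering RPCs; right
after a restart, regions may still be in transition.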

bq. wait for the backup HMaster to take over

Was there any exception in the backup master's log after it took over?
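
As the earlier reply quoted below points out, the master was still assigning
regions in that window. A rough way to watch that from a client, assuming the
same 0.96 API (class name illustrative, package names as of the 0.96 client),
is to poll how many regions are still in transition:

import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.master.RegionState;

public class RegionsInTransitionCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Ask the active master for the current cluster status.
      ClusterStatus status = admin.getClusterStatus();
      Map<String, RegionState> rit = status.getRegionsInTransition();
      System.out.println(rit.size() + " regions still in transition");
      for (RegionState state : rit.values()) {
        System.out.println("  " + state);
      }
    } finally {
      admin.close();
    }
  }
}

Once that count drops to zero after a restart, bulk assignment is done and
client reads and writes should stop failing on regions that are being moved.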

On Sat, May 16, 2015 at 6:44 PM, Louis Hust <louis.hust@gmail.com> wrote:

> Hi Ted,
>
> Thanks very much!
>
> The Namenode process was not running on l-namenode2.dba.cn8 (192.168.39.22);
> only the HMaster ran on it.
> So you mean that at 2015-05-15 12:15:04 the HMaster is handling two region
> servers going down, and not ready to handle client requests? And how can I
> tell from the logs when the HMaster is ready to handle client requests?
>
> I stopped the HMaster at 12:15:58 because it could not handle requests, so I
> wanted to stop it and wait for the backup HMaster to take over.
>
>
>
>
> 2015-05-17 0:29 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>
> > In the period you identified, master was assigning regions.
> > e.g.
> >
> > 2015-05-15 12:13:09,683 INFO
> > [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-0]
> > master.RegionStates: Transitioned {c634280ce287b2d6cebd88b61accf685
> > state=OFFLINE, ts=1431663189621, server=null} to
> > {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN, ts=1431663189683,
> > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > 2015-05-15 12:13:09,683 INFO
> > [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-2]
> > master.RegionStates: Transitioned {2f60b1b4e51d32ef98ad19690f13a565
> > state=OFFLINE, ts=1431663189621, server=null} to
> > {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN, ts=1431663189683,
> > server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
> >
> > Then two region servers went down:
> >
> > 2015-05-15 12:14:40,699 INFO  [main-EventThread]
> > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> > 2015-05-15 12:15:04,899 INFO  [main-EventThread]
> > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
> >
> > Master was stopped afterwards:
> >
> > Fri May 15 12:15:58 CST 2015 Terminating master
> >
> > The Namenode process was running on l-namenode2.dba.cn8, right?
> >
> > Cheers
> >
> > On Sat, May 16, 2015 at 7:50 AM, Louis Hust <louis.hust@gmail.com> wrote:
> >
> > > Hi Ted,
> > > Any idea?
> > > When the HMaster restarts, how can I know when it can really handle
> > > requests from applications? Is there any marker in the logs?
> > >
> > > 2015-05-16 14:05 GMT+08:00 Louis Hust <louis.hust@gmail.com>:
> > >
> > > > @Ted,
> > > > Please see the log from 12:11:29 to 12:15:28. During this time range the
> > > > HMaster is in its restarting stage but cannot handle requests from
> > > > clients. Is the HMaster recovering, or doing something else?
> > > >
> > > > 2015-05-16 13:59 GMT+08:00 Louis Hust <louis.hust@gmail.com>:
> > > >
> > > >> OK, you can get the log from
> > > >> http://pan.baidu.com/s/1pqS6E
> > > >>
> > > >>
> > > >> 2015-05-16 13:26 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> > > >>
> > > >>> Can you check the server log on 192.168.39.22
> > > >>> <http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000>?
> > > >>>
> > > >>> That should give you some clue.
> > > >>>
> > > >>> Cheers
> > > >>>
> > > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <louis.hust@gmail.com> wrote:
> > > >>>
> > > >>> > Hi all,
> > > >>> >
> > > >>> > I use HBase 0.96.0 with Hadoop 2.2.0,
> > > >>> > and the customer said they could not write into the HBase cluster,
> > > >>> > so I stopped the HMaster and started it again shortly after.
> > > >>> >
> > > >>> > But it seems that the HMaster does not respond to requests. The
> > > >>> > following is the HMaster log:
> > > >>> >
> > > >>> > {logs}
> > > >>> > 2015-05-15 12:13:33,136 INFO  [AM.ZK.Worker-pool2-t16] master.RegionStates:
> > > >>> > Transitioned {9036a3befee90eeffb9082f90a4a9afa state=OPENING, ts=1431663212637,
> > > >>> > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651} to
> > > >>> > {9036a3befee90eeffb9082f90a4a9afa state=OPEN, ts=1431663213136,
> > > >>> > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > > >>> > 2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates:
> > > >>> > Onlined 9036a3befee90eeffb9082f90a4a9afa on
> > > >>> > l-hbase26.data.cn8.qunar.com,60020,1431462615651
> > > >>> > 2015-05-15 12:14:40,699 INFO  [main-EventThread]
> > > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > > >>> > processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> > > >>> > 2015-05-15 12:15:04,899 INFO  [main-EventThread]
> > > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > > >>> > processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
> > > >>> > 2015-05-15 12:15:24,465 WARN  [249240421@qtp-591022857-33]
> > > >>> > client.HConnectionManager$HConnectionImplementation: Checking master connection
> > > >>> > com.google.protobuf.ServiceException: java.net.SocketTimeoutException:
> > > >>> > Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because
> > > >>> > java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> > > >>> > channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> > > >>> > local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> > > >>> > at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
> > > >>> > at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
> > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
> > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
> > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
> > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
> > > >>> > at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
> > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> > > >>> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > > >>> > at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > >>> > at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
> > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > >>> > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > >>> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > > >>> > at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > > >>> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > > >>> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > > >>> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > > >>> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > > >>> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > > >>> > at org.mortbay.jetty.Server.handle(Server.java:326)
> > > >>> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > > >>> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > > >>> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > > >>> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > > >>> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > > >>> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > > >>> > at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > > >>> > Caused by: java.net.SocketTimeoutException: Call to
> > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because
> > > >>> > java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> > > >>> > channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> > > >>> > local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> > > >>> > ... 37 more
> > > >>> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
> > > >>> > waiting for channel to be ready for read. ch :
> > > >>> > java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700
> > > >>> > remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > >>> > at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > > >>> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> > > >>> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
> > > >>> > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > > >>> > at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > > >>> > at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
> > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
> > > >>> > Fri May 15 12:15:58 CST 2015 Terminating master
> > > >>> > {/logs}
> > > >>> > So what does the exception mean? Why did it happen, and how can I solve the problem?
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>
