hbase-user mailing list archives

From Tom Brown <tombrow...@gmail.com>
Subject Re: Master timeout during list tables
Date Wed, 25 Jun 2014 19:09:36 GMT
Yes, that stack is still there:

   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372)
        - locked <0x00000000bfa0a068> (a org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337)

What does that mean? That HDFS is behaving badly, or something else
entirely?

--Tom
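
Checking whether the same frames are "still there" in a second jstack, as asked in the quoted message below, can be done mechanically. The following is a minimal sketch and not part of the original thread: it assumes the usual jstack text layout (thread blocks separated by blank lines, each block starting with a quoted thread name), and the function name `threads_in_method` is hypothetical.

```python
import re


def threads_in_method(jstack_text, method):
    """Return {thread_name: state} for threads whose stack mentions `method`.

    Assumes jstack-style text: thread blocks separated by blank lines,
    each block beginning with the thread name in double quotes.
    """
    result = {}
    for block in re.split(r"\n\s*\n", jstack_text):
        name = re.match(r'\s*"([^"]+)"', block)
        if not name or method not in block:
            continue
        state = re.search(r"java\.lang\.Thread\.State: (\w+)", block)
        result[name.group(1)] = state.group(1) if state else "UNKNOWN"
    return result


def stuck_in_both(first_dump, second_dump, method):
    """Names of threads found inside `method` in both dumps."""
    first = threads_in_method(first_dump, method)
    second = threads_in_method(second_dump, method)
    return sorted(set(first) & set(second))
```

Calling `stuck_in_both(dump1, dump2, "SplitLogManager.waitForSplittingCompletion")` on two dumps taken some minutes apart would list the threads that were inside that method both times. A thread that shows up in both is only a hint that it has not made progress, since it could have left and re-entered the method between the two dumps.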



On Wed, Jun 25, 2014 at 11:45 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> It looks like the master was stuck in a FileSystem.listStatus() call.
> I noticed the following. Does it still show up if you take a jstack one
> more time?
>
>         at org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372)
>         - locked <0x00000000bfa0a068> (a org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
>         at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:282)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:127)
>
>
>
> On Wed, Jun 25, 2014 at 10:09 AM, Tom Brown <tombrown52@gmail.com> wrote:
>
> > Before I was able to acquire a stack trace, I restarted the master.
> > However, the issue has just happened again and I was able to get a stack
> > trace:
> >
> > http://pastebin.com/Mz5c6AML
> >
> > (The pastebin is set to never expire, so anyone viewing an archived
> > version of this message should still be able to see the stack.)
> >
> > The HBase version is 0.94.10.
> >
> > Thanks!
> >
> > --Tom
> >
> >
> > On Wed, Jun 18, 2014 at 8:55 PM, Qiang Tian <tianq01@gmail.com> wrote:
> >
> > > Hi Tom,
> > > Can you collect your master JVM stack trace when the problem happens
> > > and put it on pastebin?
> > > What is your HBase version?
> > >
> > >
> > > On Thu, Jun 19, 2014 at 1:34 AM, Tom Brown <tombrown52@gmail.com> wrote:
> > >
> > > > Could this happen if the master is running too many RPC tasks and
> > > > can't keep up? What if there are too many connections to the server?
> > > >
> > > > --Tom
> > > >
> > > >
> > > > On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <tombrown52@gmail.com> wrote:
> > > >
> > > > > That server is the master and is not a regionserver.
> > > > >
> > > > > --Tom
> > > > >
> > > > >
> > > > > On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > >> Have you checked the region server log on 10.100.101.221
> > > > > >> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000>?
> > > > >>
> > > > >> Cheers
> > > > >>
> > > > >>
> > > > > >> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <tombrown52@gmail.com> wrote:
> > > > >>
> > > > >> > Hello all,
> > > > >> >
> > > > > >> > I'm trying to view the master status of a 6 node (0.94.10; hadoop 1.1.2)
> > > > > >> > cluster but I keep getting a timeout exception.
> > > > > >> >
> > > > > >> > The rest of the cluster is operating quite normally. From the exception, it
> > > > > >> > seems like the "list tables" function (required to display the web UI) is
> > > > > >> > timing out for some reason.
> > > > > >> >
> > > > > >> > From the shell, I'm able to scan the entire .META. table, so the table
> > > > > >> > information is conceivably available. I don't understand the rest of the
> > > > > >> > architecture well enough to know what might be causing this timeout during
> > > > > >> > "list".
> > > > >> >
> > > > >> > Any suggestions?
> > > > >> >
> > > > > >> > java.net.SocketTimeoutException: Call to hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.101.221:36722 remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > > >> >         at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026)
> > > > > >> >         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
> > > > > >> >         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
> > > > > >> >         at $Proxy11.getHTableDescriptors(Unknown Source)
> > > > > >> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:1870)
> > > > > >> >         at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:279)
> > > > > >> >         at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:504)
> > > > > >> >         at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:297)
> > > > > >> >         at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:399)
> > > > > >> >         at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:389)
> > > > > >> >         at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:82)
> > > > > >> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> > > > > >> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> > > > > >> >         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > > > > >> >         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > > > > >> >         at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:101)
> > > > > >> >         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > > >> >         at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
> > > > > >> >         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > > >> >         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > > > > >> >         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > > > > >> >         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > > > > >> >         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > > > > >> >         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > > > > >> >         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > > > > >> >         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > > > > >> >         at org.mortbay.jetty.Server.handle(Server.java:326)
> > > > > >> >         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > > > > >> >         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > > > > >> >         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > > > > >> >         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > > > > >> >         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > > > > >> >         at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > > > > >> >         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > > > > >> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.101.221:36722 remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > > >> >         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > > > > >> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> > > > > >> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> > > > > >> >         at java.io.FilterInputStream.read(Unknown Source)
> > > > > >> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:373)
> > > > > >> >         at java.io.BufferedInputStream.fill(Unknown Source)
> > > > > >> >         at java.io.BufferedInputStream.read(Unknown Source)
> > > > > >> >         at java.io.DataInputStream.readInt(Unknown Source)
> > > > > >> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646)
> > > > > >> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580)
> > > > >> >
> > > > >> >
> > > > >> >
> > > > > >> > The master log file is unhelpful. Almost all of it is notices about
> > > > > >> > skipping load balancing, but at least the exception appears in the log:
> > > > > >> >
> > > > > >> >
> > > > > >> > 2014-06-18 16:18:50,359 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=20 average=3.3333333 mostloaded=4 leastloaded=3
> > > > > >> > 2014-06-18 16:18:50,359 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=1 average=0.16666667 mostloaded=1 leastloaded=0
> > > > > >> > 2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=17 average=2.8333333 mostloaded=3 leastloaded=2
> > > > > >> > 2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=12 average=2.0 mostloaded=2 leastloaded=2
> > > > > >> > 2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=68 average=11.333333 mostloaded=12 leastloaded=11
> > > > > >> > 2014-06-18 16:20:36,118 WARN org.mortbay.log: /master-status: java.net.SocketTimeoutException: Call to hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.101.221:36674 remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --Tom
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
