hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From stack <st...@duboce.net>
Subject Re: HRegionservers aborting after rolling logs
Date Wed, 23 Jul 2008 22:29:52 GMT
This is whats doing it in:

2008-07-23 16:32:58,652 WARN org.apache.hadoop.hbase.util.Sleeper: We 
slept 410869ms, ten times longer than scheduled: 10000
2008-07-23 16:32:58,652 WARN org.apache.hadoop.hbase.util.Sleeper: We 
slept 402421ms, ten times longer than scheduled: 3000

If regionserver doesn't report the master within 
hbase.master.lease.period (60 seconds by default), master presumes it 
dead and assigns its regions elsewhere. If the regionserver hasn't been 
able to signal the master in that time, it shuts itself down to be 
safe.  You could try upping the default lease periods.

Whats the load on your machines?  It looks to be so 'high', java 
schedulers not getting any slices.  You're running datanodes, 
regionservers and tasktrackers all on the one node?  What kind of a node 
is it?  You running RAID by any chance?  Is this soon after your MR job 
starts?  How many mappers are loading HBase at one time?  How many 
regions do you have?  If soon after you start up, perhaps all mappers 
are going against a single region.  Can you reduce the number of mappers 
and see if life is better?

Thanks,
St.Ack




William Clay Moody wrote:
> I am having problems with my HRegionServers aborting following rolling 
> their logs and thus not being able to report to the HMaster for a 
> while. Eventually all 5 HRegionservers have exited and my MR which is 
> writing to HBase fails.
>
> Reviewing the logs (with debugging on), I believe what is happening is 
> the Memcache goes negative and compaction and rolling logs begins. 
> After this is complete, the logs shows a FATAL error about not being 
> able to report to the Master and then it exits. It takes about 45 mins 
> to complete the slow death across all the HRservers
>
> I have pasted the logs of one of the HRegionServers (omitting repeated 
> log rolling reports, null pointer exception for closed regions and 
> region closing messages.) Can someone point out what might be the 
> problem? Thanks in advance,
>
> Clay
>
>
>
> 2008-07-23 16:20:32,991 WARN org.apache.hadoop.hbase.HRegion: Memcache 
> size went negative -59248835; resetting
> 2008-07-23 16:20:33,003 DEBUG org.apache.hadoop.hbase.HRegion: 
> Finished memcache flush for region 
> anansi,http://org.himssconference.www/,1216843897872 in 16814ms, 
> sequence id=106628188
> 2008-07-23 16:20:33,003 DEBUG org.apache.hadoop.hbase.HRegionServer: 
> Compaction requested for region: 
> anansi,http://org.himssconference.www/,1216843897872
> 2008-07-23 16:20:33,004 INFO org.apache.hadoop.hbase.HRegion: checking 
> compaction on region anansi,http://org.himssconference.www/,1216843897872
> 2008-07-23 16:20:33,032 INFO org.apache.hadoop.hbase.HRegion: checking 
> compaction completed on region 
> anansi,http://org.himssconference.www/,1216843897872; status: false;
> 2008-07-23 16:26:09,289 DEBUG org.apache.hadoop.hbase.HLog: Closing 
> current log writer /hbase/log_10.1.2.8_1216843622025_60020/hlog.dat.060
> 2008-07-23 16:26:09,298 INFO org.apache.hadoop.hbase.HLog: New log 
> writer created at /hbase/log_10.1.2.8_1216843622025_60020/hlog.dat.061
> 2008-07-23 16:26:14,494 INFO org.apache.hadoop.hbase.HRegionServer: 
> Rolling hlog. Number of entries: 30001
> 2008-07-23 16:26:14,538 DEBUG org.apache.hadoop.hbase.HLog: Closing 
> current log writer /hbase/log_10.1.2.8_1216843622025_60020/hlog.dat.061
> 2008-07-23 16:26:14,548 INFO org.apache.hadoop.hbase.HLog: New log 
> writer created at /hbase/log_10.1.2.8_1216843622025_60020/hlog.dat.062
> < omitted >
> 2008-07-23 16:32:58,652 WARN org.apache.hadoop.hbase.util.Sleeper: We 
> slept 410869ms, ten times longer than scheduled: 10000
> 2008-07-23 16:32:58,652 WARN org.apache.hadoop.hbase.util.Sleeper: We 
> slept 402421ms, ten times longer than scheduled: 3000
> 2008-07-23 16:32:59,060 FATAL org.apache.hadoop.hbase.HRegionServer: 
> unable to report to master for 402861 milliseconds - aborting server
> 2008-07-23 16:32:59,452 INFO org.apache.hadoop.hbase.Leases: 
> regionserver/0.0.0.0:60020 closing leases
> 2008-07-23 16:33:00,294 INFO 
> org.apache.hadoop.hbase.Leases$LeaseMonitor: 
> regionserver/0.0.0.0:60020.leaseChecker exiting
> 2008-07-23 16:33:00,534 INFO org.apache.hadoop.hbase.Leases: 
> regionserver/0.0.0.0:60020 closed leases
> 2008-07-23 16:33:00,534 INFO org.apache.hadoop.ipc.Server: Stopping 
> server on 60020
> 2008-07-23 16:33:03,421 INFO org.apache.hadoop.ipc.Server: Stopping 
> IPC Server listener on 60020
> 2008-07-23 16:33:03,422 INFO org.apache.hadoop.ipc.Server: Stopping 
> IPC Server Responder
> 2008-07-23 16:33:03,437 INFO org.apache.hadoop.hbase.HRegionServer: 
> Stopping infoServer
> 2008-07-23 16:33:03,537 INFO org.mortbay.util.ThreadedServer: Stopping 
> Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
> 2008-07-23 16:33:04,114 INFO org.mortbay.http.SocketListener: Stopped 
> SocketListener on 0.0.0.0:60030
> 2008-07-23 16:34:59,945 INFO org.mortbay.util.Container: Stopped 
> HttpContext[/static,/static]
> 2008-07-23 16:34:59,945 INFO org.apache.hadoop.hbase.HRegionServer: 
> worker thread exiting
> 2008-07-23 16:35:15,596 INFO org.apache.hadoop.hbase.HRegionServer: 
> regionserver/0.0.0.0:60020.compactor exiting
> 2008-07-23 16:35:15,715 INFO org.mortbay.util.Container: Stopped 
> HttpContext[/logs,/logs]
> 2008-07-23 16:35:15,729 INFO org.apache.hadoop.hbase.HRegionServer: 
> regionserver/0.0.0.0:60020.cacheFlusher exiting
> 2008-07-23 16:35:15,948 INFO org.mortbay.util.Container: Stopped 
> org.mortbay.jetty.servlet.WebApplicationHandler@e0d5eb7
> 2008-07-23 16:35:17,528 INFO org.mortbay.util.Container: Stopped 
> WebApplicationContext[/,/]
> 2008-07-23 16:35:17,544 INFO org.mortbay.util.Container: Stopped 
> org.mortbay.jetty.Server@688a548b
> 2008-07-23 16:35:17,544 DEBUG org.apache.hadoop.hbase.HLog: closing 
> log writer in hdfs://namenode:4010/hbase/log_10.1.2.8_1216843622025_60020
> 2008-07-23 16:35:17,773 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 0 on 60020, call 
> batchUpdate(anansi,http://com.davidhellmann.www/,1216150597773, 
> 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@6b9ae493) 
> from 10.1.2.7:52761: error: java.io.IOException: Server not running
> java.io.IOException: Server not running
>     at 
> org.apache.hadoop.hbase.HRegionServer.checkOpen(HRegionServer.java:1758)
>     at 
> org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1547) 
>
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

>
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at 
> org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
> 2008-07-23 16:35:17,802 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 7 on 60020, call 
> batchUpdate(anansi,http://com.davidhellmann.www/,1216150597773, 
> 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@3c2250d5) 
> from 10.1.2.8:49245: error: java.io.IOException: Server not running
> java.io.IOException: Server not running
>     at 
> org.apache.hadoop.hbase.HRegionServer.checkOpen(HRegionServer.java:1758)
>     at 
> org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1547) 
>
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

>
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at 
> org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
> 2
>
> 2008-07-23 16:35:18,735 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 7 on 60020: exiting
> 2008-07-23 16:35:18,735 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 4 on 60020 caught: java.lang.NullPointerException
>     at org.apache.hadoop.ipc.Server.closeConnection(Server.java:976)
>     at org.apache.hadoop.ipc.Server.access$900(Server.java:66)
>     at 
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:646)
>     at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:659)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:922)
>
> 2008-07-23 16:35:18,736 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 4 on 60020: exiting
> 2008-07-23 16:35:18,774 WARN org.apache.hadoop.ipc.Server: IPC Server 
> Responder, call 
> batchUpdate(anansi,http://com.davidhellmann.www/,1216150597773, 
> 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@5d18a770) 
> from 10.1.2.8:49245: output error
> 2008-07-23 16:35:18,774 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 9 on 60020 caught: java.lang.NullPointerException
>     at org.apache.hadoop.ipc.Server.closeConnection(Server.java:976)
>     at org.apache.hadoop.ipc.Server.access$900(Server.java:66)
>     at 
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:646)
>     at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:659)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:922)
>
> 2008-07-23 16:35:18,774 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 9 on 60020: exiting
> 2008-07-23 16:35:18,791 WARN org.apache.hadoop.ipc.Server: IPC Server 
> Responder, call 
> batchUpdate(anansi,http://com.davidhellmann.www/,1216150597773, 
> 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@74e51bda) 
> from 10.1.2.4:43896: output error
> 2008-07-23 16:35:18,808 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 3 on 60020 caught: java.lang.NullPointerException
>     at org.apache.hadoop.ipc.Server.closeConnection(Server.java:976)
>     at org.apache.hadoop.ipc.Server.access$900(Server.java:66)
>     at 
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:646)
>     at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:659)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:922)
>
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>     at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)

>
>     at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

>
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at 
> org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)

>
>     at 
> org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)

>
>     at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:873)
>     at java.lang.Thread.run(Thread.java:619)
>
>
> 2008-07-23 16:35:19,221 DEBUG org.apache.hadoop.hbase.HRegionServer: 
> closing region anansi,http://org.himssconference.www/,1216843897872
> 2008-07-23 16:35:19,221 DEBUG org.apache.hadoop.hbase.HRegion: 
> compactions and cache flushes disabled for region 
> anansi,http://org.himssconference.www/,1216843897872
> 2008-07-23 16:35:19,221 DEBUG org.apache.hadoop.hbase.HRegion: new 
> updates and scanners for region 
> anansi,http://org.himssconference.www/,1216843897872 disabled
> 2008-07-23 16:35:19,221 DEBUG org.apache.hadoop.hbase.HRegion: no more 
> active scanners for region 
> anansi,http://org.himssconference.www/,1216843897872
> 2008-07-23 16:35:19,239 DEBUG org.apache.hadoop.hbase.HRegion: no more 
> row locks outstanding on region 
> anansi,http://org.himssconference.www/,1216843897872
> 2008-07-23 16:35:19,239 DEBUG org.apache.hadoop.hbase.HStore: closed 
> 1854090017/out_edge
> 2008-07-23 16:35:19,240 DEBUG org.apache.hadoop.hbase.HStore: closed 
> 1854090017/attrib
> 2008-07-23 16:35:19,240 DEBUG org.apache.hadoop.hbase.HStore: closed 
> 1854090017/in_edge
> 2008-07-23 16:35:19,240 DEBUG org.apache.hadoop.hbase.HStore: closed 
> 1854090017/update
> 2008-07-23 16:35:19,240 INFO org.apache.hadoop.hbase.HRegion: closed 
> anansi,http://org.himssconference.www/,1216843897872
> 2008-07-23 16:35:19,240 INFO org.apache.hadoop.hbase.HRegionServer: 
> aborting server at: 10.1.2.8:60020
> 2008-07-23 16:35:19,240 INFO org.apache.hadoop.hbase.HRegionServer: 
> regionserver/0.0.0.0:60020 exiting
> 2008-07-23 16:35:19,585 INFO org.apache.hadoop.hbase.HRegionServer: 
> Starting shutdown thread.
> 2008-07-23 16:35:19,585 INFO org.apache.hadoop.hbase.HRegionServer: 
> Shutdown thread complete
>
>
> William Clay Moody
> wcmoody@ncsu.edu
>
>
>


Mime
View raw message