hbase-issues mailing list archives

From "Jieshan Bean (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8230) Possible NPE on regionserver abort if replication service has not been started
Date Mon, 01 Apr 2013 01:47:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618549#comment-13618549 ]

Jieshan Bean commented on HBASE-8230:
-------------------------------------

bq. Did the failure happen when the region server restarted?
Yes.

bq. If this was repeatable, I would suggest finding the root cause.
The root cause in our environment was that the NameNode was in safe mode:
{noformat}
2013-03-29 10:32:42,260 FATAL [regionserver26003] ABORTING region server om-host2,26003,1364524173470:
Unhandled exception: cannot get log writer org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1737)
java.io.IOException: cannot get log writer
	at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:757)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:701)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:637)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:582)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:436)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:362)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1327)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1316)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1030)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:706)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: org.apache.hadoop.hdfs.server.namenode.SafeModeException:
Cannot create file/hbase/.logs/om-host2,26003,1364524173470/om-host2%2C26003%2C1364524173470.1364524361366.
Name node is in safe mode.
The reported blocks 14 has reached the threshold 0.9990 of total blocks 14. Safe mode will
be turned off automatically in 21 seconds.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1601)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1547)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:412)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:204)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:43664)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1704)

	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:209)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:754)
	... 10 more
{noformat}
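In the trace above, the SafeModeException reaches HBase only as the message of a wrapping IOException ("Caused by: java.io.IOException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: ..."), so a check on the exception type alone would not recognize it. A minimal sketch, assuming one wanted to detect this condition when handling the failure; the class and method names are hypothetical and are not part of HBase or Hadoop:

```java
/**
 * Hypothetical helper (not an HBase/Hadoop API): walks an exception's cause
 * chain looking for a NameNode safe-mode failure. It checks both the class
 * name and the message, because the trace shows the SafeModeException
 * arriving wrapped inside a plain IOException's message.
 */
final class SafeModeCheckSketch {
    private static final String MARKER = "SafeModeException";
    // Cap on the walk so a cyclic cause chain cannot loop forever.
    private static final int MAX_DEPTH = 32;

    private SafeModeCheckSketch() {}

    static boolean causedBySafeMode(Throwable t) {
        for (int depth = 0; t != null && depth < MAX_DEPTH; depth++, t = t.getCause()) {
            if (t.getClass().getName().endsWith(MARKER)) {
                return true;
            }
            String msg = t.getMessage();
            if (msg != null && msg.contains(MARKER)) {
                return true;
            }
        }
        return false;
    }
}
```

Matching on the message string is fragile; it is used here only because the wrapped form in this trace carries the original type solely as text.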

                
> Possible NPE on regionserver abort if replication service has not been started
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-8230
>                 URL: https://issues.apache.org/jira/browse/HBASE-8230
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, Replication
>    Affects Versions: 0.94.6
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>         Attachments: HBASE-8230-94.patch
>
>
> The RegionServer got an exception while calling setupWALAndReplication, so it entered the abort flow.
> Since replicationSink had not been initialized yet, we got the exception below:
> {noformat}
> Exception in thread "regionserver26003" java.lang.NullPointerException
>  at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
>  at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
>  at org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
>  at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
>  at java.lang.Thread.run(Thread.java:662)
> {noformat}
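The NPE comes from Replication.join() dereferencing replicationSink on the abort path before setupWALAndReplication had created it. A minimal sketch of the usual defensive pattern, a null check on each lazily created component in the shutdown path, follows; the class and field names are hypothetical stand-ins modeled loosely on the stack trace, and this is not the content of HBASE-8230-94.patch:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a replication component (source or sink). */
class ReplicationComponentSketch {
    private final String name;
    private final List<String> events;

    ReplicationComponentSketch(String name, List<String> events) {
        this.name = name;
        this.events = events;
    }

    void stop() {
        events.add("stopped " + name);
    }
}

/**
 * Hypothetical replication service whose shutdown tolerates being called
 * before initialization finished (for example, when the WAL could not be
 * created and the region server aborted early).
 */
class ReplicationServiceSketch {
    private final List<String> events = new ArrayList<>();
    // Both fields stay null until initialize() runs; an abort may happen before that.
    private ReplicationComponentSketch source;
    private ReplicationComponentSketch sink;

    void initialize() {
        source = new ReplicationComponentSketch("source", events);
        sink = new ReplicationComponentSketch("sink", events);
    }

    /** Called from the abort/shutdown path; must not assume initialize() ran. */
    void stopReplicationService() {
        join();
    }

    private void join() {
        // Guard each component: dereferencing an uninitialized one is what throws the NPE.
        if (source != null) {
            source.stop();
        }
        if (sink != null) {
            sink.stop();
        }
    }

    List<String> events() {
        return events;
    }
}
```

Calling stopReplicationService() on a service that was never initialized simply does nothing, so the abort can proceed and report the original IOException instead of masking it with an NPE.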

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
