hbase-user mailing list archives

From Zheng Shen <zhengshe...@outlook.com>
Subject Re: Re: Could not initialize all stores for the region
Date Thu, 31 Mar 2016 16:57:35 GMT
Hi Ted,

Thank you very much for your reply!

We do have multiple HMaster nodes; one of them is on the offline node (let's call it nodeA).
Another is on a node which is always online (nodeB).

I scanned the audit log and found that, during nodeA's downtime, the HDFS audit log on nodeB shows:

hdfs-audit.log:2016-03-31 19:19:24,158 INFO FSNamesystem.audit: allowed=true ugi=hbase (auth:SIMPLE)
ip=/ cmd=delete src=/hbase/archive/data/default/vocabulary/2639c4d082646bb4a4fa2d8119f9aaef/cnt/2dc367d0e1c24a3b848c68d3b171b06d
dst=null perm=null proto=rpc

where the IP in the `ip=` field (elided above) is that of nodeB.
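For anyone following along, the search above can be sketched roughly as below. The audit-log path and the sample log line (including its IP) are placeholders for illustration; the real line is the one quoted earlier in this message.

```shell
# Sketch: find which client deleted a given store file, by filtering the
# namenode audit log for cmd=delete entries mentioning the HFile name.
# The log path and sample content below are illustrative placeholders.
AUDIT_LOG=$(mktemp)   # stand-in for e.g. /var/log/hadoop-hdfs/hdfs-audit.log
cat > "$AUDIT_LOG" <<'EOF'
2016-03-31 19:19:24,158 INFO FSNamesystem.audit: allowed=true ugi=hbase (auth:SIMPLE) ip=/10.0.0.2 cmd=delete src=/hbase/archive/data/default/vocabulary/2639c4d082646bb4a4fa2d8119f9aaef/cnt/2dc367d0e1c24a3b848c68d3b171b06d dst=null perm=null proto=rpc
EOF

HFILE=2dc367d0e1c24a3b848c68d3b171b06d
# The ip= field of any matching line identifies the node that issued the delete.
grep "cmd=delete" "$AUDIT_LOG" | grep "$HFILE"

rm -f "$AUDIT_LOG"
```

On a real cluster you would point `AUDIT_LOG` at the active namenode's hdfs-audit.log and read the `ip=` field of the matching lines.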

So it looks like nodeB deleted this file while nodeA was offline. However, shouldn't services
on nodeA (like HMaster and the namenode) somehow be informed of the events which happened during
their absence?

Although we have only 5 nodes in this cluster, we do have HA at every level of the HBase service
stack. So yes, there are multiple instances of every service wherever possible or
necessary (e.g. we have 3 HMasters, 2 namenodes, and 3 journal nodes).



From: Ted Yu<mailto:yuzhihong@gmail.com>
Date: 2016-04-01 00:00
To: user@hbase.apache.org<mailto:user@hbase.apache.org>
Subject: Re: Could not initialize all stores for the region
bq. File does not exist: /hbase/data/default/vocabulary/

Can you search in namenode audit log to see which node initiated the delete
request of the above file ?
Then you can search in that node's region server log to get more clue.

bq. hosts the HDFS namenode and datanode, Cloudera Manager, as well as
HBase master and region server

Can you separate some daemons off this node (e.g. HBase master) ?
I assume you have a second HBase master running somewhere else. Otherwise
this node becomes the weak point of the cluster.

On Thu, Mar 31, 2016 at 7:58 AM, Zheng Shen <zhengshencn@outlook.com> wrote:

> Hi,
> Our HBase cannot perform any write operations, while read operations
> are fine. I found the following error in the region server log:
> Could not initialize all stores for the
> region=vocabulary,576206_6513944,1459420417369.19faeb6e4da0b1873f68da271b0f5788.
> Failed open of
> region=vocabulary,576206_6513944,1459420417369.19faeb6e4da0b1873f68da271b0f5788.,
> starting to roll back the global memstore size.
> java.io.IOException: java.io.IOException: java.io.FileNotFoundException:
> File does not exist:
> /hbase/data/default/vocabulary/2639c4d082646bb4a4fa2d8119f9aaef/cnt/2dc367d0e1c24a3b848c68d3b171b06d
>         at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
>         at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:559)
>         at
> Opening of region {ENCODED => 19faeb6e4da0b1873f68da271b0f5788, NAME =>
> 'vocabulary,576206_6513944,1459420417369.19faeb6e4da0b1873f68da271b0f5788.',
> STARTKEY => '576206_6513944', ENDKEY => '599122_6739914'} failed,
> transitioning from OPENING to FAILED_OPEN in ZK, expecting version 22
> We are using Cloudera CDH 5.4.7, the HBase version is 1.0.0-cdh_5.4.7,
> with HDFS HA enabled (one of the namenodes was running on the server that
> was shut down). Our HBase cluster experienced an unexpected node shutdown
> today lasting about 4 hours. The node which went down hosts an HDFS namenode
> and datanode, Cloudera Manager, as well as an HBase master and region server
> (5 nodes in total in our small cluster). While that node was down,
> besides the services running on that node, the other HDFS namenode, the
> failover controller, and 2 of the 3 journal nodes were also down. After the
> node recovered, we restarted the whole CDH cluster, and then it ended up
> like this...
> The HDFS check "hdfs fsck" does not report any corrupted blocks.
> Any suggestions about where we should look for this problem?
> Thanks!
> Zheng
> ________________________________
> zhengshencn@outlook.com