hbase-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: Region is not online: -ROOT-,,0
Date Wed, 26 Jan 2011 00:02:45 GMT
I'm still not sure how I got into this situation, but I've gotten
myself out of it and I'm up and running.

The fix was to shut down the cluster and remove the .logs/ files from
HDFS. Then the master was able to start properly and a regionserver
was able to start up and serve the -ROOT- region.
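
For anyone landing here with the same symptoms, below is a minimal sketch of the HDFS step, not a tested procedure. It assumes the HBase root directory is /hbase-app/hbase (taken from the paths in the master log quoted below) and that the cluster is already fully shut down. The backup path and the helper name are hypothetical. Note that it moves the .logs directory aside with `hadoop fs -mv` rather than removing it as described above: the files under .logs are the write-ahead logs, so any edits that exist only there are gone once the directory is deleted.

```python
import posixpath


def build_move_aside_command(hbase_root, backup_dir):
    """Build the argv that moves <hbase_root>/.logs to backup_dir on HDFS.

    Hypothetical helper: it only constructs the command and runs nothing.
    """
    logs_dir = posixpath.join(hbase_root, ".logs")
    return ["hadoop", "fs", "-mv", logs_dir, backup_dir]


if __name__ == "__main__":
    # Assumed paths: the root comes from the log output in this thread,
    # the backup location is made up for illustration.
    argv = build_move_aside_command("/hbase-app/hbase", "/hbase-app/logs-moved-aside")
    print(" ".join(argv))
```

Printing the command instead of executing it leaves room to check the paths against your own cluster before running anything.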

One theory as to the cause of this issue (twice now) is that I was
still getting bitten by the problem of invalid hadoop maven jars in my
classpath (see https://issues.apache.org/jira/browse/HBASE-3436) on 2
of my 4 regionservers. I'll add more commentary around HBASE-3436 in
the JIRA.

On Tue, Jan 25, 2011 at 3:27 PM, Bill Graham <billgraham@gmail.com> wrote:
> Hi,
> A developer on our team created a table today and something failed and
> we fell back into the dire scenario we were in earlier this week. When
> I got on the scene 2 of our 4 regionservers had crashed. When I brought them
> back up, they wouldn't come online and the master was scrolling
> messages like those in
> https://issues.apache.org/jira/browse/HBASE-3406.
> I'm running 0.90.0-rc1 and CDH3b2 with append enabled.
> I shut down the entire cluster + zookeeper and restarted it. Now, I'm
> getting two types of errors and the cluster won't come up:
> - On one of the regionservers:
> 2011-01-25 15:12:00,287 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> NotServingRegionException; Region is not online: -ROOT-,,0
> - And on the master this scrolls every few seconds. The log file
> referenced is empty in HDFS.
> 2011-01-25 15:12:26,897 WARN org.apache.hadoop.hbase.util.FSUtils:
> Waited 275444ms for lease recovery on
> hdfs://mymaster.com:9000/hbase-app/hbase/.logs/hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
> failed to create file
> /hbase-app/hbase/.logs/hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592
> for DFSClient_hb_m_mymaster.com:60000_1295996847777 on client
>, because this file is already being created by NN_Recovery
> on
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1093)
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1181)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:422)
>        at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
> Any suggestions for how to get the -ROOT- back? I can see it in HDFS.
> thanks,
> Bill
