hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Region is not online: -ROOT-,,0
Date Wed, 26 Jan 2011 03:06:40 GMT
These jiras might be related:

https://issues.apache.org/jira/browse/HDFS-1520

https://issues.apache.org/jira/browse/HDFS-1554

I'm not sure they would help in this situation, since the client
'NN_Recovery' isn't a "real" client (i.e., an HBase regionserver).
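
For anyone trying to spot the same condition in their own master log, here
is a minimal Python sketch that pulls the file path, the requesting client
and the current lease holder out of the AlreadyBeingCreatedException message
quoted below. The function name, the regex group names and the returned
dict keys are made up for illustration; only the message wording comes from
the log in this thread, and other Hadoop versions may phrase it differently.

```python
import re

# Matches the NameNode message quoted in this thread. The group names are
# illustrative labels, not Hadoop terminology.
LEASE_CONFLICT = re.compile(
    r"failed to create file\s+(?P<path>\S+)\s+"
    r"for\s+(?P<requester>\S+)\s+on client\s+(?P<client_ip>[\d.]+),\s+"
    r"because this file is already being created by\s+(?P<holder>\S+)"
    r"(?:\s+on\s+(?P<holder_ip>[\d.]+))?"
)
WAITED = re.compile(r"Waited (?P<ms>\d+)ms for lease recovery")


def parse_lease_conflict(log_text):
    """Return a dict describing the lease conflict, or None if none is found."""
    # Mail clients wrap long log lines and prefix them with '>' markers, so
    # strip the markers and join everything into one whitespace-separated string.
    flat = " ".join(
        re.sub(r"^[>\s]+", "", line) for line in log_text.splitlines()
    )
    match = LEASE_CONFLICT.search(flat)
    if match is None:
        return None
    info = match.groupdict()
    waited = WAITED.search(flat)
    # How long the master reports having waited, if that part is present.
    info["waited_ms"] = int(waited.group("ms")) if waited else None
    return info
```

Going by the log in this thread, a holder of NN_Recovery rather than a
DFSClient name would be the sign that the lease is held by the recovery
process and not by a regionserver.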



On Tue, Jan 25, 2011 at 6:59 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> It's all about this line:
>
> "for DFSClient_hb_m_mymaster.com:60000_1295996847777 on client
> 10.14.98.90, because this file is already being created by NN_Recovery"
>
> I'm not really sure why that happens. I've seen it on my test
> clusters, and it holds up region redeployment, hence your
> problems.
>
> Perhaps someone familiar with the deep internals of append recovery
> can speak up...
>
> -ryan
>
>
> On Tue, Jan 25, 2011 at 4:02 PM, Bill Graham <billgraham@gmail.com> wrote:
>> I'm still not sure how I got into this situation, but I've gotten
>> myself out of it and I'm up and running.
>>
>> The fix was to shut down the cluster and remove the .logs/ files from
>> HDFS. Then the master was able to start properly and a regionserver
>> was able to start up and serve the -ROOT- region.
>>
>> One theory as to the cause of this issue (twice now) is that I was
>> still getting bitten by the issue of invalid hadoop maven jars in my
>> classpath (see https://issues.apache.org/jira/browse/HBASE-3436) on 2
>> of my 4 regionservers. I'll add more commentary around HBASE-3436 in
>> the JIRA.
>>
>>
>>
>> On Tue, Jan 25, 2011 at 3:27 PM, Bill Graham <billgraham@gmail.com> wrote:
>>> Hi,
>>>
>>> A developer on our team created a table today and something failed and
>>> we fell back into the dire scenario we were in earlier this week. When
>>> I got on the scene, 2 of our 4 regionservers had crashed. When I brought
>>> them back up, they wouldn't come online and the master was scrolling
>>> messages like those in
>>> https://issues.apache.org/jira/browse/HBASE-3406.
>>>
>>> I'm running 0.90.0-rc1 and CDH3b2 with append enabled.
>>>
>>> I shut down the entire cluster + zookeeper and restarted it. Now, I'm
>>> getting two types of errors and the cluster won't come up:
>>>
>>> - On one of the regionservers:
>>> 2011-01-25 15:12:00,287 DEBUG
>>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>>> NotServingRegionException; Region is not online: -ROOT-,,0
>>>
>>> - And on the master this scrolls every few seconds. The log file
>>> referenced is empty in HDFS.
>>> 2011-01-25 15:12:26,897 WARN org.apache.hadoop.hbase.util.FSUtils:
>>> Waited 275444ms for lease recovery on
>>> hdfs://mymaster.com:9000/hbase-app/hbase/.logs/hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
>>> failed to create file
>>> /hbase-app/hbase/.logs/hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592
>>> for DFSClient_hb_m_mymaster.com:60000_1295996847777 on client
>>> 10.14.98.90, because this file is already being created by NN_Recovery
>>> on 10.10.220.15
>>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1093)
>>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1181)
>>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:422)
>>>        at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>>>
>>> Any suggestions for how to get the -ROOT- back? I can see it in HDFS.
>>>
>>> thanks,
>>> Bill
>>>
>>
>
