hbase-user mailing list archives

From Matthew LeMieux <...@mlogiciels.com>
Subject Re: HBase crash, need help getting back up
Date Thu, 09 Sep 2010 01:21:36 GMT
I tried moving that file to tmp.  It appears as though the master is no longer stuck, but clients
are still not able to run queries.  

There aren't any messages passing by in the log files (just the routine messages I see when the
server isn't doing anything), but attempts to run queries (e.g., count 'table') resulted in
NotServingRegionException errors.

I tried enable 'table', and found that after this command there was a huge amount of activity
in the log files, and I was able to run queries again.  

There was no previous call to disable 'table', but for some reason HBase wasn't bringing tables/regions
online.  

I'm not sure what caused the problem or even if the actions I took will fix it again in the
future, but I am back up and running for now.  

FYI,

-Matthew

On Sep 8, 2010, at 6:00 PM, Matthew LeMieux wrote:

> My HBase cluster just crashed. One of the region servers stopped (I do not yet know why).
> After restarting it, the cluster seemed a bit wobbly, so I decided to shut down everything
> and restart fresh. I did so (including ZooKeeper and HDFS).
> 
> Upon restart, I'm getting the following message in the Master's log file, repeating continuously
> with the number of ms waited counting up.
> 
> 2010-09-09 00:54:58,406 WARN org.apache.hadoop.hbase.util.FSUtils: Waited 69188ms for lease recovery on hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/.logs/domU-12-31-39-0C-38-31.compute-1.internal,60020,1283905848540/10.215.59.191%3A60020.1283905909298:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/.logs/domU-12-31-39-0C-38-31.compute-1.internal,60020,1283905848540/10.215.59.191%3A60020.1283905909298 for DFSClient_hb_m_10.104.37.247:60000 on client 10.104.37.247 because current leaseholder is trying to recreate file.
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1068)
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1181)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:422)
>        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
> 
> 
> The region servers are waiting with this being the final message in their log file: 
> 
> 2010-09-09 00:53:49,111 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 10.104.37.247:60000 that we are up
> 
> I've been using this version for a little under a week without incident
> (http://people.apache.org/~jdcryans/hbase-0.89.20100830-candidate-1/).
> 
> The HDFS comes from CDH3.  
> 
> Does anybody have any ideas on what I can do to get back up and running?
> 
> Thank you, 
> 
> Matthew
> 

