hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase crash, need help getting back up
Date Thu, 09 Sep 2010 05:00:28 GMT
recovered.edits is the name of the file produced when WAL logs are
split; one is made per region.

Where are you seeing that message?  Does it not have the full path to the
recovered.edits file?

You are running w/ perms enabled on this cluster?

Why did the regionservers go down?
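
In case it helps with reading the error quoted below: a rough sketch (not
anything that ships with HBase or Hadoop) that pulls the fields out of a
"Permission denied" line and checks whether the owner bits grant the access
that was asked for.  The regex and the function names are my own assumptions
based only on the message format in this thread, and it handles only a single
access name such as EXECUTE.

```python
import re

# Hypothetical pattern, modeled only on the "Permission denied" line quoted
# in this thread; other Hadoop versions may format the message differently.
DENIED = re.compile(
    r'user=(?P<user>[^,]+), access=(?P<access>\w+), '
    r'inode="(?P<inode>[^"]*)":(?P<owner>[^:]+):(?P<group>[^:]+):'
    r'(?P<mode>[-rwxdtsST]{9,10})'
)

def parse_denied(line):
    """Return the fields of a 'Permission denied' message, or None."""
    m = DENIED.search(line)
    return m.groupdict() if m else None

def owner_bits_grant(mode, access):
    """True if the owner triplet of an rwx mode string grants `access`."""
    perms = mode[-9:]  # drop a leading type character if one is present
    idx = {"READ": 0, "WRITE": 1, "EXECUTE": 2}[access]
    return perms[idx] != "-"

line = ('Permission denied: user=mlcamus, access=EXECUTE, '
        'inode="recovered.edits":mlcamus:supergroup:rw-r--r--')
info = parse_denied(line)
print(info["inode"], info["mode"],
      owner_bits_grant(info["mode"], info["access"]))
```

On the quoted line this reports that the owner has no execute bit on the
inode.  In HDFS, execute on a directory is what allows reaching its children,
so the message may mean something is treating recovered.edits as a directory,
but that is a guess; the full path would settle it.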


On Wed, Sep 8, 2010 at 9:54 PM, Matthew LeMieux <mdl@mlogiciels.com> wrote:
> Well, it was short-lived: it only stayed up for a couple of hours, and all region servers crashed this time, not just one.
> Now, after restarting, I've got the master server complaining about not having executable permissions on "recovered.edits".  Where is this file?
>  Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=mlcamus, access=EXECUTE, inode="recovered.edits":mlcamus:supergroup:rw-r--r--
> The message has repeated for a half hour, with this showing up in one region server:
> 2010-09-09 04:52:34,887 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
> I assume this will get better if I change permissions of some file... which one?
> -Matthew
> On Sep 8, 2010, at 6:21 PM, Matthew LeMieux wrote:
>> I tried moving that file to tmp.  It appears as though the master is no longer stuck, but clients are still not able to run queries.
>> There aren't any messages passing by in the log files (just the routine messages I see when the server isn't doing anything), but attempts to run queries (e.g., count 'table') resulted in NotServingRegionException errors.
>> I tried enable 'table', and found that after this command there was a huge amount of activity in the log files, and I was able to run queries again.
>> There was no previous call to disable 'table', but for some reason HBase wasn't bringing tables/regions online.
>> I'm not sure what caused the problem or even if the actions I took will fix it again in the future, but I am back up and running for now.
>> FYI,
>> -Matthew
>> On Sep 8, 2010, at 6:00 PM, Matthew LeMieux wrote:
>>> My HBase cluster just crashed.  One of the region servers stopped (I do not yet know why).  After restarting it, the cluster seemed a bit wobbly, so I decided to shut down everything and restart fresh.  I did so (including ZooKeeper and HDFS).
>>> Upon restart, I'm getting the following message in the Master's log file, repeating continuously with the number of ms waited counting up.
>>> 2010-09-09 00:54:58,406 WARN org.apache.hadoop.hbase.util.FSUtils: Waited 69188ms for lease recovery on hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/.logs/domU-12-31-39-0C-38-31.compute-1.internal,60020,1283905848540/ failed to create file /hbase/.logs/domU-12-31-39-0C-38-31.compute-1.internal,60020,1283905848540/ for DFSClient_hb_m_10.104.37.247:60000 on client because current leaseholder is trying to recreate file.
>>>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1068)
>>>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1181)
>>>       at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:422)
>>>       at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>>>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>>>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>>>       at java.security.AccessController.doPrivileged(Native Method)
>>>       at javax.security.auth.Subject.doAs(Subject.java:396)
>>>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>>> The region servers are waiting, with this being the final message in their log:
>>> 2010-09-09 00:53:49,111 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at that we are up
>>> I've been using this version for a little under a week without incident (http://people.apache.org/~jdcryans/hbase-0.89.20100830-candidate-1/).
>>> The HDFS comes from CDH3.
>>> Does anybody have any ideas on what I can do to get back up and running?
>>> Thank you,
>>> Matthew
