hbase-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Lost .META., lost tables
Date Sat, 22 Jan 2011 18:27:45 GMT
Hi,

Last night while experimenting with getting lzo set up I managed to
somehow lose all .META. data and all my tables. My regions still exist
in HDFS, but the shell tells me I have no tables. At this point I'm
pretty sure I need to reinstall HBase clean-slate on HDFS, hence
losing all data, but I'm sharing my story in case there are JIRAs to
be created or lessons to be learned.

Specifics:
- 4 Node cluster running 0.90.0.rc1
- 1 table of a few GBs and 24 regions, let's call it TableA
- CDH3b2

1. Just for kicks I decided to issue an alter table command to change
COMPRESSION to 'lzo' for TableA to see what would happen. I hadn't yet
taken any steps to install the native lzo libs in HBase (they exist in
HDFS), so this was probably a stupid thing to do. After issuing the
command I wasn't able to re-enable the table, nor could I fully
disable it. I was in a state somewhere in between the two, as
described in a thread earlier this week. The shell said enabled, the
master.jsp said disabled. Calls to do either would time out. The
master server was logging the same exceptions as in HBASE-3406 ad
infinitum. hbck -fix wasn't doing anything. After bouncing the entire
cluster a few times (master, RSs, zookeepers), I was finally able to
get back to a normal state, with COMPRESSION set to 'none', using hbck
-fix.

Besides HBASE-3406, maybe there's another JIRA here where the shell
permits setting COMPRESSION => 'lzo' when lzo isn't set up and leaves
the table in a nasty state.
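
A pre-flight check along these lines might have caught the problem before
the alter was issued. This is only a minimal sketch: it assumes the native
lzo bindings show up as files named libgplcompression* somewhere under
HBase's lib/native directory, which depends on how lzo was built and
installed. The function names and the default prefix are hypothetical, not
part of HBase.

```python
import os

def find_native_libs(native_dir, prefix="libgplcompression"):
    """Return sorted paths of files under native_dir whose name starts with prefix.

    The prefix is an assumption about how the lzo native library is named.
    """
    matches = []
    # followlinks=True so symlinked platform subdirectories are searched too
    for root, _dirs, files in os.walk(native_dir, followlinks=True):
        for name in files:
            if name.startswith(prefix):
                matches.append(os.path.join(root, name))
    return sorted(matches)

def lzo_libs_present(native_dir):
    """True if at least one candidate lzo native library file was found."""
    return len(find_native_libs(native_dir)) > 0
```

Finding the file only shows that the library is on disk, not that the
region servers can load it, so a False result is the more useful signal:
do not set COMPRESSION => 'lzo' yet.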

At this point I should have been grateful and called it a night, but
noooooo... Instead I shut down the cluster again, symlinked
lib/native to the same dir in my hadoop home, which is lzo-enabled, and
restarted the cluster. All seemed ok.

2. At this point I decided to experiment with a new table after
reading http://wiki.apache.org/hadoop/UsingLzoCompression more
closely. After creating 'mytable' with lzo enabled, I saw behavior
similar to what I saw in 1., so I used the same techniques to try to
just drop the table. After bouncing the cluster and issuing an hbck
-fix, the shell reported that HBase had no tables at all. It seemed
like all the .META. data was wiped out but I still had all of my
orphaned regions in HDFS. This was very bad.
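
The mismatch described here (data on disk, nothing in the catalog) can be
stated as a simple set difference. The sketch below works on plain lists of
names rather than talking to HDFS or HBase. It assumes the HBase root
directory holds one directory per table next to a handful of system
entries; the SYSTEM_NAMES set and the function name are illustrative
assumptions, not an HBase API.

```python
# Entries assumed to be system directories/files rather than user tables.
SYSTEM_NAMES = {"-ROOT-", ".META.", ".logs", ".oldlogs", "hbase.version"}

def orphaned_tables(root_entries, catalog_tables):
    """Return names that exist under the HBase root dir but not in the catalog.

    root_entries:   names listed under the HBase root directory in HDFS
    catalog_tables: table names the shell reports
    """
    on_disk = {name for name in root_entries if name not in SYSTEM_NAMES}
    return sorted(on_disk - set(catalog_tables))
```

With the listing from the root directory as the first argument and an empty
list as the second (the shell reporting no tables), every user table
directory comes back as orphaned, which is the state described above.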

It was clear that these tables weren't coming back, so in a last-ditch
effort I stopped the HBase cluster, the SNN and the NN and restored
HDFS from the checkpoint taken about an hour before. Now everything
was out of whack: HBase wouldn't even come up, -ROOT- couldn't be
located, .log/ files weren't being read properly and things were a
mess.

One could make the argument that I was beating on HBase a bit and
maybe even trying to break things, but it didn't take a lot of effort
to get to a pretty dire state.

thanks,
Bill
