hbase-user mailing list archives

From Wayne <wav...@gmail.com>
Subject Re: hbck -fix
Date Fri, 01 Jul 2011 19:57:26 GMT
Can someone please help answer one specific question? I had to manually
delete the actual HDFS data for the corrupted tables, since disable would
not work. I am now seeing messages on the master from the CatalogJanitor
saying "REGIONINFO_QUALIFIER is empty" for regions in tables that no longer
exist. How do I fix this? I scanned .META. and see no trace of the old
table or its regions. What is left to do to manually remove all traces of
this blasted table? Do I truly need to start from scratch with the entire
cluster?
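
[For anyone hitting the same symptom: the CatalogJanitor warning generally
means .META. still holds rows whose info:regioninfo cell is empty. A sketch
of how one might locate and remove such rows from the HBase shell — the
table name and region row key below are placeholders, and this assumes an
HBase 0.90-era deployment:]

```shell
# Open the HBase shell
hbase shell

# List .META. rows with their serialized HRegionInfo; rows whose
# info:regioninfo value is empty are the ones CatalogJanitor warns about
scan '.META.', {COLUMNS => 'info:regioninfo'}

# Remove a stale row for a table whose HDFS data is already gone
# (the row key below is a made-up placeholder; use the key from the scan)
deleteall '.META.', 'mytable,startkey,1309550000000.abcdef1234567890'

# Re-check consistency after the cleanup
hbase hbck
```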

Thanks in advance for any assistance anyone can provide.

On Fri, Jul 1, 2011 at 11:32 AM, Wayne <wav100@gmail.com> wrote:

> We had some serious issues from the hmaster running out of space on the
> root partition. We were getting "region server not found" errors on the
> client, which then turned into client errors like "servers have issues".
> I ran the hbck command and found 14 inconsistencies: files in HDFS not
> used by any region, regions with the same start key, a hole in the
> region chain, and a missing start region with an empty key. I tried to
> follow examples from earlier posts to move files and edit .META. by hand,
> but gave up as it was over my head. I am now trying to truncate the
> affected tables, but I cannot even do that, because disable does not work
> with these inconsistencies present. I assume now we will have to blow away
> the entire cluster and start from scratch.
> We are not in production, so we have the luxury of starting again, but the
> damage to our confidence is severe. Is there work going on to improve hbck
> -fix so it can actually resolve these types of issues? Should we expect,
> when running a production HBase cluster, to have to move files around and
> rebuild the region definitions and the .META. table by hand? Things just
> got a lot scarier for us, especially since we were hoping to go into
> production next month. Running out of disk space on the master's root
> partition can bring down the entire cluster? This is scary...
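
[For reference, the check/repair cycle discussed above is driven from the
command line; a minimal sketch, assuming 0.90-era tooling, where -fix
repairs only a limited set of problems:]

```shell
# Report inconsistencies between .META., HDFS, and the region servers
hbase hbck

# Attempt automatic repair of the inconsistencies hbck knows how to fix
hbase hbck -fix
```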
