hbase-user mailing list archives

From "Chris Richard" <chris.rich...@gmail.com>
Subject Re: Root table couldn't be opened
Date Tue, 16 Aug 2011 05:58:50 GMT

-----Original Message-----
From: Gaojinchao <gaojinchao@huawei.com>
Date: Tue, 16 Aug 2011 04:23:58 
To: user@hbase.apache.org<user@hbase.apache.org>
Reply-To: user@hbase.apache.org
Subject: re: Root table couldn't be opened

Why did the master replay its logs if it did not exit?
The ZooKeeper session expired because of GC, but the region server did not shut down.

(I like how you noticed the log message that says 82 has root and meta)

Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown handler to be executed, root=true, meta=true
That line says 82 held root and meta: "root=true" shows that the dead region server was carrying the root table.
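
A minimal sketch of pulling the server name and the root/meta flags out of that ServerManager line when scanning a master log. The pattern is written against the single line quoted in this thread only; the function name and regex are illustrative assumptions, not part of HBase.

```python
import re

# Assumed pattern, derived only from the "Added=... to dead servers" line quoted above.
DEAD_SERVER_RE = re.compile(
    r"Added=(?P<server>\S+) to dead servers, submitted shutdown handler "
    r"to be executed,\s+root=(?P<root>true|false), meta=(?P<meta>true|false)"
)


def parse_dead_server_line(line):
    """Return (server_name, carried_root, carried_meta), or None if the line does not match."""
    m = DEAD_SERVER_RE.search(line)
    if m is None:
        return None
    return m.group("server"), m.group("root") == "true", m.group("meta") == "true"


line = (
    "2011-07-28 22:19:53,746 DEBUG org.apache.hadoop.hbase.master.ServerManager: "
    "Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown "
    "handler to be executed, root=true, meta=true"
)
print(parse_dead_server_line(line))
```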


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: August 16, 2011 12:12
To: user@hbase.apache.org
Subject: Re: Root table couldn't be opened

On Wed, Aug 10, 2011 at 7:05 PM, Gaojinchao <gaojinchao@huawei.com> wrote:
> In my cluster (version 0.90.3), the root table couldn't be opened when one region server crashed because of GC.
>
> The logs show:
>
> // Master assigned the root table to 82
> 2011-07-28 21:34:34,710 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on 158-1-101-82,20020,1311885942386
>
> // The host of 82 crashed; the master finished splitting the log and reassigned root and meta. But the region server didn't exit, so the root verification passed.
> I think we shouldn't verify root / meta in the shutdown handler processing.
>


82 did not exit?

Why did the master replay its logs if it did not exit?


> 2011-07-28 22:19:53,746 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown handler to be executed, root=true, meta=true


Isn't this the master handling 82 like it's been shut down?


> 2011-07-28 22:28:30,577 DEBUG org.apache.hadoop.hbase.master.ServerManager: Server REPORT rejected; currently processing 158-1-101-82,20020,1311885942386 as dead server

So, it looks like 82 tried to come back in (after GC, I suppose) but we told
it to go away.

Why did we not notice that -ROOT- was on 82 and reassign it as part of
the shutdown handling of 82?  This is what you are saying in your
subsequent message (I like how you noticed the log message that says
82 has root and meta).  I'm not sure why it did not reassign root.
Either it's skipping something in the shutdown handler, or the verify
location for root has a bug in it where we are not considering the
fact that the server being shut down held -ROOT-.  If verification
returns that server as holding -ROOT-, then we should ignore the result.

Good stuff Gao,
St.Ack
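
The check suggested above can be sketched roughly as follows. This is only an illustration of the reasoning in the message, not HBase's ShutdownHandler code; the function name and all parameters are hypothetical.

```python
def root_needs_reassign(verified_location, dead_server, dead_server_carried_root):
    """Decide whether -ROOT- should be reassigned while handling a dead server.

    verified_location: name of the server that location verification reports
        as hosting -ROOT-, or None if verification found no live host.
    dead_server: name of the server currently being processed as dead.
    dead_server_carried_root: the root=true/false flag recorded for that server.
    """
    if not dead_server_carried_root:
        # The dead server never held -ROOT-, so there is nothing to move.
        return False
    if verified_location is None:
        # No live server answers for -ROOT-.
        return True
    # Suggested fix: an answer that points back at the server being shut down
    # is ignored, so -ROOT- is still reassigned. Any other answer means
    # -ROOT- already lives on a different server.
    return verified_location == dead_server


dead = "158-1-101-82,20020,1311885942386"
# 82 is still alive after its GC pause and answers the verification itself.
print(root_needs_reassign(dead, dead, True))
```

With a check like this, a region server that pauses for GC, loses its ZooKeeper session, and then answers the verification request would not stop the master from reassigning -ROOT-.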