hbase-user mailing list archives

From Zhou Shuaifeng <zhoushuaif...@huawei.com>
Subject Re: all regionserver shutdown after close hdfs datanode
Date Tue, 21 Dec 2010 03:08:05 GMT
Hi,
I checked the log. It was not the master that caused the regionserver
shutdown; the regionserver shut down because its log rolling failed.

According to the log, the error occurred in the write pipeline. But why is
HDFS not able to select another good datanode when one datanode in the
pipeline is unavailable?


The log:
2010-12-20 09:15:41,769 FATAL org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with ioe:
java.io.IOException: Error Recovery for block blk_1292656843439_2494096 failed  because recovery from primary datanode 167.6.5.17:50010 failed 6 times.  Pipeline was 167.6.5.17:50010. Aborting...
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3249)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2654)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2837)

The corresponding code in the regionserver:
        LOG.fatal("Log rolling failed with ioe: ",
          RemoteExceptionHandler.checkIOException(ex));
        server.checkFileSystem();
        // Abort if we get here.  We probably won't recover an IOE. HBASE-1132
        server.abort();

The abort() code:
  public void abort() {
    this.abortRequested = true;
    this.reservedSpace.clear();
    LOG.info("Dump of metrics: " + this.metrics.toString());
    stop();
  }
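
To make the path from the failed log roll to the shutdown easier to follow, here is a minimal self-contained sketch of that control flow. It is not the HBase source: SketchRegionServer, SketchLog and SketchLogRoller are hypothetical stand-ins, and the checkFileSystem(), reservedSpace and metrics steps from the real code are left out. It only illustrates that a single IOException from the log roll ends in abort() and then stop().

```java
import java.io.IOException;

/** Hypothetical stand-in for the region server; not the real HRegionServer. */
class SketchRegionServer {
    private volatile boolean abortRequested = false;
    private volatile boolean stopped = false;

    /** Mirrors the shape of abort() quoted above: set the flag, then stop. */
    void abort() {
        this.abortRequested = true;
        stop();
    }

    void stop() {
        this.stopped = true;
    }

    boolean isAbortRequested() {
        return abortRequested;
    }

    boolean isStopped() {
        return stopped;
    }
}

/** Hypothetical stand-in for the write-ahead log that gets rolled. */
interface SketchLog {
    void rollWriter() throws IOException;
}

/** Hypothetical stand-in for the LogRoller thread body (one iteration only). */
class SketchLogRoller {
    private final SketchRegionServer server;
    private final SketchLog log;

    SketchLogRoller(SketchRegionServer server, SketchLog log) {
        this.server = server;
        this.log = log;
    }

    /** Attempts one log roll; returns true on success, aborts the server on IOException. */
    boolean rollOnce() {
        try {
            log.rollWriter();
            return true;
        } catch (IOException ex) {
            System.err.println("Log rolling failed with ioe: " + ex.getMessage());
            // No retry here: any IOException from the roll leads straight to abort().
            server.abort();
            return false;
        }
    }
}

class LogRollAbortSketch {
    public static void main(String[] args) {
        SketchRegionServer server = new SketchRegionServer();
        // Simulate the HDFS client giving up on the write pipeline.
        SketchLog failingLog = () -> {
            throw new IOException("Error Recovery for block failed. Aborting...");
        };
        boolean rolled = new SketchLogRoller(server, failingLog).rollOnce();
        System.out.println("rolled=" + rolled
            + " abortRequested=" + server.isAbortRequested()
            + " stopped=" + server.isStopped());
    }
}
```

In this sketch the roller makes no attempt to retry or to wait for HDFS to recover, which matches what the log above suggests: once the DFS client reports the pipeline error, the regionserver stops.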

The corresponding log:
2010-12-20 09:15:41,777 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=9.666667, regions=1512, stores=1512, storefiles=5833,
storefileIndexSize=1833, memstoreSize=2941, compactionQueueSize=1228,
usedHeap=6849, maxHeap=8165, blockCacheSize=14047672,
blockCacheFree=1698276936, blockCacheCount=0, blockCacheHitRatio=0,
fsReadLatency=0, fsWriteLatency=59, fsSyncLatency=0




Zhou Shuaifeng(Frank)
HUAWEI TECHNOLOGIES CO.,LTD.


-----Original Message-----
From: Daniel Iancu [mailto:daniel.iancu@1and1.ro]
Sent: December 20, 2010 23:46
To: user@hbase.apache.org
Subject: Re: all regionserver shutdown after close hdfs datanode

Hi Zhou
You should check whether the HMaster is still up. If it is not, check its
logs: if for some reason the HMaster thinks HDFS is not available, it will
shut down the HBase cluster.
Regards
Daniel

On 12/20/2010 06:15 AM, Zhou Shuaifeng wrote:
> Hi,
>
>
>
> I have a cluster of 8 HDFS datanodes and 8 HBase regionservers. When I
> shut down one node (a PC running one datanode and one regionserver), all
> HBase regionservers shut down after a while.
>
> The other 7 HDFS datanodes are OK.
>
>
>
> I think this is not reasonable. HBase is a distributed system that should
> tolerate some abnormal nodes. So, what's the matter? Is there any
> configuration that can solve this problem, or is it a bug?
>
>
>
> Thanks and best Regards.
>
>
>
> Zhou
>
>
>

-- 
Daniel Iancu
Java Developer,Web Components Romania
1&1 Internet Development srl.
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
www.1and1.ro
Phone:+40-031-223-9081
Email:daniel.iancu@1and1.ro
IM:diancu@united.domain



