hbase-user mailing list archives

From Zhou Shuaifeng <zhoushuaif...@huawei.com>
Subject Re: all regionserver shutdown after close hdfs datanode
Date Wed, 22 Dec 2010 08:20:23 GMT
Hi,

There are many problem blocks, but the one in the log I attached in my mail
below has only one replica. Many others have 3 replicas:
2010-12-20 09:10:31,167 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1292656843783_2494443 in pipeline 167.6.5.17:50010, 167.6.5.16:50010, 167.6.5.11:50010: bad datanode 167.6.5.17:50010
2010-12-20 09:10:31,206 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Connection reset by peer
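
To see how many datanodes were in the pipeline for each problem block, the
DFSClient warnings can be scanned mechanically. The following is only a
sketch: the class and method names (PipelineLogScan, Recovery, parse) are
made up for illustration, the pattern is derived from the single warning
line quoted above, and it assumes each log entry sits on one unwrapped line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase or HDFS.
public class PipelineLogScan {

    /** One "Error Recovery for block" warning: block id, pipeline members, bad datanode. */
    public static final class Recovery {
        public final String block;
        public final List<String> pipeline;
        public final String badDatanode;

        Recovery(String block, List<String> pipeline, String badDatanode) {
            this.block = block;
            this.pipeline = pipeline;
            this.badDatanode = badDatanode;
        }
    }

    // Assumed format, taken from the warning quoted in this mail.
    private static final Pattern RECOVERY = Pattern.compile(
        "Error Recovery for block (blk_\\S+) in pipeline (.+?): bad datanode (\\S+)");

    /** Returns the parsed warning, or empty if the line is not such a warning. */
    public static Optional<Recovery> parse(String line) {
        Matcher m = RECOVERY.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        List<String> pipeline = new ArrayList<>();
        for (String node : m.group(2).split(",")) {
            pipeline.add(node.trim());
        }
        return Optional.of(new Recovery(m.group(1), pipeline, m.group(3)));
    }

    public static void main(String[] args) {
        String line = "2010-12-20 09:10:31,167 WARN org.apache.hadoop.hdfs.DFSClient: "
            + "Error Recovery for block blk_1292656843783_2494443 in pipeline "
            + "167.6.5.17:50010, 167.6.5.16:50010, 167.6.5.11:50010: "
            + "bad datanode 167.6.5.17:50010";
        parse(line).ifPresent(r -> System.out.println(
            r.block + " pipeline size=" + r.pipeline.size() + " bad=" + r.badDatanode));
    }
}
```

A block whose pipeline size comes out as 1 had no other datanode to fall
back on during recovery, which is the case in the FATAL entry quoted further
down.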

The hbase version I use is 0.20.6, not 0.89.

Zhou 

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: 2010-12-22 3:12
To: user@hbase.apache.org
Subject: Re: all regionserver shutdown after close hdfs datanode

2010/12/20 Zhou Shuaifeng <zhoushuaifeng@huawei.com>:
> Hi,
> I checked the log. It was not the master that caused the regionserver
> shutdown; the regionserver's log rolling failed, and that caused the
> regionserver to shut down.
>

The problem block only had one replica?  If you look in the hdfs
emissions, it'll usually log other nodes that have the wanted block.

I don't believe you said which hbase/hdfs versions you are using?  In 0.89.x
hbases, at least for the WAL log, we'll go out of our way to guarantee
sufficient replicas.

St.Ack


> According to the log, an error occurred in the pipeline, but why is hdfs
> not able to select another good datanode when one datanode in the pipeline
> is not available?
>
>
> The log:
> 2010-12-20 09:15:41,769 FATAL org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with ioe:
> java.io.IOException: Error Recovery for block blk_1292656843439_2494096 failed  because recovery from primary datanode 167.6.5.17:50010 failed 6 times.  Pipeline was 167.6.5.17:50010. Aborting...
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3249)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2654)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2837)
>
> the corresponding code in regionserver:
>        LOG.fatal("Log rolling failed with ioe: ",
>          RemoteExceptionHandler.checkIOException(ex));
>        server.checkFileSystem();
>        // Abort if we get here.  We probably won't recover an IOE. HBASE-1132
>        server.abort();
>
> the abort() code:
>  public void abort() {
>    this.abortRequested = true;
>    this.reservedSpace.clear();
>    LOG.info("Dump of metrics: " + this.metrics.toString());
>    stop();
>  }
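
The two snippets above amount to this control flow: any IOException during a
log roll leads straight to an abort. The following is a self-contained
sketch of that flow only. The Server and Wal interfaces and the rollOnce
method are hypothetical stand-ins made up for illustration, not the real
HBase classes.

```java
import java.io.IOException;

// Hypothetical simulation of the roll-failure path; not HBase source.
public class LogRollAbortSketch {

    /** Stand-in for the regionserver calls used in the quoted snippet. */
    public interface Server {
        boolean checkFileSystem();
        void abort();
    }

    /** Stand-in for the write-ahead log being rolled. */
    public interface Wal {
        void rollWriter() throws IOException;
    }

    /** Attempts one log roll; on an IOException it aborts the server, as the quoted code does. */
    public static boolean rollOnce(Wal wal, Server server) {
        try {
            wal.rollWriter();
            return true;
        } catch (IOException ex) {
            System.err.println("Log rolling failed with ioe: " + ex.getMessage());
            server.checkFileSystem();
            // No retry and no attempt to pick another datanode: the server aborts.
            server.abort();
            return false;
        }
    }

    public static void main(String[] args) {
        Server server = new Server() {
            @Override public boolean checkFileSystem() { return true; }
            @Override public void abort() { System.err.println("abort requested"); }
        };
        // Simulate the HDFS client giving up on pipeline recovery.
        Wal failing = () -> { throw new IOException("Error Recovery for block failed"); };
        System.out.println("rolled=" + rollOnce(failing, server));
    }
}
```

In this sketch a single failed roll is enough to stop the server, which
would match every regionserver going down if each of them hit the same
pipeline error.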
>
> The corresponding log:
> 2010-12-20 09:15:41,777 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> request=9.666667, regions=1512, stores=1512, storefiles=5833,
> storefileIndexSize=1833, memstoreSize=2941, compactionQueueSize=1228,
> usedHeap=6849, maxHeap=8165, blockCacheSize=14047672,
> blockCacheFree=1698276936, blockCacheCount=0, blockCacheHitRatio=0,
> fsReadLatency=0, fsWriteLatency=59, fsSyncLatency=0
>
>
>
>
> Zhou Shuaifeng(Frank)
> HUAWEI TECHNOLOGIES CO.,LTD.
>
>
> -----Original Message-----
> From: Daniel Iancu [mailto:daniel.iancu@1and1.ro]
> Sent: 2010-12-20 23:46
> To: user@hbase.apache.org
> Subject: Re: all regionserver shutdown after close hdfs datanode
>
> Hi Zhou
> You should check if the HMaster is still up. If not, check its logs. If
> for some reason HMaster thinks HDFS is not available, it will
> shut down the HBase cluster.
> Regards
> Daniel
>
> On 12/20/2010 06:15 AM, Zhou Shuaifeng wrote:
>> Hi,
>>
>>
>>
>> I have a cluster of 8 hdfs datanodes and 8 hbase regionservers. When I
>> shut down one node (a PC running one datanode and one regionserver), all
>> hbase regionservers shut down after a while.
>>
>> The other 7 hdfs datanodes are OK.
>>
>>
>>
>> I don't think this is reasonable. HBase is a distributed system that should
>> tolerate some nodes failing. So, what's the matter? Is there any
>> configuration that can solve this problem, or is it a bug?
>>
>>
>>
>> Thanks and best Regards.
>>
>>
>>
>> Zhou
>>
>>
>
>>
>
> --
> Daniel Iancu
> Java Developer,Web Components Romania
> 1&1 Internet Development srl.
> 18 Mircea Eliade St
> Sect 1, Bucharest
> RO Bucharest, 012015
> www.1and1.ro
> Phone:+40-031-223-9081
> Email:daniel.iancu@1and1.ro
> IM:diancu@united.domain
>
>
>
>

