hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2575) Fault scenario of dead root drive on RS causes cluster lockup
Date Thu, 20 May 2010 00:12:53 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869419#action_12869419 ]

Todd Lipcon commented on HBASE-2575:

My idea for reproducing this is something like the following:
# dd if=/dev/zero of=myimage bs=1M count=1000
# losetup -f --show myimage   (prints the loop device it picked; the next step assumes /dev/loop1)
# mdadm --create /dev/md0 --level=faulty --raid-devices=1 /dev/loop1
# mkfs.ext3 /dev/md0
# mkdir /myhbase-disk
# mount /dev/md0 /myhbase-disk
# cp -a $HBASE_HOME /myhbase-disk
# start a regionserver from the copy under /myhbase-disk
# mdadm --grow /dev/md0 -l faulty -p read-persistent
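The steps above can be scripted roughly as follows. This is a hypothetical sketch, not something from the issue: it assumes root on Linux with mdadm and util-linux installed, reuses the names from the steps (myimage, /dev/md0, /myhbase-disk), and assumes the regionserver is started with HBase's stock bin/hbase-daemon.sh from the copied install. The function name repro_dead_disk and the DRY_RUN switch are made up for illustration. Because a real run creates loop and md devices and formats and mounts a filesystem, it defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper automating the faulty-disk repro steps.
# Assumes Linux, root, mdadm and util-linux. DRY_RUN defaults to 1
# (print commands only); set DRY_RUN=0 to actually execute them.

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: repro_dead_disk HBASE_HOME
# The ( ) body runs in a subshell, so `set -e` and the variables
# below do not leak into the caller's shell.
repro_dead_disk() (
  set -euo pipefail
  hbase_home=${1:?usage: repro_dead_disk HBASE_HOME}
  image=myimage
  md=/dev/md0
  mnt=/myhbase-disk

  # 1000 MiB zero-filled backing file.
  run dd if=/dev/zero of="$image" bs=1M count=1000

  # Attach to the first free loop device; --show prints the device chosen,
  # so the array is built on it instead of assuming /dev/loop1.
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "+ losetup -f --show $image"
    loop=/dev/loopN   # placeholder: the real device is only known at run time
  else
    loop=$(losetup -f --show "$image")
  fi

  # One-device md array using the "faulty" personality; I/O passes
  # through normally until a failure layout is switched on below.
  run mdadm --create "$md" --level=faulty --raid-devices=1 "$loop"
  run mkfs.ext3 "$md"
  run mkdir -p "$mnt"
  run mount "$md" "$mnt"
  run cp -a "$hbase_home" "$mnt"

  # Assumption: start the regionserver with the daemon script from the copy.
  run "$mnt/$(basename "$hbase_home")/bin/hbase-daemon.sh" start regionserver

  # Wait for the RS to come up and register with the master before failing the disk.
  if [[ "${DRY_RUN:-1}" != 1 ]]; then
    read -r -p "Press Enter once the regionserver is serving regions... "
  fi

  # Switch the array to the read-persistent fault layout to simulate the dead drive.
  run mdadm --grow "$md" -l faulty -p read-persistent
)
```

For example, after sourcing the file, `repro_dead_disk "$HBASE_HOME"` prints the plan, and `DRY_RUN=0 repro_dead_disk "$HBASE_HOME"` (as root) runs it. Capturing the device from `losetup -f --show` avoids hard-coding /dev/loop1, which is only correct if that happens to be the first free loop device.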

> Fault scenario of dead root drive on RS causes cluster lockup
> -------------------------------------------------------------
>                 Key: HBASE-2575
>                 URL: https://issues.apache.org/jira/browse/HBASE-2575
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Priority: Critical
> We performed a fault test where we physically pulled the root drive out of a machine
> while it was on. The regionserver continued to run fine with existing clients, but any new
> clients that tried to connect to it for RPC did not work correctly: when I started a
> new client, that client made no progress. Despite this, the RS continued to happily heartbeat
> to the master, so the master did not remove it from the cluster. Note that in this case we
> were logging to NFS, and the logs continued to be written, but no exceptions were shown.

