hadoop-user mailing list archives

From Colin Kincaid Williams <disc...@uw.edu>
Subject Re: decommissioning disks on a data node
Date Fri, 17 Oct 2014 04:41:24 GMT
 Hi Travis,

Thanks for your input. I forgot to mention that the drives are most likely
in the single drive configuration that you describe.

I think what I've found is that, after restarting the datanodes in the
manner I described, the mount points on the drives with the reset blocks
and newly formatted partitions have gone bad. I'm not sure the namenode
will use these locations, even though it does not show the volumes as
failed. Without a way to reinitialize the disks, specifically the mount
points, I assume my efforts are in vain.

Therefore the only procedure that makes sense is to decommission the nodes
on which I want to bring the failed volumes back up. It just didn't make
sense to me that, when we have a large number of disks with good data, we
would end up wiping that data and starting over again.
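
For what it's worth, here is a minimal sketch of the kind of check I have in
mind before restarting a datanode: is each data directory still a live,
writable mount? The paths are hypothetical placeholders, not our actual
layout, and this is not how the datanode itself checks its disks.

```python
import os
import tempfile


def check_volume(path):
    """Return a list of problems found for one data directory (empty if none)."""
    if not os.path.isdir(path):
        return ["missing or not a directory"]
    problems = []
    if not os.path.ismount(path):
        # If the disk is not mounted here, writes would land on the
        # parent filesystem instead of the intended drive.
        problems.append("not a mount point")
    try:
        # Creating and removing a temp file exercises a real write.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append("not writable: %s" % exc)
    return problems


if __name__ == "__main__":
    # Hypothetical data directories; substitute the ones from hdfs-site.xml.
    for d in ["/data/1", "/data/2"]:
        issues = check_volume(d)
        print(d, "OK" if not issues else "; ".join(issues))
```

A directory that passes this can still sit on a disk with bad blocks, so it
only rules out the obvious mount problems.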

On Oct 16, 2014 8:36 PM, "Travis" <hcoyote@ghostar.org> wrote:

> On Thu, Oct 16, 2014 at 10:01 PM, Colin Kincaid Williams <discord@uw.edu>
> wrote:
>> For some reason he seems intent on resetting the bad Virtual blocks, and
>> giving the drives another shot. From what he told me, nothing is under
>> warranty anymore. My first suggestion was to get rid of the disks.
>> Here's the command:
>> /opt/dell/srvadmin/bin/omconfig storage vdisk action=clearvdbadblocks
>> controller=1 vdisk=$vid
> Well, the usefulness of this action is going to entirely depend on how
> you've actually set up the virtual disks.
> If you've set it up so there's only one physical disk in each vdisk
> (single-disk RAID0), then the bad "virtual" block is likely going to map to
> a real bad block.
> If you're doing something where there are multiple disks associated with
> each virtual disk (eg, RAID1, RAID10 ... can't remember if RAID5/RAID6 can
> exhibit what follows), it's possible for the virtual device to have a bad
> block that is actually mapped to a good physical block underneath.  This
> can happen, for example, if you had a failing drive in the vdisk and
> replaced it, but the controller had remapped the bad virtual block to some
> place good.  Replacing the drive with a good one makes the controller think
> the bad block is still there.  Dell calls it a punctured stripe (for better
> description see
> http://lists.us.dell.com/pipermail/linux-poweredge/2010-December/043832.html).
> In this case, the fix is clearing the virtual badblock list with the above
> command.
>> I'm still curious about how hadoop blocks work. I'm assuming that each
>> block is stored on one of the many mountpoints, and not divided between
>> them. I know there is a tolerated volume failure option in hdfs-site.xml.
> Correct.  Each HDFS block is actually treated as a file that lives on a
> regular filesystem, like ext3 or ext4.   If you did an ls inside one of
> your vdisks, you'd see the raw blocks that the datanode is actually
> storing.  You just wouldn't be able to easily tell what file that block was
> a part of because it's named with a block id, not the actual file name.
>> Then are the operations I laid out legitimate, specifically removing
>> the drive in question and restarting the data node? The advantage would
>> be less re-replication and less downtime.
> Yup.  It will minimize the actual prolonged outage of the datanode
> itself.  You'll get a little re-replication while the datanode process is
> off, but if you keep that time reasonably short, you should be fine.  When
> the datanode process comes back up, it will walk all of its configured
> filesystems determining which blocks it still has on disk and report that
> back to the namenode.  Once that happens, re-replication will stop because
> the namenode knows where those missing blocks are and no longer treats them
> as under-replicated.
> Note:  You'll still get some re-replication occurring for the blocks that
> lived on the drive you removed.  But it's only a drive's worth of blocks,
> not a whole datanode.
> Travis
> --
> Travis Campbell
> travis@ghostar.org
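
To illustrate the point above that each HDFS block is just a file named by
its block id, here is a rough sketch that walks one data directory and
collects the block files it finds. It assumes the usual `blk_<id>` naming,
with `blk_<id>_<genstamp>.meta` checksum files alongside, and the path in
the usage note is hypothetical. It only approximates what the datanode does
when it scans its volumes at startup.

```python
import os
import re

# Matches block data files such as blk_1073741825 or blk_-4567, but not
# the accompanying blk_<id>_<genstamp>.meta checksum files.
BLOCK_FILE = re.compile(r"^blk_(-?\d+)$")


def find_blocks(data_dir):
    """Map block id -> size in bytes for every block file under data_dir."""
    blocks = {}
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            match = BLOCK_FILE.match(name)
            if match:
                full_path = os.path.join(root, name)
                blocks[int(match.group(1))] = os.path.getsize(full_path)
    return blocks
```

Calling `find_blocks("/data/1/dfs/dn")` on a hypothetical data directory
would give the block ids stored on that one drive, which is roughly the set
of blocks that gets re-replicated if the drive is pulled. Mapping a block id
back to an HDFS file name still needs the namenode.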
