Subject: Re: decommissioning disks on a data node
From: Colin Kincaid Williams <discord@uw.edu>
To: user@hadoop.apache.org
Date: Thu, 16 Oct 2014 20:01:54 -0700

For some reason he seems intent on resetting the virtual disk bad blocks and giving the drives another shot. From what he told me, nothing is under warranty anymore. My first suggestion was to get rid of the disks.

Here's the command:

    /opt/dell/srvadmin/bin/omconfig storage vdisk action=clearvdbadblocks controller=1 vdisk=$vid

I'm still curious about how Hadoop blocks work. I'm assuming that each block is stored on one of the many mount points, and not divided between them. I know there is a tolerated volume failure option in hdfs-site.xml.

So the question is whether the operations I laid out are legitimate, specifically removing the drive in question from the configuration and restarting the data node. The advantage would be less re-replication and less downtime.
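For illustration, a minimal hdfs-site.xml sketch of the two settings in question (the /data/N mount points are hypothetical, and the property names assume Hadoop 2.x; older releases used dfs.data.dir):

    <!-- Hypothetical storage layout; /data/3 is the failing PERC virtual disk,
         so it has been dropped from the list before restarting the datanode. -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/4/dfs/dn</value>
    </property>

    <!-- The tolerated-volume-failure option: how many data directories may fail
         before the datanode takes itself offline. -->
    <property>
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <value>1</value>
    </property>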
On Thu, Oct 16, 2014 at 6:58 PM, Travis <hcoyote@ghostar.org> wrote:
>
> On Thu, Oct 16, 2014 at 7:03 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>
>> We have been seeing some of the disks on our cluster having bad blocks,
>> and then failing. We are using some Dell PERC H700 disk controllers that
>> create "virtual devices".
>
> Are you doing a bunch of single-disk RAID0 devices with the PERC to mimic
> JBOD?
>
>> Our hosting manager uses a Dell utility which reports "virtual device bad
>> blocks". He has suggested that we use the Dell tool to remove the "virtual
>> device bad blocks", and then re-format the device.
>
> Which Dell tool is he using for this? The OMSA tools? In practice, if
> OMSA is telling you the drive is bad, it has likely already exhausted all
> the reserved blocks available for remapping bad sectors, and it is probably
> not worth messing with the drive. Just get Dell to replace it (assuming
> your hardware is under warranty or support).
>
>> I'm wondering if we can remove the disks in question from hdfs-site.xml
>> and restart the datanode, so that we don't re-replicate the Hadoop blocks
>> on the other disks. Then we would go ahead and work on the troubled disk
>> while the datanode remained up. Finally, we would restart the datanode
>> again after re-adding the freshly formatted (possibly new) disk. This way
>> the data on the remaining disks doesn't get re-replicated.
>>
>> I don't know too much about the Hadoop block system. Will this work? Is
>> it an acceptable strategy for disk maintenance?
>
> The data may still re-replicate from the missing disk within your cluster
> if the namenode determines that those blocks are under-replicated.
>
> Unless your cluster is so tight on space that you couldn't handle taking
> one disk out for maintenance, the re-replication of blocks from the missing
> disk within the cluster should be fine. You don't need to keep the entire
> datanode down throughout the entire time you're running tests on the drive.
> The process you laid out is basically how we manage disk maintenance on our
> Dells: stopping the datanode, unmounting the broken drive, modifying the
> hdfs-site.xml for that node, and restarting it.
>
> I've automated some of this process with Puppet by taking advantage of
> ext3/ext4's ability to set a label on the partition that Puppet looks for
> when configuring mapred-site.xml and hdfs-site.xml. I talk about it in a
> few blog posts from a few years back if you're interested.
>
> http://www.ghostar.org/2011/03/hadoop-facter-and-the-puppet-marionette/
> http://www.ghostar.org/2013/05/using-cobbler-with-a-fast-file-system-creation-snippet-for-kickstart-post-install/
>
> Cheers,
> Travis
> --
> Travis Campbell
> travis@ghostar.org
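A rough shell sketch of the per-disk maintenance flow described above (the /data/3 mount point, the config path, and the service commands are assumptions; they vary by distribution and install method):

    # 1. Stop the datanode on the affected host.
    sudo service hadoop-hdfs-datanode stop

    # 2. Unmount the broken drive.
    sudo umount /data/3

    # 3. Remove /data/3 from dfs.datanode.data.dir in hdfs-site.xml
    #    (by hand or via configuration management).
    sudo vi /etc/hadoop/conf/hdfs-site.xml

    # 4. Restart the datanode; it now serves blocks from the remaining disks
    #    while the bad drive is tested, reformatted, or replaced.
    sudo service hadoop-hdfs-datanode start

Once the drive is reformatted or replaced and remounted, reversing step 3 and restarting the datanode again brings it back into service.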