Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of davey.yan@gmail.com designates
 209.85.215.51 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAOcnVr1HPb6si2aTEVPkMeGdEM6Bd21XzLD4TGY6yOVQRmkZ3Q@mail.gmail.com>
References: 
 <CAFAWafOi0pWOYq5cWKC9Wt-nt021MrsNjafcHBWhy6=mMP3w0g@mail.gmail.com>
	<CAOcnVr1HPb6si2aTEVPkMeGdEM6Bd21XzLD4TGY6yOVQRmkZ3Q@mail.gmail.com>
Date: Fri, 2 Aug 2013 11:56:57 +0800
Message-ID: 
 <CAFAWafPz4sdusiG3xqENFnKov+oaDawRSGgswr-wGv+GRRBTfA@mail.gmail.com>
Subject: Re: DataBlockScanner's rate limit
From: Davey Yan <davey.yan@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=001a11c3ca143d5e6c04e2eef3c0

--001a11c3ca143d5e6c04e2eef3c0
Content-Type: text/plain; charset=ISO-8859-1

Hi Harsh, thanks for the reply.

Yes, dfs.replication was set to 1, but no mount is missing.
Another question: will a replication factor of 1 often lead to missing
blocks?
After startup, will the ratio reported in the admin UI (e.g. 0.9826) stay
unchanged, even while the DataBlockScanner is still running?
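
To make the ratio question concrete, here is a minimal sketch of the
safemode check as I understand it from this thread: the fraction of
reported blocks is compared against the 0.999 threshold. This is a
simplified model, not Hadoop's actual code. The function names are
hypothetical, and the block counts (9862 of 10000) are invented only to
reproduce the 0.9862 ratio mentioned below.

```python
import math

# Simplified model of the safemode check (an assumption for illustration,
# not the NameNode's real implementation).
DEFAULT_THRESHOLD = 0.999  # threshold quoted in this thread


def block_ratio(reported_blocks: int, total_blocks: int) -> float:
    """Fraction of known blocks that DataNodes have reported."""
    if total_blocks == 0:
        return 1.0  # an empty namespace has nothing to wait for
    return reported_blocks / total_blocks


def can_leave_safemode(reported_blocks: int, total_blocks: int,
                       threshold: float = DEFAULT_THRESHOLD) -> bool:
    """True once the reported fraction reaches the threshold."""
    return block_ratio(reported_blocks, total_blocks) >= threshold


def blocks_still_needed(reported_blocks: int, total_blocks: int,
                        threshold: float = DEFAULT_THRESHOLD) -> int:
    """How many more blocks must be reported before the check passes."""
    needed = math.ceil(total_blocks * threshold)
    return max(0, needed - reported_blocks)


if __name__ == "__main__":
    # Hypothetical counts chosen to give the 0.9862 ratio from this thread.
    print(block_ratio(9862, 10000))
    print(can_leave_safemode(9862, 10000))
    print(blocks_still_needed(9862, 10000))
```

In this model the ratio only moves when DataNodes report more blocks, so
with a single replica a block whose only copy is gone would keep the
ratio below the threshold no matter how long the scanner runs.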


On Fri, Aug 2, 2013 at 11:27 AM, Harsh J <harsh@cloudera.com> wrote:

> Hi,
>
> The DataBlockScanner isn't responsible for the DN block reports at
> startup; those come from a wholly different thread/process. The scanner
> is an NN-independent operation that merely verifies blocks in the
> background for the DN's own health. Depending on what the outage caused,
> it is likely that you are missing a mount and perhaps blocks of files with
> a single replica. Run an fsck to identify which files these are and
> whether they used a single replication factor.
>
> On Fri, Aug 2, 2013 at 7:25 AM, Davey Yan <davey.yan@gmail.com> wrote:
> > I recently corrupted a mini cluster by handling it improperly.
> >
> > This mini cluster's dfs.replication was set to 1.
> > After an irregular restart of the OS, I could not wait for the cluster to
> > leave safemode; the block ratio is 0.9862, < 0.999.
> > In http://ip:50075/blockScannerReport, I noticed that there is a rate
> > limit of 1MB.
> > Verifying the blocks will take a long time.
> >
> > So I ran "hadoop dfsadmin -safemode leave", and then I got missing blocks.
> >
> > My question is: why should the rate in DataBlockScanner be limited while
> > the cluster is still starting up or still in safemode?
> >
> > I read the source code of DataBlockScanner.java; there is no parameter to
> > change the rate limit.
> > It always seems to be between 1MB and 8MB.
> >
> >
> > --
> > Davey Yan
>
>
>
> --
> Harsh J
>


-- 
Davey Yan

--001a11c3ca143d5e6c04e2eef3c0--