From: Davey Yan <davey.yan@gmail.com>
To: user@hadoop.apache.org
Subject: Re: DataBlockScanner's rate limit
Date: Fri, 2 Aug 2013 11:56:57 +0800

Hi Harsh, thanks for the reply.

Yes, dfs.replication was set to 1, but there is no missing mount.

Some more questions: will a replication factor of 1 often lead to missing blocks?
After startup, will the ratio reported in the admin UI (e.g. 0.9826) stay unchanged, even while the DataBlockScanner is still running?

On Fri, Aug 2, 2013 at 11:27 AM, Harsh J <harsh@cloudera.com> wrote:
> Hi,
>
> The DataBlockScanner isn't responsible for the DN block reports at
> startup, which is a wholly different thread/process - it is an
> NN-independent operation that merely verifies blocks in the background
> for the DN's own health. Depending on what the outage caused, it is
> likely that you are missing a mount and perhaps blocks of files with a
> single replica. Run an fsck to identify which files these are and
> whether they used a single replication factor.
>
> On Fri, Aug 2, 2013 at 7:25 AM, Davey Yan <davey.yan@gmail.com> wrote:
> > I recently corrupted a mini cluster through an improper operation on
> > my part.
> >
> > This mini cluster's dfs.replication was set to 1.
> > After an irregular restart of the OS, the cluster would not leave
> > safemode: the block ratio is 0.9862, < 0.999.
> > In http://ip:50075/blockScannerReport, I noticed there is a rate limit
> > of 1MB.
> > At that rate it will take a long time to verify the blocks.
> >
> > So I ran "hadoop dfsadmin -safemode leave", and then I got missing
> > blocks.
> >
> > My question is: why should we limit the rate in DataBlockScanner while
> > the cluster is still starting up or still in safemode?
> >
> > I read the source code of DataBlockScanner.java; there is no parameter
> > to change the rate limit.
> > It seems to always be 1MB to 8MB.
> >
> > --
> > Davey Yan
>
> --
> Harsh J

--
Davey Yan
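
For readers following the thread, the figures discussed in it can be checked with a little arithmetic. This is only a rough sketch under stated assumptions: it assumes the NameNode leaves safemode once the fraction of reported blocks reaches the threshold (0.999 in the thread), and that the 1MB and 8MB scanner limits are per-second rates. The function names, block counts, and the 1 TB data size are hypothetical and are not taken from the cluster in question.

```python
# Sketch only: function names, block counts and data size are hypothetical.

MB = 1024 * 1024
TB = 1024 ** 4


def reported_ratio(reported_blocks: int, total_blocks: int) -> float:
    """Fraction of known blocks that DataNodes have reported so far."""
    if total_blocks == 0:
        return 1.0  # an empty namespace has nothing left to wait for
    return reported_blocks / total_blocks


def can_leave_safemode(reported_blocks: int, total_blocks: int,
                       threshold: float = 0.999) -> bool:
    """True once the reported ratio reaches the threshold (assumed rule)."""
    return reported_ratio(reported_blocks, total_blocks) >= threshold


def full_scan_hours(data_bytes: int, rate_bytes_per_sec: int) -> float:
    """Hours needed to read data_bytes once at a fixed throttle rate."""
    return data_bytes / rate_bytes_per_sec / 3600


# 9862 of 10000 blocks reported reproduces the 0.9862 ratio from the thread.
print(reported_ratio(9862, 10000))
print(can_leave_safemode(9862, 10000))

# One pass over a hypothetical 1 TB of block data at the two limits.
print(full_scan_hours(1 * TB, 1 * MB))
print(full_scan_hours(1 * TB, 8 * MB))
```

With a replication factor of 1 there is no second replica to report a block, so every block that a DataNode cannot find counts directly against the ratio. The scan-time figure is independent of this: as Harsh notes, the scanner only verifies blocks in the background and does not feed the block reports that safemode waits for.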