From: Felix GV
Date: Thu, 28 Mar 2013 17:23:16 -0400
Subject: Re: Why do some blocks refuse to replicate...?
To: user@hadoop.apache.org

Yes, I didn't specify how I was testing my changes, but basically, here's what I did:

My hdfs-site.xml file was modified to include a reference to a file containing a list of all datanodes (via dfs.hosts) and a reference to a file containing decommissioned nodes (via dfs.hosts.exclude).
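For reference, the relevant part of my hdfs-site.xml looks roughly like this (the file paths are just examples, not necessarily the exact ones on my cluster):

    <!-- dfs.hosts: file listing the datanodes allowed to connect to the NN -->
    <property>
      <name>dfs.hosts</name>
      <value>/etc/hadoop/conf/dfs.hosts</value>
    </property>
    <!-- dfs.hosts.exclude: file listing the datanodes to decommission -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.hosts.exclude</value>
    </property>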
After that, I just changed these files, not hdfs-site.xml.

I first added all my old nodes in the dfs.hosts.exclude file, did hdfs dfsadmin -refreshNodes, and most of the data replicated correctly.

I then tried removing all old nodes from the dfs.hosts file, did hdfs dfsadmin -refreshNodes, and I saw that I now had a couple of corrupt and missing blocks (60 of them).

I re-added all the old nodes in the dfs.hosts file, and removed them gradually, each time doing the refreshNodes or restarting the NN, and I narrowed it down to three datanodes in particular, which seem to be the three nodes where all of those 60 blocks are located.

Is it possible, perhaps, that these three nodes are completely incapable of replicating what they have (because they're corrupt or something), and so every block was replicated from other nodes, but the blocks that happened to be located on these three nodes are... doomed? I can see the data in those blocks in the NN HDFS browser, so I guess it's not corrupted... I also tried pinging the new nodes from those old ones, and that works too, so I guess there is no network partition...

I'm in the process of increasing the replication factor above 3, but I don't know if that's going to do anything...
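To be concrete, this is more or less what I'm running to bump the replication factor (the target of 4 and the exact path are just examples):

    # Raise the replication factor of everything under the directory
    # and wait (-w) until the new factor is actually reached
    hdfs dfs -setrep -R -w 4 /user/hive/warehouse/ads_destinations_hosts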
--
Felix


On Thu, Mar 28, 2013 at 4:45 PM, MARCOS MEDRADO RUBINELLI <marcosm@buscapecompany.com> wrote:

> Felix,
>
> After changing hdfs-site.xml, did you run "hadoop dfsadmin -refreshNodes"?
> That should have been enough, but you can try increasing the replication
> factor of these files, waiting for them to be replicated to the new nodes,
> and then setting it back to its original value.
>
> Cheers,
> Marcos
>
>
> On 28-03-2013 17:00, Felix GV wrote:
>
> Hello,
>
> I've been running a virtualized CDH 4.2 cluster. I now want to migrate
> all my data to another (this time physical) set of slaves and then stop
> using the virtualized slaves.
>
> I added the new physical slaves to the cluster, and marked all the old
> virtualized slaves as decommissioned using the dfs.hosts.exclude setting
> in hdfs-site.xml.
>
> Almost all of the data replicated successfully to the new slaves, but
> when I bring down the old slaves, some blocks start showing up as missing
> or corrupt (according to the NN UI as well as fsck*). If I restart the old
> slaves, then there are no missing blocks reported by fsck.
>
> I've tried shutting down the old slaves two by two, and for some of them
> I saw no problem, but then at some point I found two slaves which, when
> shut down, resulted in a couple of blocks being under-replicated (1 out of
> 3 replicas found). For example, fsck would report stuff like this:
>
> /user/hive/warehouse/ads_destinations_hosts/part-m-00012: Under replicated
> BP-1207449144-10.10.10.21-1356639087818:blk_6150201737015349469_121244.
> Target Replicas is 3 but found 1 replica(s).
>
> The system then stayed in that state apparently forever. It never
> actually fixed the fact that some blocks were under-replicated. Does that
> mean there's something wrong with some of the old datanodes...? Why do
> they keep blocks for themselves (even though they're decommissioned)
> instead of replicating those blocks to the new (non-decommissioned)
> datanodes?
>
> How do I force replication of under-replicated blocks?
>
> *Actually, the NN UI and fsck report slightly different things. The NN UI
> always seems to report 60 under-replicated blocks, whereas fsck only
> reports those 60 under-replicated blocks when I shut down some of the old
> datanodes... When the old nodes are up, fsck reports 0 under-replicated
> blocks... This is very confusing!
>
> Any help would be appreciated! Please don't hesitate to ask if I should
> provide some of my logs, settings, or the output of some commands...!
>
> Thanks :) !
>
> --
> Felix
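PS: In case it's useful, something like this should show the decommission state of each datanode and where the replicas of the affected file currently live (the path is taken from the fsck report above):

    # Per-datanode report, including "Decommission Status"
    hdfs dfsadmin -report

    # Block-level detail for the affected file: which datanodes hold each replica
    hdfs fsck /user/hive/warehouse/ads_destinations_hosts -files -blocks -locations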