hadoop-mapreduce-user mailing list archives

From MARCOS MEDRADO RUBINELLI <marc...@buscapecompany.com>
Subject Re: Why do some blocks refuse to replicate...?
Date Thu, 28 Mar 2013 20:45:02 GMT
Felix,

After changing hdfs-site.xml, did you run "hadoop dfsadmin -refreshNodes"? That should have
been enough, but you can also try increasing the replication factor of these files, waiting
for them to be replicated to the new nodes, and then setting it back to its original value.
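
A minimal sketch of that raise-then-restore cycle, in Python, assuming the `hadoop` CLI is on
the PATH. The helper names (`bump_replication_commands`, `run`) and the temporary factor of 4
are illustrative assumptions; the original factor of 3 and the example path come from the fsck
line quoted below. It only prints the commands unless `dry_run=False` is passed.

```python
import subprocess


def bump_replication_commands(paths, original=3, temporary=4):
    """Build the `hadoop fs -setrep` calls for a raise-then-restore cycle."""
    if temporary <= original:
        raise ValueError("temporary factor must exceed the original one")
    commands = []
    for path in paths:
        # -w waits until the file reaches the requested replication factor.
        commands.append(["hadoop", "fs", "-setrep", "-w", str(temporary), path])
        # Set the factor back; the NameNode can then remove excess replicas.
        commands.append(["hadoop", "fs", "-setrep", str(original), path])
    return commands


def run(commands, dry_run=True):
    """Print the commands (dry run) or execute them one by one."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical usage with the file from the fsck report quoted below.
    run(bump_replication_commands(
        ["/user/hive/warehouse/ads_destinations_hosts/part-m-00012"]))
```

Note that `-w` can block for a long time on large files, so it may be worth trying this on a
single affected file first.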

Cheers,
Marcos

On 28-03-2013 17:00, Felix GV wrote:
Hello,

I've been running a virtualized CDH 4.2 cluster. I now want to migrate all my data to another
(this time physical) set of slaves and then stop using the virtualized slaves.

I added the new physical slaves in the cluster, and marked all the old virtualized slaves
as decommissioned using the dfs.hosts.exclude setting in hdfs-site.xml.

Almost all of the data replicated successfully to the new slaves, but when I bring down the
old slaves, some blocks start showing up as missing or corrupt (according to the NN UI as
well as fsck*). If I restart the old slaves, then there are no missing blocks reported by
fsck.

I've tried shutting down the old slaves two by two, and for some of them I saw no problem,
but then at some point I found two slaves which, when shut down, resulted in a couple of blocks
being under-replicated (1 out of 3 replicas found). For example, fsck would report stuff like
this:

/user/hive/warehouse/ads_destinations_hosts/part-m-00012:  Under replicated BP-1207449144-10.10.10.21-1356639087818:blk_6150201737015349469_121244.
Target Replicas is 3 but found 1 replica(s).
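
To list which files are affected, a report like the one above can be parsed mechanically. This
is only a sketch in Python: the regular expression is inferred from the single sample line
shown here (it may not cover every fsck message variant), and the names `UNDER_REPLICATED`,
`under_replicated`, and `SAMPLE` are made up for illustration. The input would be saved fsck
output, for example from `hadoop fsck /`.

```python
import re

# Pattern inferred from the sample fsck line above; \s+ also tolerates the
# line wrap that the mail client inserted before "Target Replicas".
UNDER_REPLICATED = re.compile(
    r"^(?P<path>/\S+):\s+Under replicated\s+(?P<block>\S+?)\.\s+"
    r"Target Replicas is (?P<target>\d+) but found (?P<found>\d+) replica\(s\)\.",
    re.MULTILINE,
)


def under_replicated(fsck_output):
    """Return (path, block, target, found) for each under-replicated block."""
    return [
        (m["path"], m["block"], int(m["target"]), int(m["found"]))
        for m in UNDER_REPLICATED.finditer(fsck_output)
    ]


SAMPLE = (
    "/user/hive/warehouse/ads_destinations_hosts/part-m-00012:  Under replicated "
    "BP-1207449144-10.10.10.21-1356639087818:blk_6150201737015349469_121244.\n"
    "Target Replicas is 3 but found 1 replica(s).\n"
)

if __name__ == "__main__":
    for path, block, target, found in under_replicated(SAMPLE):
        print(f"{path}: {found}/{target} replicas ({block})")
```

The resulting paths could then be fed to whatever fix is chosen, one file at a time.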

The system then stayed in that state, apparently forever. It never actually fixed the fact
that some blocks were under-replicated. Does that mean there's something wrong with some of the
old datanodes...? Why do they keep blocks to themselves (even though they're decommissioned)
instead of replicating those blocks to the new (non-decommissioned) datanodes?

How do I force replication of under-replicated blocks?

*Actually, the NN UI and fsck report slightly different things. The NN UI always seems to
report 60 under-replicated blocks, whereas fsck only reports those 60 under-replicated blocks
when I shut down some of the old datanodes... When the old nodes are up, fsck reports 0 under-replicated
blocks... This is very confusing!

Any help would be appreciated! Please don't hesitate to ask if I should provide some of my
logs, settings, or the output of some commands...!

Thanks :) !

--
Felix

