hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Namenode fails to replicate file
Date Fri, 08 Feb 2008 17:58:36 GMT

That makes it wait, but I don't think it increases the urgency on the part
of the namenode.

As an interesting data point, I had a cluster with a large backlog of
pending replication that was progressing slowly.  Restarting the name node
caused the rate of replication to increase massively.  The difference was
highly visible on the ganglia graph: I/O wait time on the cluster went from
near zero to more than 15%.
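
For reference, the wait-then-check sequence discussed in this thread could be
sketched roughly as below.  This is only an illustration under assumptions:
HADOOP_BIN, the raise_replication helper and the example path are made up
here, and the fsck step is an addition that nobody in the thread proposed.
Note that -w only blocks until the target is reached; it is not known to make
the namenode schedule the replication any sooner.

```shell
#!/bin/sh
# Assumption: the hadoop launcher lives at bin/hadoop unless HADOOP_BIN is set.
HADOOP_BIN="${HADOOP_BIN:-bin/hadoop}"

# Hypothetical helper: $1 = target replication factor, $2 = DFS path.
raise_replication() {
    # Request the new replication factor and block until it is reached.
    "$HADOOP_BIN" dfs -setrep -w "$1" "$2" &&
    # Then list the file's blocks and replica locations to confirm.
    "$HADOOP_BIN" fsck "$2" -files -blocks -locations
}

# Example call (path is a placeholder):
# raise_replication 10 /user/example/filename
```

If the setrep call never returns for a particular file, that would match the
stuck-block behaviour described in the quoted report below.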

On 2/7/08 11:39 PM, "dhruba Borthakur" <dhruba@yahoo-inc.com> wrote:

> You have to use the -w parameter to the setrep command to make it wait
> till the replication is complete. The following command
> bin/hadoop dfs -setrep 10 -w filename
> will block till all blocks of the file achieve a replication factor of
> 10.
> Thanks,
> dhruba
> -----Original Message-----
> From: Tim Wintle [mailto:tim.wintle@teamrubber.com]
> Sent: Thursday, February 07, 2008 11:05 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Namenode fails to replicate file
> Doesn't the -setrep command force the replication to be increased
> immediately?
> ./hadoop dfs -setrep [replication] path
> (I may have misunderstood)
> On Thu, 2008-02-07 at 17:05 -0800, Ted Dunning wrote:
>> Chris Kline reported a problem in early January where a file which had too
>> few replicated blocks did not get replicated until a DFS restart.
>> I just saw a similar issue.  I had a file that had a block with 1 replica (2
>> required) that did not get replicated.  I changed the number of required
>> replicates, but nothing caused any action.  Changing the number of required
>> replicas on other files got them to be replicated.
>> I eventually copied the file to temp, deleted the original and moved the
>> copy back to the original place.  I was also able to read the entire file
>> which shows that the problem was not due to slow reporting from a down
>> datanode.
>> This happened just after I had a node failure which was why I was messing
>> with replication at all.  Since I was in the process of increasing the
>> replication on nearly 10,000 large files, my log files are full of other
>> stuff, but I am pretty sure that there is a bug here.
>> This was on a relatively small cluster with 13 data nodes.
>> It also brings up a related issue that has come up before in that there are
>> times when you may want to increase the number of replicas of a file right
>> NOW.  I don't know of any way to force this replication.  Is there such a
>> way?
