hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: More Replication on dfs
Date Fri, 10 Apr 2009 18:16:53 GMT
Changing the default replication in hadoop-site.xml does not affect files
already loaded into HDFS. The replication factor is controlled on a
per-file basis.

You need to use the command `hadoop fs -setrep n path...` to set the
replication factor to "n" for a particular path already present in HDFS. It
can also take a -R for recursive.
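
For example, to bring files already in HDFS down to a replication factor of
2, something like this should work (the /user/aseem path is just
illustrative):

  $ bin/hadoop fs -setrep -R 2 /user/aseem   # path is only an example

Files written afterwards will use whatever dfs.replication is set to at
write time.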

- Aaron

On Fri, Apr 10, 2009 at 10:34 AM, Alex Loddengaard <alex@cloudera.com> wrote:

> Aseem,
>
> How are you verifying that blocks are not being replicated?  Have you run
> fsck?  *bin/hadoop fsck /*
>
> I'd be surprised if replication really wasn't happening.  Can you run fsck
> and pay attention to "Under-replicated blocks" and "Mis-replicated blocks"?
> In fact, can you just copy-paste the output of fsck?
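>
> Something like the following should also show per-file block locations,
> which would make it clear what was replicated where (the / path is just
> an example; point it at any directory):
>
> *bin/hadoop fsck / -files -blocks -locations*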
>
> Alex
>
> On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:
>
> >
> > Hi,
> >        I also tried the command $ bin/hadoop balancer, but I still see
> > the same problem.
> >
> > Aseem
> >
> > -----Original Message-----
> > From: Puri, Aseem [mailto:Aseem.Puri@Honeywell.com]
> > Sent: Friday, April 10, 2009 11:18 AM
> > To: core-user@hadoop.apache.org
> > Subject: RE: More Replication on dfs
> >
> > Hi Alex,
> >
> >        Thanks for sharing your knowledge. So far I have three machines,
> > and I want the replication factor to be 2 so I can check Hadoop's
> > behavior. I started my Hadoop cluster with a replication factor of 3 and
> > then uploaded 3 files to run a word count program. But since all my
> > files are stored on one machine and replicated to the other datanodes,
> > my map reduce program takes its input from only one datanode. I want my
> > files on different datanodes so I can check the map reduce functionality
> > properly.
> >
> >        Also, before starting my Hadoop cluster again with replication
> > factor 2, I formatted all datanodes and deleted all the old data
> > manually.
> >
> > Please suggest what I should do now.
> >
> > Regards,
> > Aseem Puri
> >
> >
> > -----Original Message-----
> > From: Mithila Nagendra [mailto:mnagendr@asu.edu]
> > Sent: Friday, April 10, 2009 10:56 AM
> > To: core-user@hadoop.apache.org
> > Subject: Re: More Replication on dfs
> >
> > To add to the question: how does one decide the optimal replication
> > factor for a cluster? For instance, what would be an appropriate
> > replication factor for a cluster consisting of 5 nodes?
> > Mithila
> >
> > On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <alex@cloudera.com>
> > wrote:
> >
> > > Did you load any files when replication was set to 3?  If so, you'll
> > > have to rebalance:
> > >
> > > <http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
> > > <http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
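> > >
> > > To kick that off, something like the following should work (the
> > > -threshold flag is optional; 10 percent is the default, shown here
> > > only as an example):
> > >
> > > $ bin/hadoop balancer -threshold 10   # optional; 10% is the default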
> > >
> > > Note that most people run HDFS with a replication factor of 3.  There
> > > have been cases where clusters running with a replication factor of 2
> > > discovered new bugs, because replication is so often set to 3.  That
> > > said, if you can do it, it's probably advisable to run with a
> > > replication factor of 3 instead of 2.
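> > >
> > > For reference, this is the hadoop-site.xml entry that sets the default
> > > for newly written files (it does not change files already in HDFS):
> > >
> > > <property>
> > >   <!-- default replication for new files; existing files keep theirs -->
> > >   <name>dfs.replication</name>
> > >   <value>3</value>
> > > </property>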
> > >
> > > Alex
> > >
> > > On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:
> > >
> > > > Hi
> > > >
> > > >            I am a new Hadoop user. I have a small cluster with 3
> > > > Datanodes. In hadoop-site.xml the value of the dfs.replication
> > > > property is 2, but data is still being replicated on 3 machines.
> > > >
> > > > Could you tell me why this is happening?
> > > >
> > > > Regards,
> > > >
> > > > Aseem Puri
