hadoop-common-user mailing list archives

From Alex Loddengaard <a...@cloudera.com>
Subject Re: More Replication on dfs
Date Fri, 10 Apr 2009 17:31:43 GMT
Mithila,

Most people run with a replication factor of 3.  Three replicas get you one
local copy, one copy on a different rack, and an additional copy on a
different node in that same remote rack.
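
As a rough illustration of the placement described above, here is a minimal
Python sketch. It is a toy simulation, not HDFS code: the function name
place_replicas, the topology dictionary, and the node and rack names are all
hypothetical, and the fallback behavior for very small clusters is an
assumption made for the example.

```python
import random

def place_replicas(writer_node, topology, replication=3, seed=0):
    """Toy model of the placement described above (not the real HDFS code).

    topology maps a rack name to the list of node names on that rack.
    Returns the chosen nodes, first replica first.
    """
    rng = random.Random(seed)
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    all_nodes = sorted(rack_of)
    chosen = [writer_node]  # replica 1: the local copy on the writing node

    def pick(candidates):
        # Choose a random candidate that does not already hold a replica.
        pool = [n for n in candidates if n not in chosen]
        if pool:
            chosen.append(rng.choice(pool))
            return True
        return False

    if replication >= 2:
        # Replica 2: a node on a different rack than the writer.
        remote = [n for n in all_nodes if rack_of[n] != rack_of[writer_node]]
        if not pick(remote):
            pick(all_nodes)  # assumed fallback: single-rack cluster
    if replication >= 3 and len(chosen) == 2:
        # Replica 3: another node on the same rack as replica 2.
        if not pick(topology[rack_of[chosen[1]]]):
            pick(all_nodes)  # assumed fallback: that rack has no free node
    # Any further replicas go to arbitrary unused nodes.
    while len(chosen) < min(replication, len(all_nodes)):
        pick(all_nodes)
    return chosen[:replication]

# Hypothetical two-rack cluster.
topology = {"rackA": ["a1", "a2"], "rackB": ["b1", "b2"]}
print(place_replicas("a1", topology))
```

With this layout, losing either rack entirely still leaves at least one copy,
which is the point of putting the second and third replicas off the writer's
rack.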

Quantcast gave a talk at a user group a while ago about physically moving a
data center from one colo to another.  Turning off machines and moving them
increases the probability of those machines going down, so Quantcast upped
their replication to 7, I believe.  Once the move was done, they lowered
their replication back to whatever it was set to previously, which I think
was 3.

So anyway, a replication factor of 3 is totally sufficient, unless you've
come across a particular case where the node failure rate is higher than
"normal," for example if you're running super unreliable hardware.
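
To make the trade-off a bit more concrete, here is a back-of-the-envelope
sketch in Python. It assumes every node holding a replica fails independently
with the same probability, which real clusters do not guarantee (rack or
power failures are correlated), and the 5% figure is purely hypothetical.

```python
def prob_all_replicas_lost(p_node_down, replication):
    """Chance that every replica of a block is unavailable at once.

    Assumes independent node failures with equal probability, which is a
    simplification; correlated failures make the real number worse.
    """
    return p_node_down ** replication

# Hypothetical: each node has a 5% chance of being down during a move.
for r in (2, 3, 7):
    print(r, prob_all_replicas_lost(0.05, r))
```

Under these assumptions each extra replica multiplies the loss probability by
the per-node failure probability, so raising replication temporarily is a
cheap way to offset a temporarily higher failure rate.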

Alex

On Thu, Apr 9, 2009 at 10:26 PM, Mithila Nagendra <mnagendr@asu.edu> wrote:

> To add to the question, how does one decide what the optimal replication
> factor for a cluster is? For instance, what would be the appropriate
> replication factor for a cluster consisting of 5 nodes?
> Mithila
>
> On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <alex@cloudera.com> wrote:
>
> > Did you load any files when replication was set to 3?  If so, you'll have
> > to rebalance:
> >
> > <http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
> > <http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
> >
> > Note that most people run HDFS with a replication factor of 3.  There have
> > been cases where clusters running with a replication of 2 discovered new
> > bugs, because replication is so often set to 3.  So, if you can do it,
> > it's probably advisable to run with a replication factor of 3 instead of 2.
> >
> > Alex
> >
> > On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:
> >
> > > Hi
> > >
> > > I am a new Hadoop user. I have a small cluster with 3 Datanodes. In
> > > hadoop-site.xml the value of the dfs.replication property is 2, but
> > > it is still replicating data on 3 machines.
> > >
> > > Please tell me why this is happening.
> > >
> > > Regards,
> > >
> > > Aseem Puri
