hadoop-common-user mailing list archives

From "Bobby Dennett" <softw...@bobby.fastmail.us>
Subject Is it safe to set default/minimum replication to 2?
Date Thu, 22 Jul 2010 01:29:36 GMT
The team that manages our Hadoop clusters is currently being pressured
to reduce block replication from 3 to 2 in our production cluster. The
request is driven by several factors -- chiefly reclaiming used space
in the cluster and the potential for fewer write operations -- but from
what I've read previously, running with replication 2 seems to be
strongly discouraged.
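For context, here is roughly what I assume the change would look like
in hdfs-site.xml, via dfs.replication (the default for new files) and
dfs.replication.min (the minimum the NameNode will accept on write);
these property names are from our 0.20-era configs and may differ in
other versions:

  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default replication factor for newly created files.</description>
  </property>
  <property>
    <name>dfs.replication.min</name>
    <value>2</value>
    <description>Minimum number of replicas a block must reach for a write to succeed.</description>
  </property>

My understanding is that this only affects new files; existing files
keep their current factor, so something like "hadoop fs -setrep -R 2 /"
would be needed to rewrite them at the lower setting.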

Of course I can't find it now, but I recall seeing a post, one that
Doug Cutting was involved with, stating that replication 3 is something
like 100 times "safer" than replication 2. If I remember correctly, it
mentioned potential NameNode bugs that could introduce undetected
corrupted or missing replicas; the idea was that with more replicas,
the chance of such a bug actually losing data is much lower. On a
related note, it seems that the companies running with a reduced
replication factor (e.g. Facebook) have also built an application layer
on top of Hadoop to handle exceptions, detect corruption, and so on.
Unfortunately, we do not currently have the resources to build
something similar.
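
For what it's worth, a back-of-envelope calculation (my own, not from
the post I'm remembering) shows where a figure like that could come
from. If each replica is lost independently with probability p during
some window, then:

  P(lose both of 2 replicas)  = p^2
  P(lose all of 3 replicas)   = p^3
  ratio                       = p^2 / p^3 = 1/p

So at p = 0.01, replication 3 comes out roughly 100 times safer than
replication 2 -- though correlated failures (rack loss, or the NameNode
bugs mentioned above) make the real-world gap harder to pin down.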

For anyone currently running with a replication factor of 2 in
production, can you please share your experience and any issues you
have encountered? Also, I would appreciate any thoughts on whether
replication 2 can be considered "safe".

Thanks in advance,
-Bobby
