hadoop-common-user mailing list archives

From Phantom <ghostwhoowa...@gmail.com>
Subject Re: HDFS replica management
Date Tue, 17 Jul 2007 18:37:33 GMT
I am sure re-replication is not done on every heartbeat miss, since that
would be very expensive and inefficient. At the same time, you cannot really
tell whether a node is partitioned away, crashed, or just slow. Is it
threshold-based, i.e., re-replicate after N missed heartbeats? Which package
in the source code could I look at to glean this information?
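For intuition, the threshold-based behavior the question describes could look
something like the sketch below. This is a hypothetical illustration, not the
actual Hadoop source: the class, method names, and constants are made up, and
the default intervals are only modeled loosely on HDFS-style configuration
(real values are configurable and vary by version).

```java
// Hypothetical sketch (not actual Hadoop code): a namenode-style liveness
// check that declares a datanode dead only after its last heartbeat is
// older than a generous expiry window, rather than on a single missed
// heartbeat. All names and values here are illustrative assumptions.
public class HeartbeatMonitor {
    // Illustrative defaults: datanodes report every few seconds, and the
    // namenode rechecks liveness on a much coarser interval.
    static final long HEARTBEAT_INTERVAL_MS = 3_000;    // heartbeat every 3s
    static final long RECHECK_INTERVAL_MS   = 300_000;  // recheck every 5min

    // Expiry window: many multiples of the heartbeat interval, so a slow
    // or briefly partitioned node is not re-replicated prematurely.
    static final long EXPIRY_MS =
        2 * RECHECK_INTERVAL_MS + 10 * HEARTBEAT_INTERVAL_MS;

    /** True if the node should be treated as dead and its blocks re-replicated. */
    static boolean isDead(long lastHeartbeatMs, long nowMs) {
        return nowMs - lastHeartbeatMs > EXPIRY_MS;
    }

    public static void main(String[] args) {
        long now = 1_000_000_000L;
        // A single missed beat (5s ago) is well inside the window.
        System.out.println(isDead(now - 5_000, now));
        // A node silent for ~11.7 minutes is past the expiry window.
        System.out.println(isDead(now - 700_000, now));
    }
}
```

The point of the wide window is exactly the ambiguity raised above: since the
namenode cannot distinguish a partitioned, crashed, or merely slow node, it
waits long enough that transient slowness rarely triggers re-replication.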

Thanks
A

On 7/17/07, Phantom <ghostwhoowalks@gmail.com> wrote:
>
> That's awesome.
>
> Thanks
> A
>
> On 7/17/07, Doug Cutting <cutting@apache.org> wrote:
> >
> > Phantom wrote:
> > > Here is the scenario I was concerned about. Consider three nodes in
> > the
> > > system A, B and C which are placed say in different racks. Let us say
> > that
> > > the disk on A fries up today. Now the blocks that were stored on A are
> > not
> > > going to re-replicated (this is my understanding but I could be wrong
> > in
> > > this assumption) to some other node or to the new disk with which you
> > would
> > > bring back A.
> >
> > That's incorrect.  When a datanode fails to send a heartbeat to the
> > namenode in a timely manner, its data is assumed missing and is
> > re-replicated.  And when block corruption is detected, corrupt replicas
> > are removed and non-corrupt replicas are re-replicated to maintain the
> > desired level of replication.
> >
> > Doug
> >
>
>
