hbase-user mailing list archives

From "M. C. Srivas" <mcsri...@gmail.com>
Subject Re: When node is down
Date Mon, 09 Jul 2012 05:04:23 GMT
On Sun, Jun 24, 2012 at 8:14 PM, Michel Segel <michael_segel@hotmail.com> wrote:

> You don't notice it faster, it's the timeout.
> You can reduce the timeout, it's configurable. Default is 10 min.
>
> There shouldn't be downtime of the cluster, just the node.
>
> Note this is for Apache. MapR is different and someone from MapR should be
> able to provide details...
>

No downtime for MapR. The failed drive is detected in 30 seconds or so
(if the controller is jammed, Linux takes about 2 minutes to "un-hang" the
entire system, so detection could take as long as that). The drive can be
pulled out and a new one inserted while the system is live. MapR will
automatically reformat the newly added drive and start using it in under
1 minute.

While you are fetching the replacement drive, the data that was on the bad
drive is immediately and automatically rebuilt and redistributed.
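
For the Apache side, the "10 min" timeout quoted above is not a single setting. As far as I know, the NameNode derives it from two intervals, which is why it works out to slightly more than 10 minutes by default. Below is a minimal sketch of that arithmetic. The property names in the comments are the Hadoop 2.x+ ones (older releases use different names), and the function name is only illustrative, so check the hdfs-default.xml for your version before relying on it.

```python
# Sketch of how the HDFS NameNode derives its dead-DataNode timeout.
# Assumed property names (Hadoop 2.x and later):
#   dfs.namenode.heartbeat.recheck-interval  (milliseconds, default 300000)
#   dfs.heartbeat.interval                   (seconds, default 3)
# dead_node_timeout_ms is an illustrative helper, not a Hadoop API.

def dead_node_timeout_ms(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Milliseconds without a heartbeat before a DataNode is marked dead."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


# Default settings: 630000 ms, i.e. 10.5 minutes.
print(dead_node_timeout_ms() / 60_000)

# Lowering the recheck interval to 45 s gives a 2-minute timeout.
print(dead_node_timeout_ms(recheck_interval_ms=45_000) / 1000)
```

Lowering the timeout makes re-replication start sooner, but it also makes the cluster more likely to declare a node dead during a long GC pause or a brief network problem, so it is a trade-off rather than a free win.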




>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jun 22, 2012, at 8:41 AM, Tom Brown <tombrown52@gmail.com> wrote:
>
> > Can it notice the node is down sooner? If that node is serving an active
> > region (or if it's a datanode for an active region), that would be a
> > potentially large amount of downtime. With commodity hardware, and a large
> > enough cluster, there will always be a machine or two being rebuilt...
> >
> > Thanks!
> >
> > -Tom
> >
> > On Thursday, June 21, 2012, Michael Segel wrote:
> >
> >> Assuming that you have an Apache release (Apache, HW, Cloudera) ...
> >> (If MapR, replace the drive and you should be able to repair the cluster
> >> from the console. Node doesn't go down. )
> >> Node goes down.
> >> 10 min later, cluster sees node down. Should then be able to replicate the
> >> missing blocks.
> >>
> >> Replace disk w new disk and rebuild file system.
> >> Bring node up.
> >> Rebalance cluster.
> >>
> >> That should be pretty much it.
> >>
> >>
> >> On Jun 21, 2012, at 10:17 PM, David Charle wrote:
> >>
> >>> What is the best practice to remove a node and add the same node back for
> >>> hbase/hadoop?
> >>>
> >>> Currently in our 10 node cluster, 2 nodes went down (bad disk, so node is
> >>> down as it's the root volume+data); need to replace the disk and add them
> >>> back. Any quick suggestions or pointers to doc for the right procedure?
> >>>
> >>> --
> >>> David
> >>
> >>
>
