hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: [jira] Created: (HBASE-1312) ZooKeeper: Master's ephemeral node went away while it was still up and functioning normally
Date Mon, 06 Apr 2009 03:33:49 GMT

> From: Ryan Rawson
> Most people don't run HBase under some job control, so when
> HBase JVMs die, they stay dead...

Well... We can leave it up to the user to do process recovery
should, e.g., an HRS abort, or we can consider providing some
basic automatic recovery. For example, in my old deployment we
launched HBase (and Hadoop) daemons as children of monitoring
processes. Our monitoring and recovery framework was pretty
elaborate, and I'm not suggesting we roll something like that.
However, at this point in the project's maturity an HRS may go
down for "avoidable" problems like OOME or filesystem glitches.
To get people over that hump, running HBase processes as
children of simple monitors that automatically launch new
children under some specific failure scenarios is not
necessarily a bad idea. For one thing, it specifically
counteracts the "spiral of death" scenario, where OOME takes
down one HRS, its load is redistributed to the others, and they
go down in turn in an accelerating chain reaction.
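
To make the "simple monitor" idea concrete, here is a rough
sketch of a parent process that relaunches its child only for
selected exit codes. Everything in it is an assumption for
illustration: the names supervise and should_restart, the set
of restartable exit codes, the restart cap, and the backoff are
hypothetical, and none of it reflects what HBase ships or which
exit codes an HRS actually returns.

```python
import subprocess
import sys
import time

# Hypothetical policy values, chosen only for illustration. They are NOT
# exit codes that HBase is known to produce.
RESTARTABLE_EXIT_CODES = frozenset({1})
MAX_RESTARTS = 3
BACKOFF_SECONDS = 5.0


def should_restart(exit_code, restarts_so_far,
                   restartable=RESTARTABLE_EXIT_CODES,
                   max_restarts=MAX_RESTARTS):
    """Relaunch only for listed exit codes, and only up to a fixed cap."""
    return exit_code in restartable and restarts_so_far < max_restarts


def supervise(command,
              restartable=RESTARTABLE_EXIT_CODES,
              max_restarts=MAX_RESTARTS,
              backoff=BACKOFF_SECONDS,
              sleep=time.sleep):
    """Run `command` as a child process and relaunch it per the policy.

    Returns (last_exit_code, number_of_restarts) once the child exits in a
    way the policy does not cover, or the restart cap has been reached.
    """
    restarts = 0
    while True:
        # Blocks until the child exits. On POSIX a child killed by a signal
        # is reported as a negative return code.
        exit_code = subprocess.call(command)
        if not should_restart(exit_code, restarts, restartable, max_restarts):
            return exit_code, restarts
        restarts += 1
        # Linear backoff, so a child that dies immediately cannot spin the
        # monitor in a tight relaunch loop.
        sleep(backoff * restarts)
```

Calling supervise() with the daemon's launch command as a list
of arguments would keep the child running through the listed
failures. The restart cap and the backoff matter as much as the
relaunch itself: without them, a child that hits OOME on every
start would just cycle forever.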

   - Andy



      
