hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Question on region server/data node restart
Date Tue, 24 Feb 2009 15:02:53 GMT
Well, if a region server dies instead of being cleanly shut down, it takes in
the worst case 180 seconds (the length of a region server's lease) before the
Master reassigns its regions. Clients trying to connect to that server will
take, IIRC, 10 seconds to figure out that the node is down; after that,
communicating with ROOT and META takes under 1 second. If META hasn't been
updated yet, the client will retry all of that.
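To make the worst case above concrete, here is a back-of-envelope simulation in Java. It is a sketch under stated assumptions, not HBase code: the region server crashes at t = 0, the Master reassigns its regions exactly when the 180 s lease expires and META is updated instantly, each client attempt spends 10 s discovering that the server is down and then 1 s (the "under 1 sec" figure rounded up) re-reading ROOT and META, and the time the new server needs to open the region is ignored. The class and method names are hypothetical.

```java
/**
 * Back-of-envelope model (hypothetical, not HBase code) of how long a client
 * keeps failing after a region server crashes at t = 0.
 */
class RecoveryEstimate {
    // Figures quoted in the thread; the lookup cost is rounded up to 1 s.
    static final long LEASE_SECONDS = 180; // region server lease length
    static final long DETECT_SECONDS = 10; // time to notice the server is down (IIRC)
    static final long LOOKUP_SECONDS = 1;  // ROOT + META round trip, "under 1 sec"

    /**
     * Returns the simulated time (seconds) at which a META lookup first returns
     * the new location. Assumes META is updated exactly at t = reassignAt and
     * that a lookup finishing at or after that instant sees the update.
     */
    static long secondsUntilNewLocation(long reassignAt, long detect, long lookup) {
        if (detect + lookup <= 0) {
            throw new IllegalArgumentException("each attempt must take some time");
        }
        long now = 0;
        while (true) {
            now += detect; // connection attempt to the dead server times out
            now += lookup; // client re-reads ROOT and META
            if (now >= reassignAt) {
                return now; // META now points at the new region server
            }
            // META still lists the dead server, so the client retries all of it.
        }
    }

    public static void main(String[] args) {
        long t = secondsUntilNewLocation(LEASE_SECONDS, DETECT_SECONDS, LOOKUP_SECONDS);
        System.out.println("client finds the new location after ~" + t + " s");
    }
}
```

With the thread's numbers this prints roughly 187 s. The real figure also depends on the client's retry count, the pause between retries and how long the new server takes to open the region, none of which this model includes.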

In the next release (0.20.0), the Master will be notified by ZooKeeper within
seconds of a region server's death and will reassign its regions
immediately.
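The 0.20.0 change replaces waiting for a lease to expire with a push notification. As I understand it, each region server holds an ephemeral znode in ZooKeeper that disappears when the server's session ends, and the Master watches for that. The sketch below models the idea in plain Java so it runs without a ZooKeeper dependency; `ToyRegistry`, `MasterSketch`, `onServerLost` and the `/hbase/rs/...` paths are hypothetical stand-ins, not the real ZooKeeper or HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

/** Hypothetical Master that reacts to a notification instead of a lease timeout. */
class MasterSketch {
    final List<String> reassigned = new ArrayList<>();

    void onServerLost(String path) {
        reassigned.add(path); // a real Master would reassign that server's regions here
    }

    public static void main(String[] args) {
        ToyRegistry zk = new ToyRegistry();
        MasterSketch master = new MasterSketch();
        zk.watchDeletions(master::onServerLost);
        zk.createEphemeral("/hbase/rs/server-a", "session-a");
        zk.createEphemeral("/hbase/rs/server-b", "session-b");
        zk.expireSession("session-a"); // server-a crashes
        System.out.println(master.reassigned); // [/hbase/rs/server-a]
    }
}

/** Toy, in-memory stand-in for ZooKeeper ephemeral nodes plus a deletion watch. */
class ToyRegistry {
    private final Map<String, String> ephemeralNodes = new TreeMap<>(); // path -> owning session
    private final List<Consumer<String>> watchers = new ArrayList<>();

    /** A region server registers itself; the node lives only as long as its session. */
    void createEphemeral(String path, String sessionId) {
        ephemeralNodes.put(path, sessionId);
    }

    /** The Master asks to be told whenever a registered node disappears. */
    void watchDeletions(Consumer<String> watcher) {
        watchers.add(watcher);
    }

    /** Session expiry (the server died): its ephemeral nodes vanish and watchers fire. */
    void expireSession(String sessionId) {
        List<String> gone = new ArrayList<>();
        ephemeralNodes.entrySet().removeIf(e -> {
            boolean owned = e.getValue().equals(sessionId);
            if (owned) {
                gone.add(e.getKey());
            }
            return owned;
        });
        for (String path : gone) {
            for (Consumer<String> w : watchers) {
                w.accept(path);
            }
        }
    }
}
```

Real ZooKeeper watches are one-time triggers that have to be re-registered after they fire; the toy keeps its watchers permanently to stay short.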

If the client doesn't have the region in its cache and META has already been
updated after the region server's death, there will be no waiting time.
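The two cases above (a stale cached location versus an empty cache with META already updated) can be sketched as a client-side location cache. This is a simplified model under assumptions: META is a plain map from region name to server, a dead server is simply one missing from a set of live servers, and `LocationCache`, `call` and the server names are hypothetical, not the HBase client API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical client-side cache of region locations backed by a toy META table. */
class LocationCache {
    private final Map<String, String> meta;      // region -> server (toy META)
    private final Set<String> liveServers;       // servers that currently answer
    private final Map<String, String> cache = new HashMap<>();
    int failedAttempts = 0;                      // in 0.19 each one costs ~10 s

    LocationCache(Map<String, String> meta, Set<String> liveServers) {
        this.meta = meta;
        this.liveServers = liveServers;
    }

    /** Returns the server that finally served the request, retrying up to maxAttempts. */
    String call(String region, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Use the cached location if there is one, otherwise read META.
            String server = cache.computeIfAbsent(region, meta::get);
            if (server != null && liveServers.contains(server)) {
                return server;    // success, no waiting
            }
            failedAttempts++;     // server down, or META not updated yet
            cache.remove(region); // forget the stale location, re-read META next time
        }
        throw new IllegalStateException("region " + region + " unavailable after retries");
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>(Map.of("r1", "rs-a"));
        Set<String> live = new HashSet<>(Set.of("rs-a", "rs-b"));
        LocationCache warm = new LocationCache(meta, live);
        warm.call("r1", 3);     // caches rs-a
        live.remove("rs-a");    // rs-a dies...
        meta.put("r1", "rs-b"); // ...and META has been updated
        System.out.println(warm.call("r1", 3) + " after " + warm.failedAttempts + " failed attempt(s)");
        LocationCache cold = new LocationCache(meta, live);
        System.out.println(cold.call("r1", 3) + " after " + cold.failedAttempts + " failed attempt(s)");
    }
}
```

The warm client pays one failed attempt before re-reading META; the cold client goes straight to the new server. A real client also pauses between retries, which this sketch leaves out.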

J-D

On Tue, Feb 24, 2009 at 9:49 AM, Michael Dagaev <michael.dagaev@gmail.com> wrote:

> Thanks, now it is clear.
>
> However, if a region server is down, it takes a lot of time to retry first,
> then rescan the META region when the retries fail, rescan ROOT, etc., to
> eventually get to another region server, which will handle the request.
> Is that correct?
>
> On Tue, Feb 24, 2009 at 4:36 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> > This is why we have a META table; it holds the region location info. See
> > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#client
> >
> > J-D
> >
> > On Tue, Feb 24, 2009 at 9:28 AM, Michael Dagaev <michael.dagaev@gmail.com> wrote:
> >
> >> Thanks, Jean-Daniel.
> >>
> >> I ran hbase-daemon stop regionserver and then start regionserver,
> >> and saw the client retrying to connect to the restarted region server.
> >>
> >> How does it know to connect to another region server? Maybe it stops
> >> retrying, asks the master, and gets another region server to connect to.
> >> Is that correct?
> >>
> >> Thank you for your cooperation,
> >> M.
> >>
> >> On Tue, Feb 24, 2009 at 3:56 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >> > Michael,
> >> >
> >> > Regarding stopping those nodes: use hadoop-daemon/hbase-daemon to stop
> >> > them cleanly. Requests from the clients will not "fail"; they will simply
> >> > be told to look elsewhere for the regions they have in cache. Unless you
> >> > only have 1 region server...
> >> >
> >> > Regarding starting the nodes: no, nothing beyond the usual
> >> > hadoop-daemon/hbase-daemon.
> >> >
> >> > J-D
> >> >
> >> > On Tue, Feb 24, 2009 at 8:50 AM, Michael Dagaev <michael.dagaev@gmail.com> wrote:
> >> >
> >> >> Hi, all
> >> >>
> >> >> As I understand it, I can stop a region server and a data node in a
> >> >> cluster "semi-transparently" for clients, i.e. the requests handled by
> >> >> the region server at that time will fail, but the cluster will keep
> >> >> working.
> >> >>
> >> >> If I start the data node and region server again, I do not have to do
> >> >> anything to make them work.
> >> >>
> >> >> Is that correct?
> >> >>
> >> >> Thank you for your cooperation,
> >> >> M.
> >> >>
> >> >
> >>
> >
>
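The pointer to the META table earlier in the thread is the answer to how a client finds another region server: the location comes from the catalog. A minimal sketch, assuming a single table whose regions are keyed by start row: the region holding a row is the entry with the greatest start key not greater than that row, which `TreeMap.floorEntry` returns directly. Real META rows also encode the table name and a region id, and the client goes through ROOT to find the right META region first; `MetaLookup` and the server names are hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a META-style lookup: regions keyed by start row (hypothetical). */
class MetaLookup {
    // Start row of each region -> region server hosting it; "" marks the first region.
    private final NavigableMap<String, String> regionsByStartKey = new TreeMap<>();

    /** Records (or updates, after a reassignment) where a region lives. */
    void assign(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    /** The region holding a row is the one with the greatest start key <= row. */
    String locate(String row) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(row);
        if (e == null) {
            throw new IllegalArgumentException("no region covers row " + row);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        MetaLookup meta = new MetaLookup();
        meta.assign("", "rs-a");                  // rows before "m"
        meta.assign("m", "rs-b");                 // rows from "m" onward
        System.out.println(meta.locate("apple")); // rs-a
        System.out.println(meta.locate("zebra")); // rs-b
        meta.assign("m", "rs-c");                 // rs-b died; its region was reassigned
        System.out.println(meta.locate("zebra")); // rs-c
    }
}
```

Once the Master has rewritten the entry, any client that re-reads it lands on the new server without asking the Master about individual requests.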
