hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Question on region server/data node restart
Date Tue, 24 Feb 2009 15:40:24 GMT
Well, this should not happen like that. Was the region server holding the
ROOT or META region? If so, that is a bug that was fixed in 0.19.0 and in
branch-0.18. I suggest you upgrade to one of those if you don't want to
break your MR jobs.

J-D
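
The lookup behaviour discussed in this thread (the client caches region locations, reads them from META, and looks again when a cached server is gone) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HBase client code: `ToyClient`, `NoServerForRegion`, and the dictionary standing in for META are all hypothetical names invented for the illustration.

```python
class NoServerForRegion(Exception):
    """Hypothetical stand-in for the exception quoted in the thread."""


class ToyClient:
    """Toy model of a client that caches region locations (not the HBase API)."""

    def __init__(self, meta, live_servers):
        self.meta = meta                  # region name -> server address, or None while unassigned
        self.live_servers = live_servers  # set of server addresses that are currently up
        self.cache = {}                   # client-side region location cache

    def locate(self, region):
        # Use the cached location if there is one, otherwise read it from META.
        if region not in self.cache:
            address = self.meta.get(region)
            if address is None:
                raise NoServerForRegion(region)
            self.cache[region] = address
        return self.cache[region]

    def request(self, region, max_tries=3):
        for _ in range(max_tries):
            address = self.locate(region)
            if address in self.live_servers:
                return address
            # Stale location: drop it from the cache and look the region up again.
            del self.cache[region]
        # META kept pointing at a dead server for every try.
        raise NoServerForRegion(region)
```

In this model a request keeps failing for as long as META still lists the dead server, and succeeds on the first try after META has been updated with the new location.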

On Tue, Feb 24, 2009 at 10:33 AM, Michael Dagaev <michael.dagaev@gmail.com> wrote:

> What I see now is that the client gets an exception (see below) once a
> region server stops:
>
> org.apache.hadoop.hbase.client.NoServerForRegionException: No server
> address listed in .META.
> ...
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server <region server>:60020 for region
>
> I guess the exception occurred because the region server is down. Is that
> correct?
>
> Thank you for your cooperation,
> M.
>
> P. S. We are running version 0.18.1
>
> On Tue, Feb 24, 2009 at 5:07 PM, Jean-Daniel Cryans <jdcryans@apache.org>
> wrote:
> > Correcting myself: "no waiting time" refers only to the time needed to
> > figure out that the node is dead. The client will still have to fetch the
> > region location from META.
> >
> > J-D
> >
> >
> > On Tue, Feb 24, 2009 at 10:02 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >
> >> Well, if a region server dies instead of being cleanly shut down, it takes
> >> in the worst case 180 seconds (the region server lease length) before the
> >> Master reassigns the regions. Clients trying to connect to that server will
> >> take, IIRC, 10 seconds to figure out the node is down; then the time to
> >> communicate with ROOT and META is under 1 second. If META wasn't updated
> >> yet, the client will retry all of that.
> >>
> >> In the next release (0.20.0), the master is notified by ZooKeeper within
> >> seconds of a region server's death and will proceed to reassign the
> >> regions immediately.
> >>
> >> If the client doesn't have the region in cache and META has already been
> >> updated after the region server's death, there will be no waiting time.
> >>
> >> J-D
> >>
> >>
> >> On Tue, Feb 24, 2009 at 9:49 AM, Michael Dagaev <michael.dagaev@gmail.com> wrote:
> >>
> >>> Thanks, now it is clear.
> >>>
> >>> However, if a region server is down, it takes a lot of time to retry
> >>> first, then to rescan the META region when the retries fail, rescan
> >>> ROOT, etc., to eventually get to another region server, which will
> >>> handle the request. Is that correct?
> >>>
> >>> On Tue, Feb 24, 2009 at 4:36 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >>> > This is why we have a META table: it holds the location info. See
> >>> > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#client
> >>> >
> >>> > J-D
> >>> >
> >>> > On Tue, Feb 24, 2009 at 9:28 AM, Michael Dagaev <michael.dagaev@gmail.com> wrote:
> >>> >
> >>> >> Thanks, Jean-Daniel.
> >>> >>
> >>> >> I did run hbase-daemon stop regionserver and start regionserver
> >>> >> and saw the client retrying to connect to the restarted region server.
> >>> >>
> >>> >> How does it know to connect to another region server? Maybe it stops
> >>> >> retrying, asks the master, and gets another region server to connect to.
> >>> >> Is that correct?
> >>> >>
> >>> >> Thank you for your cooperation,
> >>> >> M.
> >>> >>
> >>> >> On Tue, Feb 24, 2009 at 3:56 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >>> >> > Michael,
> >>> >> >
> >>> >> > Regarding stopping those nodes, use hadoop-daemon/hbase-daemon to
> >>> >> > stop them cleanly. Requests from the clients will not "fail"; the
> >>> >> > clients will simply be told to look elsewhere for the regions they
> >>> >> > have in cache. Unless you only have 1 region server...
> >>> >> >
> >>> >> > Regarding starting the nodes: apart from the usual
> >>> >> > hadoop-daemon/hbase-daemon, no.
> >>> >> >
> >>> >> > J-D
> >>> >> >
> >>> >> > On Tue, Feb 24, 2009 at 8:50 AM, Michael Dagaev <michael.dagaev@gmail.com> wrote:
> >>> >> >
> >>> >> >> Hi, all
> >>> >> >>
> >>> >> >> As I understand it, I can stop a region server and a data node in
> >>> >> >> a cluster "semi-transparently" for clients, i.e. the requests
> >>> >> >> handled by the region server at that time will fail, but the
> >>> >> >> cluster will keep working.
> >>> >> >>
> >>> >> >> If I start the data node and region server, I do not have to do
> >>> >> >> anything else to make them work.
> >>> >> >>
> >>> >> >> Is that correct?
> >>> >> >>
> >>> >> >> Thank you for your cooperation,
> >>> >> >> M.
> >>> >> >>
> >>> >> >
> >>> >>
> >>> >
> >>>
> >>
> >>
> >
>
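
The worst-case numbers quoted in this thread for 0.18 (a 180-second region server lease, roughly 10 seconds for a client to notice a dead server, and under 1 second for the ROOT/META lookup) can be combined into a rough estimate of how long a client keeps retrying after an unclean region server death. This is a back-of-the-envelope sketch, assuming each retry cycle costs one detection plus one lookup and ignoring the time the reassignment itself takes; the function names are invented for the illustration.

```python
import math

LEASE_SECONDS = 180   # worst-case time before the Master reassigns regions
DETECT_SECONDS = 10   # approximate time for a client to notice the server is down
LOOKUP_SECONDS = 1    # upper bound quoted for talking to ROOT and META


def retry_cycles(lease=LEASE_SECONDS, detect=DETECT_SECONDS, lookup=LOOKUP_SECONDS):
    """Number of detect-then-lookup cycles needed to outlast the lease."""
    return math.ceil(lease / (detect + lookup))


def worst_case_wait(lease=LEASE_SECONDS, detect=DETECT_SECONDS, lookup=LOOKUP_SECONDS):
    """Total seconds spent in those cycles under this simplified model."""
    return retry_cycles(lease, detect, lookup) * (detect + lookup)
```

Under this model the wait is dominated by the lease length, which is why faster death detection on the master side (as described for 0.20.0) matters more than tuning the client.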
