hbase-user mailing list archives

From Michael Dagaev <michael.dag...@gmail.com>
Subject Re: Question on region server/data node restart
Date Tue, 24 Feb 2009 16:04:26 GMT
I do not know whether it was holding the ROOT or META region.
It looks like requests may fail in HBase 0.18 if a region server stops.

Thanks,
M.

On Tue, Feb 24, 2009 at 5:40 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> Well this should not happen like that. Was the region server holding the
> ROOT or META region? If so, well that's a bug corrected in 0.19.0 and
> branch-0.18. I suggest you upgrade to that version if you don't want to
> break your MR jobs.
>
> J-D
>
> On Tue, Feb 24, 2009 at 10:33 AM, Michael Dagaev
> <michael.dagaev@gmail.com> wrote:
>
>> What I see now is that the client gets an exception (see below) once a
>> region server stops:
>>
>> org.apache.hadoop.hbase.client.NoServerForRegionException: No server
>> address listed in .META.
>> ...
>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> Trying to contact region server <region server>:60020 for region
>>
>> I guess the exception occurred since the region server is down. Is it
>> correct?
>>
>> Thank you for your cooperation,
>> M.
>>
>> P. S. We are running version 0.18.1
>>
>> On Tue, Feb 24, 2009 at 5:07 PM, Jean-Daniel Cryans <jdcryans@apache.org>
>> wrote:
>> > Correcting myself: "no waiting time" refers only to the time needed to
>> > figure out that the node is dead. The client will still have to fetch the
>> > region location from META.
>> >
>> > J-D
>> >
>> >
>> > On Tue, Feb 24, 2009 at 10:02 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >
>> >> Well if a region server dies instead of being cleanly shut down, it takes
>> >> in the worst case 180 seconds (a region server lease length) before the
>> >> Master reassigns the regions. Clients trying to connect to that server will
>> >> take IIRC 10 seconds to figure out the node is down; then the time to
>> >> communicate with ROOT and META is under 1 sec. If META wasn't updated yet,
>> >> the client will retry all of that.
>> >>
>> >> In the next release (0.20.0), the master is notified by ZooKeeper within
>> >> seconds of a region server death and will proceed to reassign the regions
>> >> immediately.
>> >>
>> >> If the client doesn't have the region in cache and META is updated with the
>> >> region server death, there will be no waiting time.
>> >>
>> >> J-D
>> >>
>> >>
>> >> On Tue, Feb 24, 2009 at 9:49 AM, Michael Dagaev <michael.dagaev@gmail.com> wrote:
>> >>
>> >>> Thanks, now it is clear.
>> >>>
>> >>> However, if a region server is down, it takes a lot of time to retry first,
>> >>> to rescan the META region when the retries fail, rescan ROOT, etc. to
>> >>> get eventually to another region server, which will handle the request.
>> >>> Is it correct ?
>> >>>
>> >>> On Tue, Feb 24, 2009 at 4:36 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >>> > This is why we have a META table, it holds the location info. See
>> >>> > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#client
>> >>> >
>> >>> > J-D
>> >>> >
>> >>> > On Tue, Feb 24, 2009 at 9:28 AM, Michael Dagaev <michael.dagaev@gmail.com> wrote:
>> >>> >
>> >>> >> Thanks, Jean-Daniel.
>> >>> >>
>> >>> >> I did run hbase-daemon stop regionserver and start regionserver
>> >>> >> and saw the client retrying to connect to the restarted region server.
>> >>> >>
>> >>> >> How does it know to connect to another region server? Maybe it stops
>> >>> >> retrying, asks the master, and gets another region server to connect to.
>> >>> >> Is it correct ?
>> >>> >>
>> >>> >> Thank you for your cooperation,
>> >>> >> M.
>> >>> >>
>> >>> >> On Tue, Feb 24, 2009 at 3:56 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >>> >> > Michael,
>> >>> >> >
>> >>> >> > Regarding stopping those nodes, do it using hadoop-daemon/hbase-daemon
>> >>> >> > to stop them cleanly. Requests from the clients will not "fail", they
>> >>> >> > will simply be told to look elsewhere for the regions they have in
>> >>> >> > cache. Unless you only have 1 region server...
>> >>> >> >
>> >>> >> > Regarding starting the nodes, apart from the usual
>> >>> >> > hadoop-daemon/hbase-daemon, no.
>> >>> >> >
>> >>> >> > J-D
>> >>> >> >
>> >>> >> > On Tue, Feb 24, 2009 at 8:50 AM, Michael Dagaev <michael.dagaev@gmail.com> wrote:
>> >>> >> >
>> >>> >> >> Hi, all
>> >>> >> >>
>> >>> >> >>     As I understand, I can stop a region server and a data node in a
>> >>> >> >> cluster "semi-transparently" for clients, i.e. the requests handled by
>> >>> >> >> the region server at that time will fail, but the cluster will keep
>> >>> >> >> working.
>> >>> >> >>
>> >>> >> >> If I start the data node and region server I do not have to do anything
>> >>> >> >> to make them work.
>> >>> >> >>
>> >>> >> >> Is it correct ?
>> >>> >> >>
>> >>> >> >> Thank you for your cooperation,
>> >>> >> >> M.
>> >>> >> >>
>> >>> >> >
>> >>> >>
>> >>> >
>> >>>
>> >>
>> >>
>> >
>>
>
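
The lookup-and-retry behaviour described in this thread (use the cached region
location, and when the server does not answer, drop the cached entry and read
the location from META again) can be sketched roughly as below. This is only a
toy model under stated assumptions, not the HBase client: ToyClient,
NoServerForRegion, the dict standing in for META and the retry count are all
hypothetical names invented for illustration, and the ROOT lookup and the
timeouts mentioned above are left out.

```python
class NoServerForRegion(Exception):
    """Hypothetical error raised once every retry has failed."""


class ToyClient:
    """Toy model of a client that caches region locations (not the HBase API)."""

    def __init__(self, meta, live_servers, max_retries=3):
        self.meta = meta                  # region name -> server listed in "META"
        self.live_servers = live_servers  # servers that currently answer requests
        self.max_retries = max_retries    # assumed retry budget
        self.cache = {}                   # client-side region location cache
        self.meta_lookups = 0             # counts how often "META" was consulted

    def locate(self, region):
        # Use the cached location if present; otherwise read it from "META".
        if region not in self.cache:
            self.meta_lookups += 1
            server = self.meta.get(region)
            if server is None:
                return None
            self.cache[region] = server
        return self.cache[region]

    def request(self, region):
        for _ in range(self.max_retries):
            server = self.locate(region)
            if server is not None and server in self.live_servers:
                return server  # the request would be served by this server
            # Stale or missing location: forget it so the next attempt re-reads "META".
            self.cache.pop(region, None)
        raise NoServerForRegion(region)


meta = {"region-1": "rs1"}
live = {"rs1", "rs2"}
client = ToyClient(meta, live)
print(client.request("region-1"))  # served from rs1

live.discard("rs1")                # rs1 stops
meta["region-1"] = "rs2"           # the master reassigns the region
print(client.request("region-1"))  # stale cache entry dropped, then rs2
```

In this model, if "META" still lists the stopped server, every attempt fails
and the client gives up with an error, which is loosely the situation reported
with 0.18.1 above.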
