hbase-user mailing list archives

From Imran M Yousuf <imyou...@gmail.com>
Subject Re: HBase High Availability
Date Thu, 26 Nov 2009 11:40:34 GMT
On Thu, Nov 26, 2009 at 6:19 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> Probably around the same time as Hadoop 0.21, in other words a few
> more months.  There may be chances to run RCs before then, though.
>

Thanks for the quick reply, Ryan. I am eager to try the RCs; since I am
planning a deployment around April next year, the timing works out well!

Thanks,

- Imran

> -ryan
>
> On Thu, Nov 26, 2009 at 3:15 AM, Imran M Yousuf <imyousuf@gmail.com> wrote:
>> On Thu, Nov 26, 2009 at 12:05 PM, Jean-Daniel Cryans
>> <jdcryans@apache.org> wrote:
>> <snip />
>>>
>>> Also be aware that we are planning to include master-slave
>>> replication between datacenters in 0.21.
>>>
>>
>> From this discussion and a presentation by Ryan Rawson and Jonathan
>> Gray, I am really looking forward to the 0.21 release. Any idea on
>> the timeline?
>>
>> - Imran
>>
>>> J-D
>>>
>>> On Wed, Nov 25, 2009 at 8:45 PM, Murali Krishna. P
>>> <muralikpbhat@yahoo.com> wrote:
>>>> Thanks JD for the detailed reply.
>>>>
>>>> Does the underlying Java API currently block if a region is not available? I
>>>> would like to get an immediate retry indication from the Java call in such
>>>> cases, so that I can redirect the request to the duplicate table in the other
>>>> data center. Can this be supported?
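
As far as I know there is no such immediate-retry signal built into the 0.20 Java
client, but one client-side approximation is to shrink the client's retry budget so
calls fail fast, and treat RetriesExhaustedException as the cue to redirect to the
duplicate table Murali describes. A rough sketch only; the table name and retry
settings are illustrative, not recommendations:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.RetriesExhaustedException;

    public class DualDcPut {
      private final HTable local;   // table in this data center
      private final HTable remote;  // duplicate table in the other data center

      public DualDcPut(HBaseConfiguration localConf, HBaseConfiguration remoteConf,
                       String tableName) throws IOException {
        // Fail fast instead of blocking while a region is redeployed.
        localConf.setInt("hbase.client.retries.number", 2);
        localConf.setInt("hbase.client.pause", 250);  // ms between retries
        this.local = new HTable(localConf, tableName);
        this.remote = new HTable(remoteConf, tableName);
      }

      public void put(Put put) throws IOException {
        try {
          local.put(put);
        } catch (RetriesExhaustedException e) {
          // The region was still unavailable after the short retry budget: redirect.
          remote.put(put);
        }
      }
    }

The trade-off is that transient blips also get redirected, so the two tables can
diverge until they are reconciled.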
>>>>
>>>>  Thanks,
>>>> Murali Krishna
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Andrew Purtell <apurtell@apache.org>
>>>> To: hbase-user@hadoop.apache.org
>>>> Sent: Thu, 26 November, 2009 12:17:30 AM
>>>> Subject: Re: HBase High Availability
>>>>
>>>> First, there is work under way for 0.21 which will shorten the time necessary
>>>> for region redeployment. Part of the delay in 0.20 is less than ideal
>>>> performance in that regard by the master.
>>>>
>>>> Beyond that, just as a general operational principle, I recommend that you
>>>> host no more than 200-250 regions per region server. The Bigtable paper talks
>>>> about each tablet server hosting only 100 regions, with only 200 MB of data
>>>> each. While that is not cost effective for folks who do not build their own
>>>> hardware in bulk, it should cause you to think about why:
>>>>   - Limiting the number of regions per tablet server limits time to recovery
>>>>     upon node failure -- you can engineer this to be within some threshold
>>>>   - Limiting the amount of data per region means that servers with reasonable
>>>>     RAM can cache and serve a lot of the data out of memory for sub-disk data
>>>>     access latencies
>>>>
>>>> So the advice here is to opt for more servers, not fewer; more RAM, not less;
>>>> and smaller disks, not larger.
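
A small aside on keeping individual regions small: one knob for this is the
per-table maximum store file size, which controls when regions split. A minimal
sketch, assuming the HBaseAdmin client API and a hypothetical table name; the
200 MB figure just echoes the Bigtable numbers above and is not a recommendation:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class SmallRegionTable {
      public static void create(HBaseConfiguration conf) throws IOException {
        HTableDescriptor desc = new HTableDescriptor("mytable");  // hypothetical name
        desc.addFamily(new HColumnDescriptor("d"));
        // Ask for a region split at roughly 200 MB of store data rather than the
        // stock default, so each server carries many small regions.
        desc.setMaxFileSize(200L * 1024 * 1024);
        new HBaseAdmin(conf).createTable(desc);
      }
    }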
>>>>
>>>> You should also consider the impact of server failure on HDFS -- loss of
>>>> block replicas. For each under-replicated block, HDFS must work to make
>>>> additional copies. This can come at a bad time if loss of the blocks in the
>>>> first place was due to overloading.
>>>> Smaller disks mean fewer lost block replicas. For example, attach 4 x 160 GB
>>>> drives as JBOD (as opposed to 4 x 1 TB or similar). Losing one disk means a
>>>> loss of 160 GB worth of block replicas only (as opposed to 1 TB). Loss of a
>>>> whole server means losing only 640 GB worth of block replicas (as opposed to
>>>> 4 TB).
>>>> You can also consider attaching 6 or 8 or even more modest sized disks per
>>>> server to increase the I/O parallelism (number of spindles) while also
>>>> constraining the amount of block replica loss per disk failure.
>>>>
>>>> Even so, blocked reads and writes over some interval during region
>>>> redeployment, due to server failure or load rebalancing, are part of the
>>>> Bigtable architecture and so of HBase, unless we take additional steps such as
>>>> setting up active-passive region server pairs; but that would have
>>>> complications which affect consistency and performance, and might not provide
>>>> enough benefit anyway (there is still time needed to detect the failure and
>>>> fail over). This is not an unavailability of the Bigtable service; other
>>>> regions are not affected. It is graceful/proportional service degradation in
>>>> the face of partial failures. There are other alternatives to Bigtable which
>>>> degrade differently given partial failures. Such options can give you no
>>>> waiting on the write path at any time, and possibly no waiting on the read
>>>> path, but you will lose strong consistency as the trade-off. So you may get
>>>> stale answers over some (unbounded, IIRC) period, but this is the choice you
>>>> make.
>>>>
>>>> HBase also has options like Stargate or the Thrift connector which can block
>>>> and retry on behalf of your clients, so they are never blocked for writes. For
>>>> read path options, I could look at having Stargate serve (possibly stale)
>>>> answers out of a cache -- with some flag that indicates noncanonical state --
>>>> if that would be useful, and/or return an immediate "try again" indication, so
>>>> your clients are at least not stalled.
>>>>
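
The "try again" indication is only a proposal in the paragraph above; if something
like it were added, a client loop honoring it might look like the sketch below. The
503 status code and the backoff are assumptions, not current Stargate behaviour;
the URL is whatever Stargate row/column path you would normally fetch:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class StargateGetWithRetry {
      // Fetch a value over HTTP, backing off on the hypothetical "try again" signal
      // instead of leaving the caller stalled inside a blocked RPC.
      public static byte[] get(String url) throws IOException, InterruptedException {
        while (true) {
          HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
          conn.setRequestProperty("Accept", "application/octet-stream");
          if (conn.getResponseCode() == 503) {  // assumed "region in transition" reply
            conn.disconnect();
            Thread.sleep(500);                  // client-side backoff; tune as needed
            continue;
          }
          InputStream in = conn.getInputStream();
          try {
            return in.readAllBytes();           // 200 OK: the (possibly stale) value
          } finally {
            in.close();
          }
        }
      }
    }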
>>>> Best regards,
>>>>
>>>>  - Andy
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Murali Krishna. P <muralikpbhat@yahoo.com>
>>>> To: hbase-user@hadoop.apache.org
>>>> Sent: Wed, November 25, 2009 1:31:45 AM
>>>> Subject: HBase High Availability
>>>>
>>>> Hi,
>>>>    This is regarding region unavailability when a region server goes down.
>>>> There will be cases where we have thousands of regions per RS, and it takes a
>>>> considerable amount of time to redistribute the regions when a node fails. The
>>>> service will be unavailable during that period. I am evaluating HBase for an
>>>> application where we need to guarantee close to 100% availability (the
>>>> namenode is still a SPOF; let us leave that aside).
>>>>
>>>>    One simple idea would be to replicate the regions in memory. Can we load
>>>> the same region in multiple region servers? I am not sure about the
>>>> feasibility yet; there will be issues like consistency across these in-memory
>>>> replicas. I wanted to know whether there are any thoughts / work already going
>>>> on in this area? I saw some related discussion here:
>>>> http://osdir.com/ml/hbase-user-hadoop-apache/2009-09/msg00118.html, but I am
>>>> not sure what its status is.
>>>>
>>>>  Does the same need to be done with the master as well, or is it already
>>>> handled by ZK? How fast are master re-election and catalog load currently?
>>>> Do we always have multiple masters in a ready-to-run state?
>>>>
>>>>
>>>> Thanks,
>>>> Murali Krishna
>>>
>>
>>
>>
>> --
>> Imran M Yousuf
>> Entrepreneur & Software Engineer
>> Smart IT Engineering
>> Dhaka, Bangladesh
>> Email: imran@smartitengineering.com
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557
>>
>



-- 
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
