hbase-user mailing list archives

From Pablo Medina <pablomedin...@gmail.com>
Subject Re: help on key design
Date Wed, 31 Jul 2013 18:57:46 GMT
If you split that one hot region and then move one half to another region
server, you move half of that hot region server's load along with it. The
set of hot keys will then be spread over 2 region servers instead of one.
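
A rough sketch of that effect, not taken from a real cluster: the region start keys, server names and hot key set below are all made up for illustration, and the model simply counts how many requested keys land on each region server before and after the split.

```python
from bisect import bisect_right
from collections import Counter

def region_for(key, start_keys):
    """Index of the region whose [start, next_start) range contains key."""
    return bisect_right(start_keys, key) - 1

def load_per_server(keys, start_keys, assignment):
    """Count how many requested keys land on each region server."""
    load = Counter()
    for key in keys:
        load[assignment[region_for(key, start_keys)]] += 1
    return dict(load)

# Hypothetical layout: 4 regions on 3 servers, as in the original post.
start_keys = ["", "g", "n", "t"]
assignment = ["rs1", "rs2", "rs3", "rs1"]

# Hypothetical hot key set: 500 keys that all fall inside region ["g", "n").
hot_keys = [f"h{i:03d}" for i in range(500)]

before = load_per_server(hot_keys, start_keys, assignment)

# Split the hot region at the median hot key and move the upper half to rs3.
split_point = sorted(hot_keys)[len(hot_keys) // 2]
start_keys_after = ["", "g", split_point, "n", "t"]
assignment_after = ["rs1", "rs2", "rs3", "rs3", "rs1"]

after = load_per_server(hot_keys, start_keys_after, assignment_after)

print(before)  # {'rs2': 500}
print(after)   # {'rs2': 250, 'rs3': 250}
```

In this toy model the split only helps because the split point falls inside the hot key range. With a split point outside that range, the per-server counts stay exactly as they were.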


2013/7/31 Michael Segel <msegel@segel.com>

> 4 regions on 3 servers?
> I'd say that they were already balanced.
>
> The issue is that when they do their get(s) they are hitting one region.
> So more splits isn't the answer.
>
>
> On Jul 31, 2013, at 12:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > From the information Demian provided in the first email:
> >
> > bq. a table containing 20 million keys splitted automatically by HBase in 4
> > regions and balanced in 3 region servers
> >
> > I think the number of regions should be increased through (manual)
> > splitting so that the data is spread more evenly across servers.
> >
> > If the Gets are scattered across the whole key space, there is an
> > optimization the client can do: group the Gets by region boundary
> > and issue one multi-get per region.
> >
> > Please also refer to http://hbase.apache.org/book.html#rowkey.design,
> > especially 6.3.2.
> >
> > Cheers
> >
> > On Wed, Jul 31, 2013 at 10:14 AM, Dhaval Shah
> > <prince_mithibai@yahoo.co.in> wrote:
> >
> >> Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems like
> >> the 500 Gets are executed sequentially on the region server.
> >>
> >> Also 3k requests per minute = 50 requests per second. Assuming your
> >> requests take 1 sec (which seems really long but who knows) then you need
> >> at least 50 threads/region server handlers to handle these. The default for
> >> that number on some older versions of HBase is 10, which means you are
> >> running out of threads. Which brings up the following questions -
> >> What version of HBase are you running?
> >> How many region server handlers do you have?
> >>
> >> Regards,
> >> Dhaval
> >>
> >>
> >> ----- Original Message -----
> >> From: Demian Berjman <dberjman@despegar.com>
> >> To: user@hbase.apache.org
> >> Cc:
> >> Sent: Wednesday, 31 July 2013 11:12 AM
> >> Subject: Re: help on key design
> >>
> >> Thanks for the responses!
> >>
> >>> why don't you use a scan
> >> I'll try that and compare it.
> >>
> >>> How much memory do you have for your region servers? Have you enabled
> >>> block caching? Is your CPU spiking on your region servers?
> >> Block caching is enabled. CPU and memory don't seem to be a problem.
> >>
> >> We think we are saturating a region because of the number of keys
> >> requested. In that case, my question is: is asking for 500+ keys per
> >> request a normal scenario?
> >>
> >> Cheers,
> >>
> >>
> >> On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina <pablomedina85@gmail.com>
> >> wrote:
> >>
> >>> The scan can be an option if the cost of scanning undesired cells and
> >>> discarding them through filters is lower than the cost of accessing those
> >>> keys individually. I would say that as the number of 'undesired' cells
> >>> decreases, the overall efficiency of the scan increases. It all depends on
> >>> how the keys are designed to be grouped together.
> >>>
> >>> 2013/7/30 Ted Yu <yuzhihong@gmail.com>
> >>>
> >>>> Please also go over http://hbase.apache.org/book.html#perf.reading
> >>>>
> >>>> Cheers
> >>>>
> >>>> On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah <prince_mithibai@yahoo.co.in>
> >>>> wrote:
> >>>>
> >>>>> If all your keys are grouped together, why don't you use a scan with
> >>>>> start/end key specified? A sequential scan can theoretically be faster
> >>>>> than MultiGet lookups (assuming your grouping is tight, you can also use
> >>>>> filters with the scan to give better performance)
> >>>>>
> >>>>> How much memory do you have for your region servers? Have you enabled
> >>>>> block caching? Is your CPU spiking on your region servers?
> >>>>>
> >>>>> If you are saturating the resources on your *hot* region server then yes
> >>>>> having more region servers will help. If no, then something else is the
> >>>>> bottleneck and you probably need to dig further
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Regards,
> >>>>> Dhaval
> >>>>>
> >>>>>
> >>>>> ________________________________
> >>>>> From: Demian Berjman <dberjman@despegar.com>
> >>>>> To: user@hbase.apache.org
> >>>>> Sent: Tuesday, 30 July 2013 4:37 PM
> >>>>> Subject: help on key design
> >>>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I would like to explain our use case of HBase, the row key design and the
> >>>>> problems we are having, so that anyone can give us some help:
> >>>>>
> >>>>> The first thing we noticed is that our data set is small compared to
> >>>>> other cases we read about on the list and in forums. We have a table
> >>>>> containing 20 million keys, split automatically by HBase into 4 regions
> >>>>> and balanced across 3 region servers. We designed our key to keep together
> >>>>> the set of keys requested by our app. That is, when we request a set of
> >>>>> keys we expect them to be grouped together to improve data locality and
> >>>>> block cache efficiency.
> >>>>>
> >>>>> The second thing we noticed, compared to other cases, is that we retrieve
> >>>>> a bunch of keys per request (approx. 500). Thus, during our peaks (3k
> >>>>> requests per minute), we have a lot of requests going to one particular
> >>>>> region server and asking for a lot of keys. That results in poor response
> >>>>> times (on the order of seconds). Currently we are using multi gets.
> >>>>>
> >>>>> We think an improvement would be to spread the keys (by introducing a
> >>>>> randomized component into them) over more region servers, so each rs will
> >>>>> have to handle fewer keys and probably fewer requests. That way the multi
> >>>>> gets will be spread over the region servers.
> >>>>>
> >>>>> Our questions:
> >>>>>
> >>>>> 1. Is this design of asking for so many keys on each request correct (if
> >>>>> you need high performance)?
> >>>>> 2. What about splitting across more region servers? Is it a good idea? How
> >>>>> could we accomplish this? We thought of applying some hashing...
> >>>>>
> >>>>> Thanks in advance!
> >>>>>
> >>>>
> >>>
> >>
> >>
>
>
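The hashing idea raised in the original question is usually done by prefixing each row key with a small "salt" bucket derived from the key itself. The following is only a minimal sketch under assumptions that do not come from the thread: the bucket count of 8, the `NN-` prefix format, the use of MD5 and the sample key names are all illustrative choices.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 8  # assumption: would normally be chosen relative to cluster size

def salted_key(row_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a bucket computed deterministically from the key."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % num_buckets
    return f"{bucket:02d}-{row_key}"

def group_by_bucket(row_keys):
    """Group salted keys by bucket prefix, e.g. to issue one multi-get each."""
    groups = defaultdict(list)
    for row_key in row_keys:
        salted = salted_key(row_key)
        groups[salted[:2]].append(salted)
    return dict(groups)

# Hypothetical request: 500 keys that would otherwise sit next to each other.
request = [f"user42-item{i:04d}" for i in range(500)]
groups = group_by_bucket(request)
for bucket in sorted(groups):
    print(bucket, len(groups[bucket]))
```

Because the salt is computed from the key, a reader can recompute it and still fetch a single row with one Get. The trade-off is that a range scan over the original grouping no longer works as a single scan: it has to be run once per bucket and the results merged.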

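The handler estimate quoted in the thread (3k requests per minute at 1 second each needing at least 50 handlers) is the arrival rate multiplied by the time each request occupies a handler. A small sketch of that arithmetic follows; the function name is made up, and the model assumes every request holds exactly one handler for its whole duration.

```python
import math

def handlers_needed(requests_per_minute: float, seconds_per_request: float) -> int:
    """Average number of handlers busy at once: arrival rate times service time."""
    requests_per_second = requests_per_minute / 60.0
    return math.ceil(requests_per_second * seconds_per_request)

# Figures from the thread: 3k requests per minute, 1 second per request.
print(handlers_needed(3000, 1.0))  # 50

# Same rate with faster requests needs proportionally fewer handlers.
print(handlers_needed(3000, 0.5))  # 25
```

The result is an average, so it is a lower bound for bursty traffic. The corresponding setting is `hbase.regionserver.handler.count`, whose default varies by HBase version.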