hbase-user mailing list archives

From Demian Berjman <dberj...@despegar.com>
Subject Re: help on key design
Date Wed, 31 Jul 2013 15:12:12 GMT
Thanks for the responses!

>  why don't you use a scan
I'll try that and compare it.

> How much memory do you have for your region servers? Have you enabled
> block caching? Is your CPU spiking on your region servers?
Block caching is enabled. CPU and memory don't seem to be a problem.

We think we are saturating a region because of the number of keys requested.
In that case, my question is whether asking for 500+ keys per request is a
normal scenario.
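
For a rough sense of the load, here is a back-of-the-envelope estimate. It uses only the figures from the original message (about 500 keys per request, 3k requests per minute at peak, 3 region servers). The even-spread case is a hypothetical comparison, not something we have measured.

```python
# Figures from the original message; the "even spread" case is an assumption
# added only for comparison.
KEYS_PER_REQUEST = 500
PEAK_REQUESTS_PER_MINUTE = 3000
REGION_SERVERS = 3


def keys_per_second(keys_per_request, requests_per_minute):
    """Total key lookups per second at the given request rate."""
    return keys_per_request * requests_per_minute / 60


total = keys_per_second(KEYS_PER_REQUEST, PEAK_REQUESTS_PER_MINUTE)
per_server_if_even = total / REGION_SERVERS

print(f"total key lookups per second at peak: {total:.0f}")
print(f"per region server if spread evenly:   {per_server_if_even:.0f}")
```

If most of those lookups land on a single region, that region's server handles close to the full total rather than a third of it.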

Cheers,


On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina <pablomedina85@gmail.com> wrote:

> The scan can be an option if the cost of scanning undesired cells and
> discarding them through filters is lower than the cost of accessing those
> keys individually. I would say that as the number of 'undesired' cells
> decreases, the overall performance/efficiency of the scan improves. It all
> depends on how the keys are designed to be grouped together.
>
> 2013/7/30 Ted Yu <yuzhihong@gmail.com>
>
> > Please also go over http://hbase.apache.org/book.html#perf.reading
> >
> > Cheers
> >
> > On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah <prince_mithibai@yahoo.co.in> wrote:
> >
> > > If all your keys are grouped together, why don't you use a scan with
> > > start/end key specified? A sequential scan can theoretically be faster
> > > than MultiGet lookups (assuming your grouping is tight, you can also use
> > > filters with the scan to give better performance)
> > >
> > > How much memory do you have for your region servers? Have you enabled
> > > block caching? Is your CPU spiking on your region servers?
> > >
> > > If you are saturating the resources on your *hot* region server then
> > > yes, having more region servers will help. If not, then something else
> > > is the bottleneck and you probably need to dig further.
> > >
> > >
> > >
> > >
> > > Regards,
> > > Dhaval
> > >
> > >
> > > ________________________________
> > > From: Demian Berjman <dberjman@despegar.com>
> > > To: user@hbase.apache.org
> > > Sent: Tuesday, 30 July 2013 4:37 PM
> > > Subject: help on key design
> > >
> > >
> > > Hi,
> > >
> > > I would like to explain our use case of HBase, the row key design and
> > > the problems we are having, so that anyone can help us:
> > >
> > > The first thing we noticed is that our data set is very small compared
> > > to other cases we read about on the list and in forums. We have a table
> > > containing 20 million keys, split automatically by HBase into 4 regions
> > > and balanced across 3 region servers. We have designed our key to keep
> > > together the set of keys requested by our app. That is, when we request
> > > a set of keys we expect them to be grouped together to improve data
> > > locality and block cache efficiency.
> > >
> > > The second thing we noticed, compared to other cases, is that we
> > > retrieve a bunch of keys per request (approx. 500). Thus, during our
> > > peaks (3k requests per minute), we have a lot of requests going to a
> > > particular region server and asking for a lot of keys. That results in
> > > poor response times (on the order of seconds). Currently we are using
> > > multi gets.
> > >
> > > We think an improvement would be to spread the keys (by introducing a
> > > randomized component into them) across more region servers, so each rs
> > > will have to handle fewer keys and probably fewer requests. That way the
> > > multi gets will be spread over the region servers.
> > >
> > > Our questions:
> > >
> > > 1. Is this design of asking for so many keys on each request correct
> > > (if you need high performance)?
> > > 2. What about splitting across more region servers? Is it a good idea?
> > > How could we accomplish this? We thought of applying some hashing...
> > >
> > > Thanks in advance!
> > >
> >
>
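
The "randomized component" and hashing idea from the original question is often done by prefixing each row key with a small bucket number derived from a hash of the key. The sketch below is only an illustration under stated assumptions. `NUM_BUCKETS`, `salted_key` and `bucket_prefixes` are hypothetical names, the one-byte prefix layout is an assumption, and nothing here calls the HBase client API.

```python
import hashlib

# Hypothetical bucket count; in practice it would be chosen with the desired
# number of regions in mind.
NUM_BUCKETS = 16


def salted_key(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix row_key with a one-byte bucket derived from a hash of the key.

    The bucket is deterministic, so a reader can recompute the stored key
    from the original key before issuing a get.
    """
    if not 1 <= num_buckets <= 256:
        raise ValueError("num_buckets must fit in one byte")
    digest = hashlib.md5(row_key).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return bytes([bucket]) + row_key


def bucket_prefixes(num_buckets: int = NUM_BUCKETS) -> list:
    """All possible one-byte prefixes, e.g. to issue one range scan per bucket."""
    return [bytes([b]) for b in range(num_buckets)]


# Keys that were adjacent before salting may land in different buckets.
for key in (b"group42|item001", b"group42|item002", b"group42|item003"):
    print(salted_key(key)[0], key.decode())
```

The trade-off is that keys which used to sit next to each other are scattered, so a range scan over a group has to become one scan per bucket, and the locality and block cache benefit of the grouped design is reduced. Hashing only the grouping part of the key would keep each group contiguous while spreading different groups across buckets.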
