Subject: Re: help on key design
From: Michael Segel
Date: Wed, 31 Jul 2013 13:41:30 -0500
To: user@hbase.apache.org
Cc: Dhaval Shah

4 regions on 3 servers?
I'd say that they were already balanced.

The issue is that when they do their get(s) they are hitting one region. So more splits isn't the answer.

On Jul 31, 2013, at 12:49 PM, Ted Yu wrote:

> From the information Demian provided in the first email:
>
> bq. a table containing 20 million keys split automatically by HBase into 4
> regions and balanced across 3 region servers
>
> I think the number of regions should be increased through (manual)
> splitting so that the data is spread more evenly across servers.
>
> If the Gets are scattered across the whole key space, there is an
> optimization the client can do: group the Gets by region boundary and
> issue a multi-get per region.
>
> Please also refer to http://hbase.apache.org/book.html#rowkey.design,
> especially section 6.3.2.
>
> Cheers
>
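To make that concrete, here is a rough, untested sketch of the client-side grouping Ted describes, against the 0.94-era Java API (the class and method names are invented for illustration). Note that HTable.get(List<Get>) already routes the batch by server under the hood; the point of grouping explicitly is that each per-region batch can then be submitted from its own thread instead of riding along in one large sequential request:

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class GroupedMultiGet {

  // Bucket the requested rows by the start key of the region holding them,
  // then issue one multi-get per bucket. Submitting each bucket from its own
  // thread (not shown) keeps one slow region from serializing the request.
  public static List<Result> groupedGet(HTable table, List<byte[]> rowKeys)
      throws IOException {
    Map<String, List<Get>> byRegion = new HashMap<String, List<Get>>();
    for (byte[] row : rowKeys) {
      byte[] startKey =
          table.getRegionLocation(row).getRegionInfo().getStartKey();
      String bucket = Bytes.toStringBinary(startKey);
      List<Get> group = byRegion.get(bucket);
      if (group == null) {
        group = new ArrayList<Get>();
        byRegion.put(bucket, group);
      }
      group.add(new Get(row));
    }

    List<Result> results = new ArrayList<Result>();
    for (List<Get> group : byRegion.values()) {
      Result[] partial = table.get(group); // one batched call per region's rows
      for (Result r : partial) {
        results.add(r);
      }
    }
    return results;
  }
}
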
> On Wed, Jul 31, 2013 at 10:14 AM, Dhaval Shah wrote:
>
>> Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems like
>> the 500 Gets are executed sequentially on the region server.
>>
>> Also, 3k requests per minute = 50 requests per second. Assuming your
>> requests take 1 sec (which seems really long, but who knows), you need at
>> least 50 threads/region server handlers to handle these. The default for
>> that number on some older versions of HBase is 10, which means you are
>> running out of threads. Which brings up the following questions:
>> What version of HBase are you running?
>> How many region server handlers do you have?
>>
>> Regards,
>> Dhaval
>>
>>
>> ----- Original Message -----
>> From: Demian Berjman
>> To: user@hbase.apache.org
>> Cc:
>> Sent: Wednesday, 31 July 2013 11:12 AM
>> Subject: Re: help on key design
>>
>> Thanks for the responses!
>>
>>> why don't you use a scan
>> I'll try that and compare it.
>>
>>> How much memory do you have for your region servers? Have you enabled
>>> block caching? Is your CPU spiking on your region servers?
>> Block caching is enabled. CPU and memory don't seem to be a problem.
>>
>> We think we are saturating a region because of the quantity of keys
>> requested. In that case, my question would be whether asking for 500+
>> keys per request is a normal scenario.
>>
>> Cheers,
>>
>>
>> On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina wrote:
>>
>>> The scan can be an option if the cost of scanning undesired cells and
>>> discarding them through filters is better than accessing those keys
>>> individually. I would say that as the number of 'undesired' cells
>>> decreases, the scan's overall performance/efficiency improves. It all
>>> depends on how the keys are designed to be grouped together.
>>>
>>> 2013/7/30 Ted Yu
>>>
>>>> Please also go over http://hbase.apache.org/book.html#perf.reading
>>>>
>>>> Cheers
>>>>
>>>> On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah <
>>>> prince_mithibai@yahoo.co.in> wrote:
>>>>
>>>>> If all your keys are grouped together, why don't you use a scan with
>>>>> start/end key specified? A sequential scan can theoretically be
>>>>> faster than MultiGet lookups (assuming your grouping is tight; you
>>>>> can also use filters with the scan to give better performance).
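A rough sketch of that bounded scan with the Java client (the class name, the 500-row caching value, and the start/stop keys are invented for illustration; untested):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class GroupScan {

  // Read one contiguous "group" of rows with a single bounded scan instead
  // of ~500 individual Gets. startKey is inclusive, stopKey is exclusive.
  public static List<Result> readGroup(HTable table, byte[] startKey,
      byte[] stopKey) throws IOException {
    Scan scan = new Scan(startKey, stopKey);
    scan.setCaching(500);       // pull the whole group back in few RPCs
    scan.setCacheBlocks(true);  // keep the hot slice in the block cache

    List<Result> rows = new ArrayList<Result>();
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        rows.add(r);
      }
    } finally {
      scanner.close();
    }
    return rows;
  }
}

A filter can also be set on the scan (scan.setFilter(...)) to drop unwanted cells server-side, along the lines Pablo describes above.
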
>>>>>
>>>>> How much memory do you have for your region servers? Have you enabled
>>>>> block caching? Is your CPU spiking on your region servers?
>>>>>
>>>>> If you are saturating the resources on your *hot* region server, then
>>>>> yes, having more region servers will help. If not, then something else
>>>>> is the bottleneck and you probably need to dig further.
>>>>>
>>>>> Regards,
>>>>> Dhaval
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Demian Berjman
>>>>> To: user@hbase.apache.org
>>>>> Sent: Tuesday, 30 July 2013 4:37 PM
>>>>> Subject: help on key design
>>>>>
>>>>> Hi,
>>>>>
>>>>> I would like to explain our use case of HBase, the row key design and
>>>>> the problems we are having, so anyone can give us a hand:
>>>>>
>>>>> The first thing we noticed is that our data set is small compared to
>>>>> other cases we have read about on the list and in forums. We have a
>>>>> table containing 20 million keys, split automatically by HBase into 4
>>>>> regions and balanced across 3 region servers. We have designed our key
>>>>> to keep together the set of keys requested by our app. That is, when we
>>>>> request a set of keys, we expect them to be grouped together to improve
>>>>> data locality and block cache efficiency.
>>>>>
>>>>> The second thing we noticed, compared to other cases, is that we
>>>>> retrieve a bunch of keys per request (approx. 500). Thus, during our
>>>>> peaks (3k requests per minute), we have a lot of requests going to
>>>>> particular region servers and asking for a lot of keys. That results in
>>>>> poor response times (on the order of seconds). Currently we are using
>>>>> multi-gets.
>>>>>
>>>>> We think an improvement would be to spread the keys (introducing a
>>>>> randomized component into them) across more region servers, so each
>>>>> region server would have to handle fewer keys and probably fewer
>>>>> requests. That way the multi-gets would be spread over the region
>>>>> servers.
>>>>>
>>>>> Our questions:
>>>>>
>>>>> 1. Is this design of asking for so many keys on each request correct
>>>>> (if you need high performance)?
>>>>> 2. What about splitting across more region servers? Is that a good
>>>>> idea? How could we accomplish this? We thought about applying some
>>>>> hashing...
>>>>>
>>>>> Thanks in advance!
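
For what it's worth, the "randomized component" Demian describes in question 2 is usually implemented as a salt/bucket prefix on the row key. A minimal sketch (the bucket count and the one-byte prefix layout are invented for illustration):

import java.util.Arrays;

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKey {

  // Illustrative bucket count; in practice it is tied to how many
  // regions/region servers the load should be spread over.
  private static final int BUCKETS = 16;

  // Prefix the original key with a bucket derived from its hash so that
  // lexicographically adjacent keys land in different regions.
  public static byte[] salt(byte[] originalKey) {
    int bucket = (Arrays.hashCode(originalKey) & Integer.MAX_VALUE) % BUCKETS;
    return Bytes.add(new byte[] { (byte) bucket }, originalKey);
  }
}

The trade-off is the one this thread keeps circling: once keys are salted, logically adjacent rows are no longer contiguous, so the bounded-scan approach above turns into one scan per bucket, and every read has to recompute the salt from the original key.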