Subject: Re: help on key design
From: Michael Segel
Date: Wed, 31 Jul 2013 13:41:30 -0500
To: user@hbase.apache.org
Cc: Dhaval Shah

4 regions on 3 servers?
I'd say that they were already balanced.

The issue is that when they do their get(s) they are hitting one region. So more splits isn't the answer.

On Jul 31, 2013, at 12:49 PM, Ted Yu wrote:

> From the information Demian provided in the first email:
>
> bq. a table containing 20 million keys split automatically by HBase into 4
> regions and balanced across 3 region servers
>
> I think the number of regions should be increased through (manual)
> splitting so that the data is spread more evenly across servers.
>
> If the Gets are scattered across the whole key space, there is an
> optimization the client can do: group the Gets by region boundary and
> issue a multi-get per region.
>
> Please also refer to http://hbase.apache.org/book.html#rowkey.design,
> especially section 6.3.2.
>
> Cheers
>
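To make that concrete, here is a rough, untested sketch of the client-side grouping Ted describes, against the 0.94-era Java API (the class and method names are invented for illustration). Note that HTable.get(List<Get>) already routes the batch by server under the hood; the point of grouping explicitly is that each per-region batch can then be submitted from its own thread instead of riding along in one large sequential request:

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class GroupedMultiGet {

  // Bucket the requested rows by the start key of the region holding them,
  // then issue one multi-get per bucket. Submitting each bucket from its own
  // thread (not shown) keeps one slow region from serializing the request.
  public static List<Result> groupedGet(HTable table, List<byte[]> rowKeys)
      throws IOException {
    Map<String, List<Get>> byRegion = new HashMap<String, List<Get>>();
    for (byte[] row : rowKeys) {
      byte[] startKey =
          table.getRegionLocation(row).getRegionInfo().getStartKey();
      String bucket = Bytes.toStringBinary(startKey);
      List<Get> group = byRegion.get(bucket);
      if (group == null) {
        group = new ArrayList<Get>();
        byRegion.put(bucket, group);
      }
      group.add(new Get(row));
    }

    List<Result> results = new ArrayList<Result>();
    for (List<Get> group : byRegion.values()) {
      Result[] partial = table.get(group); // one batched call per region's rows
      for (Result r : partial) {
        results.add(r);
      }
    }
    return results;
  }
}
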
> On Wed, Jul 31, 2013 at 10:14 AM, Dhaval Shah wrote:
>
>> Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems like
>> the 500 Gets are executed sequentially on the region server.
>>
>> Also, 3k requests per minute = 50 requests per second. Assuming your
>> requests take 1 sec (which seems really long, but who knows), you need at
>> least 50 threads/region server handlers to handle these. The default for
>> that number on some older versions of HBase is 10, which means you are
>> running out of threads. Which brings up the following questions:
>> What version of HBase are you running?
>> How many region server handlers do you have?
>>
>> Regards,
>> Dhaval
>>
>>
>> ----- Original Message -----
>> From: Demian Berjman
>> To: user@hbase.apache.org
>> Cc:
>> Sent: Wednesday, 31 July 2013 11:12 AM
>> Subject: Re: help on key design
>>
>> Thanks for the responses!
>>
>>> why don't you use a scan
>> I'll try that and compare it.
>>
>>> How much memory do you have for your region servers? Have you enabled
>>> block caching? Is your CPU spiking on your region servers?
>> Block caching is enabled. CPU and memory don't seem to be a problem.
>>
>> We think we are saturating a region because of the quantity of keys
>> requested. In that case, my question would be whether asking for 500+
>> keys per request is a normal scenario.
>>
>> Cheers,
>>
>>
>> On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina wrote:
>>
>>> The scan can be an option if the cost of scanning undesired cells and
>>> discarding them through filters is better than accessing those keys
>>> individually. I would say that as the number of 'undesired' cells
>>> decreases, the scan's overall performance/efficiency improves. It all
>>> depends on how the keys are designed to be grouped together.
>>>
>>> 2013/7/30 Ted Yu
>>>
>>>> Please also go over http://hbase.apache.org/book.html#perf.reading
>>>>
>>>> Cheers
>>>>
>>>> On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah <
>>>> prince_mithibai@yahoo.co.in> wrote:
>>>>
>>>>> If all your keys are grouped together, why don't you use a scan with
>>>>> start/end key specified? A sequential scan can theoretically be
>>>>> faster than MultiGet lookups (assuming your grouping is tight; you
>>>>> can also use filters with the scan to give better performance).
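A rough sketch of that bounded scan with the Java client (the class name, the 500-row caching value, and the start/stop keys are invented for illustration; untested):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class GroupScan {

  // Read one contiguous "group" of rows with a single bounded scan instead
  // of ~500 individual Gets. startKey is inclusive, stopKey is exclusive.
  public static List<Result> readGroup(HTable table, byte[] startKey,
      byte[] stopKey) throws IOException {
    Scan scan = new Scan(startKey, stopKey);
    scan.setCaching(500);       // pull the whole group back in few RPCs
    scan.setCacheBlocks(true);  // keep the hot slice in the block cache

    List<Result> rows = new ArrayList<Result>();
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        rows.add(r);
      }
    } finally {
      scanner.close();
    }
    return rows;
  }
}

A filter can also be set on the scan (scan.setFilter(...)) to drop unwanted cells server-side, along the lines Pablo describes above.
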
>>>>>
>>>>> How much memory do you have for your region servers? Have you enabled
>>>>> block caching? Is your CPU spiking on your region servers?
>>>>>
>>>>> If you are saturating the resources on your *hot* region server, then
>>>>> yes, having more region servers will help. If not, then something else
>>>>> is the bottleneck and you probably need to dig further.
>>>>>
>>>>> Regards,
>>>>> Dhaval
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Demian Berjman
>>>>> To: user@hbase.apache.org
>>>>> Sent: Tuesday, 30 July 2013 4:37 PM
>>>>> Subject: help on key design
>>>>>
>>>>> Hi,
>>>>>
>>>>> I would like to explain our use case of HBase, the row key design and
>>>>> the problems we are having, so anyone can give us a hand:
>>>>>
>>>>> The first thing we noticed is that our data set is small compared to
>>>>> other cases we have read about on the list and in forums. We have a
>>>>> table containing 20 million keys, split automatically by HBase into 4
>>>>> regions and balanced across 3 region servers. We have designed our key
>>>>> to keep together the set of keys requested by our app. That is, when we
>>>>> request a set of keys, we expect them to be grouped together to improve
>>>>> data locality and block cache efficiency.
>>>>>
>>>>> The second thing we noticed, compared to other cases, is that we
>>>>> retrieve a bunch of keys per request (approx. 500). Thus, during our
>>>>> peaks (3k requests per minute), we have a lot of requests going to
>>>>> particular region servers and asking for a lot of keys. That results in
>>>>> poor response times (on the order of seconds). Currently we are using
>>>>> multi-gets.
>>>>>
>>>>> We think an improvement would be to spread the keys (introducing a
>>>>> randomized component into them) across more region servers, so each
>>>>> region server would have to handle fewer keys and probably fewer
>>>>> requests. That way the multi-gets would be spread over the region
>>>>> servers.
>>>>>
>>>>> Our questions:
>>>>>
>>>>> 1. Is this design of asking for so many keys on each request correct
>>>>> (if you need high performance)?
>>>>> 2. What about splitting across more region servers? Is that a good
>>>>> idea? How could we accomplish this? We thought about applying some
>>>>> hashing...
>>>>>
>>>>> Thanks in advance!
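
For what it's worth, the "randomized component" Demian describes in question 2 is usually implemented as a salt/bucket prefix on the row key. A minimal sketch (the bucket count and the one-byte prefix layout are invented for illustration):

import java.util.Arrays;

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKey {

  // Illustrative bucket count; in practice it is tied to how many
  // regions/region servers the load should be spread over.
  private static final int BUCKETS = 16;

  // Prefix the original key with a bucket derived from its hash so that
  // lexicographically adjacent keys land in different regions.
  public static byte[] salt(byte[] originalKey) {
    int bucket = (Arrays.hashCode(originalKey) & Integer.MAX_VALUE) % BUCKETS;
    return Bytes.add(new byte[] { (byte) bucket }, originalKey);
  }
}

The trade-off is the one this thread keeps circling: once keys are salted, logically adjacent rows are no longer contiguous, so the bounded-scan approach above turns into one scan per bucket, and every read has to recompute the salt from the original key.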