Subject: Re: Performance tuning
From: Kristoffer Sjögren
To: user@hbase.apache.org, lars hofhansl
Date: Sat, 21 Dec 2013 23:51:10 +0100

Yeah, I'm doing a count(*) query on the 96-region table. Do you mean to
check network traffic between the RS?

From debugging the Phoenix code I can see that there are 96 scans sent, and
each response returned to the client contains only the sum of rows, which
are then aggregated and returned. So the traffic between the client and
each RS is very small.

On Sat, Dec 21, 2013 at 11:35 PM, lars hofhansl wrote:

> Thanks Kristoffer,
>
> yeah, that's the right metric. I would put my bet on the slower network.
> But you're also doing a select count(*) query in Phoenix, right? So
> nothing should really be sent across the network.
>
> When you do the queries, can you check whether there is any network
> traffic?
>
> -- Lars
>
>
> ________________________________
> From: Kristoffer Sjögren
> To: user@hbase.apache.org; lars hofhansl
> Sent: Saturday, December 21, 2013 1:28 PM
> Subject: Re: Performance tuning
>
>
> @pradeep scanner caching should not be an issue since the data
> transferred to the client is tiny.
>
> @lars Yes, the data might be small for this particular case :-)
>
> I have checked everything I can think of on the RS (CPU, network, HBase
> console, uptime, etc.) and nothing stands out, except for the pings
> (network pings).
> There are 5 regions on region servers 7, 18, 19, and 23; the others have 4.
> hdfsBlocksLocalityIndex=100 on all RS (was that the correct metric?)
>
> -Kristoffer
>
>
> On Sat, Dec 21, 2013 at 9:44 PM, lars hofhansl wrote:
>
> > Hi Kristoffer,
> > For this particular problem: are many regions on the same RegionServers?
> > Did you profile those RegionServers? Anything weird on that box?
> > Slower pings might well be an issue. How's the data locality? (You can
> > check on a RegionServer's overview page.)
> > If needed, you can issue a major compaction to reestablish local data on
> > all RegionServers.
> >
> > 32 cores matched with only 4G of RAM is a bit weird, but with your tiny
> > dataset it doesn't matter anyway.
> >
> > 10m rows across 96 regions is just about 100k rows per region. You won't
> > see many of the nice properties of HBase.
> > Try with 100m (or better, 1bn) rows. Then we're talking. For anything
> > below this you wouldn't want to use HBase anyway.
> > (100k rows I could scan on my phone with a Perl script in less than 1s.)
> >
> > With "ping", do you mean an actual network ping, or some operation on
> > top of HBase?
> >
> > -- Lars
> >
> >
> > ________________________________
> > From: Kristoffer Sjögren
> > To: user@hbase.apache.org
> > Sent: Saturday, December 21, 2013 11:17 AM
> > Subject: Performance tuning
> >
> >
> > Hi
> >
> > I have been performance tuning HBase 0.94.6 running Phoenix 2.2.0 the
> > last couple of days and need some help.
> >
> > Background.
> >
> > - 23-machine cluster, 32 cores, 4GB heap per RS.
> > - Table t_24 has 24 online regions (24 salt buckets).
> > - Table t_96 has 96 online regions (96 salt buckets).
> > - 10.5 million rows per table.
> > - Count query - select count(*) from ...
> > - Group by query - select A, B, C, sum(D) from ... where (A = 1 and
> >   T >= 0 and T <= 2147482800) group by A, B, C;
> >
> > What I found ultimately is that region servers 19, 20, 21, 22 and 23 are
> > consistently 2-3x slower than the others. This hurts overall latency
> > pretty badly, since queries are executed in parallel on the RS and then
> > aggregated at the client (through Phoenix). In Hannibal, regions are
> > spread out evenly over the region servers, according to the salt buckets
> > (a Phoenix feature that pre-creates regions and adds a rowkey prefix).
> >
> > As far as I can tell, there is no network or hardware configuration
> > divergence between the machines. No CPU, network or other notable
> > divergence in Ganglia. No RS metric differences in the HBase master
> > console.
> >
> > The only thing that may be of interest is that pings (within the
> > cluster) to the bad RS are about 2-3x slower, around 0.050ms vs 0.130ms.
> > Not sure if this is significant, but I get a bad feeling about it since
> > it matches exactly with the RS that stood out in my performance tests.
> >
> > Any ideas of how I might find the source of this problem?
> >
> > Cheers,
> > -Kristoffer
> >
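[Editor's note] The scatter/gather pattern Kristoffer describes — one scan per region, each region server returning only a partial count, with the client summing the partials — can be sketched like this. This is a toy Python model, not Phoenix's actual implementation; the region count and per-region row counts are made up to match the numbers in the thread (96 regions, ~10.5M rows).

```python
# Toy model of Phoenix's parallel SELECT COUNT(*): one aggregating scan
# per region, merged on the client. Purely illustrative.
from concurrent.futures import ThreadPoolExecutor

def scan_region_count(region_rows):
    # Stand-in for a server-side aggregating scan: the RS counts rows
    # locally and ships back a single number, never the rows themselves.
    return len(region_rows)

def parallel_count(regions):
    # The client fans out one scan per region (96 for t_96) and sums the
    # partial counts. Overall latency is gated by the slowest region.
    with ThreadPoolExecutor(max_workers=8) as pool:
        partial_counts = pool.map(scan_region_count, regions)
    return sum(partial_counts)

# 96 fake regions of ~109k rows each, totalling 10.5M rows.
regions = [range(109_375) for _ in range(96)]
print(parallel_count(regions))  # 10500000
```

Because the client waits for all 96 partial results before it can emit the final count, a handful of slow region servers (here, 19-23) dominates the end-to-end query time even though per-response traffic is tiny — which is why the ping asymmetry is worth chasing.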