hbase-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject Re: Performance tuning
Date Sat, 21 Dec 2013 23:00:31 GMT
Scans on RS 19 and 23, which have 5 regions instead of 4, stand out more
than scans on RS 20, 21, 22. But scans on RS 7 and 18, which also have 5
regions, are doing fine; not the best, but still in the mid-range.
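
For anyone reproducing this, the per-server region spread can be tallied with
the plain HBase 0.94 client API. This is only a rough sketch; "t_96" is the
table from this thread (Phoenix may have upper-cased the actual HBase name):

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HTable;

public class RegionSpread {
  public static void main(String[] args) throws IOException {
    // "t_96" as referenced in this thread; adjust if Phoenix upper-cased it.
    HTable table = new HTable(HBaseConfiguration.create(), "t_96");
    Map<String, Integer> perServer = new TreeMap<String, Integer>();
    for (Map.Entry<HRegionInfo, ServerName> e
        : table.getRegionLocations().entrySet()) {
      String host = e.getValue().getHostname();
      Integer n = perServer.get(host);
      perServer.put(host, n == null ? 1 : n + 1); // regions hosted per RS
    }
    System.out.println(perServer); // should show 4 or 5 regions per server
    table.close();
  }
}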


On Sat, Dec 21, 2013 at 11:51 PM, Kristoffer Sjögren <stoffe@gmail.com> wrote:

> Yeah, I'm doing a count(*) query on the 96-region table. Do you mean to
> check network traffic between the RS?
>
> From debugging the Phoenix code I can see that 96 scans are sent and that
> each response returned to the client contains only a row count, which the
> client then aggregates and returns. So the traffic between the client and
> each RS is very small.
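
(To make that concrete: the pattern is the same idea as HBase's own
AggregationClient row count, where each region returns just a long and the
client sums them. Phoenix ships its own aggregating coprocessor, so the sketch
below is only an analogy, not its actual code path, and the column family is a
placeholder.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class CountSketch {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    // Requires the AggregateImplementation coprocessor loaded on the table.
    AggregationClient agg = new AggregationClient(conf);
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("0")); // placeholder column family
    // Every region computes its own row count; only the longs travel back to
    // the client, which sums them.
    long rows = agg.rowCount(Bytes.toBytes("t_96"),
        new LongColumnInterpreter(), scan);
    System.out.println(rows);
  }
}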
>
>
>
>
> On Sat, Dec 21, 2013 at 11:35 PM, lars hofhansl <larsh@apache.org> wrote:
>
>> Thanks Kristoffer,
>>
>> yeah, that's the right metric. I would put my bet on the slower network.
>> But you're also doing a select count(*) query in Phoenix, right? So
>> nothing should really be sent across the network.
>>
>> When you do the queries, can you check whether there is any network
>> traffic?
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>>  From: Kristoffer Sjögren <stoffe@gmail.com>
>> To: user@hbase.apache.org; lars hofhansl <larsh@apache.org>
>> Sent: Saturday, December 21, 2013 1:28 PM
>> Subject: Re: Performance tuning
>>
>>
>> @pradeep scanner caching should not be an issue since data transferred to
>> the client is tiny.
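
(For reference, the knob @pradeep means is the one below, assuming the plain
HBase 0.94 client API; the value is only illustrative.)

import org.apache.hadoop.hbase.client.Scan;

public class CachingSketch {
  static Scan scanWithCaching() {
    Scan scan = new Scan();
    scan.setCaching(1000);      // rows fetched per RPC round trip
    scan.setCacheBlocks(false); // don't churn the block cache on a full scan
    return scan;
  }
}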
>>
>> @lars Yes, the data might be small for this particular case :-)
>>
>> I have checked everything I can think of on the RS (CPU, network, HBase
>> console, uptime etc.) and nothing stands out, except for the pings (network
>> pings).
>> There are 5 regions on RS 7, 18, 19, and 23; the others have 4.
>> hdfsBlocksLocalityIndex=100 on all RS (was that the correct metric?)
>>
>> -Kristoffer
>>
>>
>>
>>
>> On Sat, Dec 21, 2013 at 9:44 PM, lars hofhansl <larsh@apache.org> wrote:
>>
>> > Hi Kristoffer,
>> > For this particular problem: are many regions on the same RegionServers?
>> > Did you profile those RegionServers? Anything weird on those boxes?
>> > Slower pings might well be an issue. How's the data locality? (You can
>> > check on a RegionServer's overview page.)
>> > If needed, you can issue a major compaction to reestablish data locality
>> > on all RegionServers.
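
(The compaction Lars suggests can be triggered with major_compact 't_96' in
the hbase shell, or via the client API. A minimal sketch, assuming plain HBase
0.94; note the call is asynchronous and returns before the compaction
finishes.)

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CompactSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    // Asks every region of t_96 to major-compact; rewriting the HFiles on the
    // hosting RegionServer also restores HDFS block locality.
    admin.majorCompact("t_96");
    admin.close();
  }
}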
>> >
>> >
>> > 32 cores matched with only 4G of RAM is a bit weird, but with your tiny
>> > dataset it doesn't matter anyway.
>> >
>> > 10m rows across 96 regions is just about 100k rows per region. You won't
>> > see many of HBase's nice properties at that scale.
>> > Try with 100m (or better, 1bn) rows. Then we're talking. For anything
>> > below that you wouldn't want to use HBase anyway.
>> > (100k rows I could scan on my phone with a Perl script in less than 1s.)
>> >
>> >
>> > By "ping" do you mean an actual network ping, or some operation on top
>> > of HBase?
>> >
>> >
>> > -- Lars
>> >
>> >
>> >
>> > ________________________________
>> >  From: Kristoffer Sjögren <stoffe@gmail.com>
>> > To: user@hbase.apache.org
>> > Sent: Saturday, December 21, 2013 11:17 AM
>> > Subject: Performance tuning
>> >
>> >
>> > Hi
>> >
>> > I have been performance-tuning HBase 0.94.6 running Phoenix 2.2.0 for
>> > the last couple of days and need some help.
>> >
>> > Background.
>> >
>> > - 23-machine cluster, 32 cores, 4GB heap per RS.
>> > - Table t_24 has 24 online regions (24 salt buckets).
>> > - Table t_96 has 96 online regions (96 salt buckets).
>> > - 10.5 million rows per table.
>> > - Count query: select count(*) from ...
>> > - Group-by query: select A, B, C, sum(D) from ... where (A = 1 and T >= 0
>> >   and T <= 2147482800) group by A, B, C;
>> >
>> > What I found, ultimately, is that region servers 19, 20, 21, 22 and 23
>> > are consistently 2-3x slower than the others. This hurts overall latency
>> > pretty badly since queries are executed in parallel on the RS and then
>> > aggregated at the client (through Phoenix). In Hannibal the regions are
>> > spread out evenly over the region servers, according to the salt buckets
>> > (a Phoenix feature that pre-creates regions and prefixes the row key).
>> >
>> > As far as I can tell, there is no network or hardware configuration
>> > divergence between the machines. No CPU, network or other notable
>> > divergence in Ganglia. No RS metric differences in the HBase master
>> > console.
>> >
>> > The only thing that may be of interest is that pings (within the
>> > cluster) to the bad RS are about 2-3x slower, around 0.050ms vs 0.130ms.
>> > Not sure if this is significant, but I get a bad feeling about it since
>> > it matches exactly the RS that stood out in my performance tests.
>> >
>> > Any ideas of how I might find the source of this problem?
>> >
>> > Cheers,
>> > -Kristoffer
>> >
>>
>
>
