hbase-dev mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: HBase read performance and HBase client
Date Thu, 01 Aug 2013 02:27:19 GMT
Some final numbers:

Test config:

HBase 0.94.6
blockcache=true, block size = 64K, KV size = 62 bytes (raw).

5 clients: 96GB RAM, 16 (32) CPUs (2.2GHz), CentOS 5.7
1 RegionServer (RS): the same config.

Local network with ping between hosts: 0.1 ms


1. The HBase client hits the wall at ~50K ops per sec regardless of # of CPUs,
threads, IO pool size and other settings.
2. The HBase server was able to sustain 170K ops per sec (with 64K block size),
all from block cache, KV size = 62 bytes (very small). This is for single Get
ops, 60 threads per client, 5 clients (on different hosts); a rough sketch of
the test loop is below.
3. Multi-get hits the wall at the same 170K-200K ops per sec. Batch sizes
tested: 30 and 100. Absolutely the same performance as with batch size = 1.
Multi-get seems to have some internal issues on the RegionServer side, maybe
excessive locking or something else.
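
For reference, here is a rough sketch of the kind of test loop behind these
numbers, written against the 0.94 client API. The table name, key layout, and
the thread/batch constants are illustrative only, not the exact test code.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Random;
  import java.util.concurrent.atomic.AtomicLong;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ReadBenchmark {
    static final AtomicLong OPS = new AtomicLong();

    public static void main(String[] args) throws Exception {
      final int threads = 60;     // 60 reader threads per client, as above
      final int batchSize = 30;   // 1 = single Get, 30/100 = multi-get
      final Configuration conf = HBaseConfiguration.create();
      for (int t = 0; t < threads; t++) {
        new Thread(new Runnable() {
          public void run() {
            try {
              // One HTable per thread, all sharing the same Configuration.
              HTable table = new HTable(conf, "test_table");  // name assumed
              Random rnd = new Random();
              while (true) {                                  // run until killed
                List<Get> batch = new ArrayList<Get>(batchSize);
                for (int i = 0; i < batchSize; i++) {
                  batch.add(new Get(Bytes.toBytes("row-" + rnd.nextInt(1000000))));
                }
                table.get(batch);                             // one multi-get RPC
                OPS.addAndGet(batchSize);
              }
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        }).start();
      }
      // Report throughput once per second.
      while (true) {
        long before = OPS.get();
        Thread.sleep(1000);
        System.out.println((OPS.get() - before) + " gets/sec");
      }
    }
  }

With batchSize = 1 this exercises the single-Get path; 30 or 100 matches the
multi-get batches mentioned above.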





On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
<vladrodionov@gmail.com>wrote:

> 1. SCR are enabled
> 2. A single Configuration for all tables did not work well, but I will try
> it again
> 3. With Nagle's I had 0.8ms avg; without it, 0.4ms - I do see the difference
>
>
> On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <larsh@apache.org> wrote:
>
>> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
>> RTT is bad, right? Are you seeing ~40ms latencies?
>>
>> This thread has gotten confusing.
>>
>> I would try these (a rough sketch of the first three follows the list):
>> * one Configuration for all tables. Or even use a single
>> HConnection/Threadpool and use the HTable(byte[], HConnection,
>> ExecutorService) constructor
>> * disable Nagle's: set both ipc.server.tcpnodelay and
>> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client *and*
>> server)
>> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
>> * enable short circuit reads (details depend on exact version of Hadoop).
>> Google will help :)
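
A rough sketch of what the first three items could look like on the client
side, assuming HBase 0.94 (class and table names, pool sizes, and thread
counts here are made up; the tcpnodelay settings must also be present in the
server's hbase-site.xml):

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HConnection;
  import org.apache.hadoop.hbase.client.HConnectionManager;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SharedConnectionSetup {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Disable Nagle's algorithm on the client side.
      conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
      conf.setBoolean("ipc.server.tcpnodelay", true);
      // More sockets per RegionServer from this client process.
      conf.setInt("hbase.client.ipc.pool.size", 10);

      // One connection and one thread pool shared by all HTable instances.
      HConnection connection = HConnectionManager.createConnection(conf);
      ExecutorService pool = Executors.newFixedThreadPool(60);

      HTable table = new HTable(Bytes.toBytes("test_table"), connection, pool);
      // ... each worker thread builds its own HTable on the same
      //     connection/pool and issues Gets ...
      table.close();
      pool.shutdown();
      connection.close();
    }
  }

Every HTable built this way shares sockets and worker threads instead of
spinning up its own, which is what the single-Configuration suggestion is
aiming at.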
>>
>> -- Lars
>>
>>
>> ----- Original Message -----
>> From: Vladimir Rodionov <vladrodionov@gmail.com>
>> To: dev@hbase.apache.org
>> Cc:
>> Sent: Tuesday, July 30, 2013 1:30 PM
>> Subject: Re: HBase read performance and HBase client
>>
>> Does this hbase.ipc.client.tcpnodelay (default: false) explain the poor
>> single-thread performance and high latency (0.8ms on a local network)?
>>
>>
>> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
>> <vladrodionov@gmail.com>wrote:
>>
>> > One more observation: one Configuration instance per HTable gives a 50%
>> > boost compared to a single Configuration object for all HTables - from
>> > 20K to 30K.
>> >
>> >
>> > On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
>> vladrodionov@gmail.com
>> > > wrote:
>> >
>> >> This thread dump was taken while the client was sending 60 requests in
>> >> parallel (at least, in theory). There are 50 server handler threads.
>> >>
>> >>
>> >> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
>> >> vladrodionov@gmail.com> wrote:
>> >>
>> >>> Sure, here it is:
>> >>>
>> >>> http://pastebin.com/8TjyrKRT
>> >>>
>> >>> Is epoll used not only to read/write HDFS but to connect/listen to
>> >>> clients as well?
>> >>>
>> >>>
>> >>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>> >>> jdcryans@apache.org> wrote:
>> >>>
>> >>>> Can you show us what the thread dump looks like when the threads are
>> >>>> BLOCKED? There aren't that many locks on the read path when reading
>> >>>> out of the block cache, and epoll would only happen if you need to
>> >>>> hit HDFS, which you're saying is not happening.
>> >>>>
>> >>>> J-D
>> >>>>
>> >>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>> >>>> <vladrodionov@gmail.com> wrote:
>> >>>> > I am hitting data in the block cache, of course. The data set is
>> >>>> > small enough to fit comfortably into the block cache, and all
>> >>>> > requests are directed to the same Region to guarantee single-RS
>> >>>> > testing.
>> >>>> >
>> >>>> > To Ted:
>> >>>> >
>> >>>> > Yes, it's CDH 4.3. What's the difference between 94.10 and 94.6
>> >>>> > with respect to read performance?
>> >>>> >
>> >>>> >
>> >>>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>> >>>> jdcryans@apache.org>wrote:
>> >>>> >
>> >>>> >> That's a tough one.
>> >>>> >>
>> >>>> >> One thing that comes to mind is socket reuse. It used to come up
>> >>>> >> more often, but this is an issue that people hit when doing loads
>> >>>> >> of random reads. Try enabling tcp_tw_recycle, but I'm not
>> >>>> >> guaranteeing anything :)
>> >>>> >>
>> >>>> >> Also, if you _just_ want to saturate something, be it CPU or
>> >>>> >> network, wouldn't it be better to hit data only in the block
>> >>>> >> cache? That way it has the lowest overhead.
>> >>>> >>
>> >>>> >> Last thing I wanted to mention is that yes, the client doesn't
>> >>>> >> scale very well. I would suggest you give the asynchbase client
>> >>>> >> a run.
>> >>>> >>
>> >>>> >> J-D
>> >>>> >>
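
For completeness, a minimal sketch of what a non-blocking read with asynchbase
might look like (the table, row key, and ZooKeeper quorum are made up; this is
not the client used in the tests above):

  import java.util.ArrayList;

  import com.stumbleupon.async.Callback;
  import org.hbase.async.GetRequest;
  import org.hbase.async.HBaseClient;
  import org.hbase.async.KeyValue;

  public class AsyncGetExample {
    public static void main(String[] args) throws Exception {
      final HBaseClient client = new HBaseClient("zkhost:2181");  // ZK quorum assumed
      GetRequest get = new GetRequest("test_table", "row-00001");
      client.get(get)
            .addCallback(new Callback<Object, ArrayList<KeyValue>>() {
              public Object call(ArrayList<KeyValue> row) {
                // One KeyValue per returned cell; runs on the client's IO thread.
                System.out.println("cells returned: " + row.size());
                return null;
              }
            })
            .join();              // block only for this demo
      client.shutdown().join();
    }
  }

Because gets return Deferreds, one client thread can keep many RPCs in flight
instead of tying up a thread per outstanding Get.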
>> >>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>> >>>> >> <vrodionov@carrieriq.com> wrote:
>> >>>> >> > I have been doing quite extensive testing of different read
>> >>>> >> > scenarios:
>> >>>> >> >
>> >>>> >> > 1. blockcache disabled/enabled
>> >>>> >> > 2. data is local/remote (no good hdfs locality)
>> >>>> >> >
>> >>>> >> > and it turned out that I cannot saturate 1 RS using one
>> >>>> >> > (comparable in CPU power and RAM) client host:
>> >>>> >> >
>> >>>> >> > I am running a client app with 60 active read threads (with
>> >>>> >> > multi-get) going to one particular RS, and this RS's load is
>> >>>> >> > 100-150% (out of 3200% available), which means the load is ~5%.
>> >>>> >> >
>> >>>> >> > All threads in the RS are either in BLOCKED (wait) or IN_NATIVE
>> >>>> >> > (epoll) states.
>> >>>> >> >
>> >>>> >> > I attribute this to the HBase client implementation, which does
>> >>>> >> > not seem to be scalable (I am going to dig into the client later
>> >>>> >> > today).
>> >>>> >> >
>> >>>> >> > Some numbers: the maximum I could get from single Get (60
>> >>>> >> > threads) is 30K per sec. Multi-get gives ~75K (60 threads).
>> >>>> >> >
>> >>>> >> > What are my options? I want to measure the limits, and I do not
>> >>>> >> > want to run a cluster of clients against just ONE Region Server.
>> >>>> >> >
>> >>>> >> > RS config: 96GB RAM, 16(32) CPU
>> >>>> >> > Client     : 48GB RAM   8 (16) CPU
>> >>>> >> >
>> >>>> >> > Best regards,
>> >>>> >> > Vladimir Rodionov
>> >>>> >> > Principal Platform Engineer
>> >>>> >> > Carrier IQ, www.carrieriq.com
>> >>>> >> > e-mail: vrodionov@carrieriq.com
>> >>>> >> >
>> >>>> >>
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>>
>
