hbase-dev mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: HBase read performance and HBase client
Date Tue, 30 Jul 2013 20:50:18 GMT
With Nagle's you'd see something around 40ms. You are not saying 0.8ms RTT is bad, right? Are
you seeing ~40ms latencies?

This thread has gotten confusing.

I would try these:
* use one Configuration for all tables, or even a single HConnection/thread pool and the
HTable(byte[], HConnection, ExecutorService) constructor (see the sketch after this list)
* disable Nagle's: set both ipc.server.tcpnodelay and hbase.ipc.client.tcpnodelay to true
in hbase-site.xml (both client *and* server)
* increase hbase.client.ipc.pool.size in the client's hbase-site.xml
* enable short circuit reads (details depend on the exact version of Hadoop). Google will help
:)
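
A rough sketch of the shared-connection setup from the first bullet, assuming the
0.94-era client API (HConnectionManager/HTable); the table name, row key, and pool
sizes below are placeholders, not recommendations:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedConnectionSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Disable Nagle's on the client side; the server also needs
    // ipc.server.tcpnodelay=true in its own hbase-site.xml.
    conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
    // More connections per region server (value is a guess, tune it).
    conf.setInt("hbase.client.ipc.pool.size", 10);

    // One connection and one thread pool shared by every HTable instance.
    HConnection connection = HConnectionManager.createConnection(conf);
    ExecutorService pool = Executors.newFixedThreadPool(60);
    try {
      HTable table = new HTable(Bytes.toBytes("test_table"), connection, pool);
      Result result = table.get(new Get(Bytes.toBytes("row-0")));
      System.out.println("got " + result.size() + " cells");
      table.close();
    } finally {
      pool.shutdown();
      connection.close();
    }
  }
}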

-- Lars


----- Original Message -----
From: Vladimir Rodionov <vladrodionov@gmail.com>
To: dev@hbase.apache.org
Cc: 
Sent: Tuesday, July 30, 2013 1:30 PM
Subject: Re: HBase read performance and HBase client

Does this hbase.ipc.client.tcpnodelay (default: false) explain the poor
single-thread performance and high latency (0.8ms on a local network)?


On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
<vladrodionov@gmail.com> wrote:

> One more observation: one Configuration instance per HTable gives a 50%
> boost compared to a single Configuration object for all HTables - from
> 20K to 30K.
>
>
> On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov
> <vladrodionov@gmail.com> wrote:
>
>> This thread dump was taken while the client was sending 60 requests in
>> parallel (at least, in theory). There are 50 server handler threads.
>>
>>
>> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
>> vladrodionov@gmail.com> wrote:
>>
>>> Sure, here it is:
>>>
>>> http://pastebin.com/8TjyrKRT
>>>
>>> Isn't epoll used not only to read/write HDFS but also to accept/listen
>>> for client connections?
>>>
>>>
>>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>>> jdcryans@apache.org> wrote:
>>>
>>>> Can you show us what the thread dump looks like when the threads are
>>>> BLOCKED? There aren't that many locks on the read path when reading
>>>> out of the block cache, and epoll would only happen if you need to hit
>>>> HDFS, which you're saying is not happening.
>>>>
>>>> J-D
>>>>
>>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>>>> <vladrodionov@gmail.com> wrote:
>>>> > I am hitting data in the block cache, of course. The data set is small
>>>> > enough to fit comfortably into the block cache, and all requests are
>>>> > directed to the same Region to guarantee single-RS testing.
>>>> >
>>>> > To Ted:
>>>> >
>>>> > Yes, it's CDH 4.3. What's the difference between 94.10 and 94.6 with
>>>> > respect to read performance?
>>>> >
>>>> >
>>>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans
>>>> > <jdcryans@apache.org> wrote:
>>>> >
>>>> >> That's a tough one.
>>>> >>
>>>> >> One thing that comes to mind is socket reuse. It used to come up more
>>>> >> often, but this is an issue that people hit when doing loads of random
>>>> >> reads. Try enabling tcp_tw_recycle, but I'm not guaranteeing
>>>> >> anything :)
>>>> >>
>>>> >> Also, if you _just_ want to saturate something, be it CPU or network,
>>>> >> wouldn't it be better to hit data only in the block cache? This way it
>>>> >> has the lowest overhead?
>>>> >>
>>>> >> Last thing I wanted to mention is that yes, the client doesn't scale
>>>> >> very well. I would suggest you give the asynchbase client a run.
>>>> >>
>>>> >> J-D
>>>> >>
>>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>>>> >> <vrodionov@carrieriq.com> wrote:
>>>> >> > I have been doing quite extensive testing of different read scenarios:
>>>> >> >
>>>> >> > 1. blockcache disabled/enabled
>>>> >> > 2. data is local/remote (no good hdfs locality)
>>>> >> >
>>>> >> > and it turned out that I cannot saturate 1 RS using one (comparable
>>>> >> > in CPU power and RAM) client host:
>>>> >> >
>>>> >> > I am running a client app with 60 read threads active (with multi-get),
>>>> >> > all going to one particular RS, and this RS's load is 100-150% (out of
>>>> >> > 3200% available) - meaning the load is ~5%.
>>>> >> >
>>>> >> > All threads in the RS are either in BLOCKED (wait) or IN_NATIVE (epoll)
>>>> >> > states.
>>>> >> >
>>>> >> > I attribute this to the HBase client implementation, which seems to be
>>>> >> > not scalable (I am going to dig into the client later today).
>>>> >> >
>>>> >> > Some numbers: the maximum I could get from single gets (60 threads) is
>>>> >> > 30K per sec. Multi-get gives ~75K (60 threads).
>>>> >> >
>>>> >> > What are my options? I want to measure the limits, and I do not want to
>>>> >> > run a cluster of clients against just ONE Region Server.
>>>> >> >
>>>> >> > RS config: 96GB RAM, 16 (32) CPU
>>>> >> > Client    : 48GB RAM,  8 (16) CPU
>>>> >> >
>>>> >> > Best regards,
>>>> >> > Vladimir Rodionov
>>>> >> > Principal Platform Engineer
>>>> >> > Carrier IQ, www.carrieriq.com
>>>> >> > e-mail: vrodionov@carrieriq.com
>>>> >>
>>>>
>>>
>>>
>>
>
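
For reference, the "multi-get" numbers above refer to the batched HTable.get(List<Get>)
call rather than one RPC per row. A minimal sketch, again assuming the 0.94-era client
API, with a placeholder table name and row keys:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGetSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test_table");
    try {
      // Build one batch of Gets instead of issuing them one by one.
      List<Get> gets = new ArrayList<Get>();
      for (int i = 0; i < 100; i++) {
        gets.add(new Get(Bytes.toBytes("row-" + i)));
      }
      Result[] results = table.get(gets);
      System.out.println("fetched " + results.length + " rows");
    } finally {
      table.close();
    }
  }
}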

