hbase-dev mailing list archives

From Michael Segel <msegel_had...@hotmail.com>
Subject Re: HBase read performance and HBase client
Date Thu, 01 Aug 2013 17:27:21 GMT
Network? 1GbE or 10GbE?

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jul 31, 2013, at 9:27 PM, "Vladimir Rodionov" <vladrodionov@gmail.com> wrote:

> Some final numbers:
> 
> Test config:
> 
> HBase 0.94.6
> blockcache=true, block size = 64K, KV size = 62 bytes (raw).
> 
> 5 clients: 96GB RAM, 16(32) CPUs (2.2GHz), CentOS 5.7
> 1 RS server: the same config.
> 
> Local network with ping between hosts: 0.1 ms
> 
> 
> 1. The HBase client hits the wall at ~50K per sec regardless of # of CPUs,
> threads, IO pool size and other settings.
> 2. The HBase server was able to sustain 170K per sec (with 64K block size), all
> from the block cache. KV size = 62 bytes (very small). This is for single Get
> ops, 60 threads per client, 5 clients (on different hosts).
> 3. Multi-get hits the wall at the same 170K-200K per sec. Batch sizes
> tested: 30 and 100. Absolutely the same performance as with batch size = 1.
> Multi-get has some internal issues on the RegionServer side, maybe excessive
> locking or something else (a multi-get sketch follows below).
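> 
> For reference on point (3): a minimal sketch of what a batched multi-get
> against a single table looks like with the 0.94-era HTable API. The table
> name, row keys and loop bounds are illustrative assumptions, not the actual
> test harness.
> 
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> 
> public class MultiGetSketch {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     HTable table = new HTable(conf, "test_table");   // table name is an assumption
>     try {
>       List<Get> batch = new ArrayList<Get>();
>       for (int i = 0; i < 100; i++) {                // batch size = 100, as in the test above
>         batch.add(new Get(Bytes.toBytes("row-" + i)));
>       }
>       // sent as a batched multi-get rather than 100 single Get RPCs
>       Result[] results = table.get(batch);
>       System.out.println("fetched " + results.length + " rows");
>     } finally {
>       table.close();
>     }
>   }
> }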
> 
> 
> 
> 
> 
> On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
> <vladrodionov@gmail.com> wrote:
> 
>> 1. SCR (short-circuit reads) is enabled
>> 2. A single Configuration for all tables did not work well, but I will try
>> it again
>> 3. With Nagle's I had 0.8ms avg, without - 0.4ms - I do see the difference
>> 
>> 
>> On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <larsh@apache.org> wrote:
>> 
>>> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
>>> RTT is bad, right? Are you seeing ~40ms latencies?
>>> 
>>> This thread has gotten confusing.
>>> 
>>> I would try these:
>>> * one Configuration for all tables. Or even use a single
>>> HConnection/Threadpool and use the HTable(byte[], HConnection,
>>> ExecutorService) constructor (see the sketch after this list)
>>> * disable Nagle's: set both ipc.server.tcpnodelay and
>>> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client *and*
>>> server)
>>> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
>>> * enable short circuit reads (details depend on exact version of Hadoop).
>>> Google will help :)
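>>> 
>>> A minimal sketch of the first three suggestions above, assuming the
>>> 0.94-era client API (HConnectionManager.createConnection plus the
>>> HTable(byte[], HConnection, ExecutorService) constructor); the table name,
>>> thread-pool size and ipc pool size are assumptions:
>>> 
>>> import java.util.concurrent.ExecutorService;
>>> import java.util.concurrent.Executors;
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>> import org.apache.hadoop.hbase.client.Get;
>>> import org.apache.hadoop.hbase.client.HConnection;
>>> import org.apache.hadoop.hbase.client.HConnectionManager;
>>> import org.apache.hadoop.hbase.client.HTable;
>>> import org.apache.hadoop.hbase.client.Result;
>>> import org.apache.hadoop.hbase.util.Bytes;
>>> 
>>> public class SharedConnectionSketch {
>>>   public static void main(String[] args) throws Exception {
>>>     Configuration conf = HBaseConfiguration.create();
>>>     // Client-side settings from the list above (the server-side
>>>     // ipc.server.tcpnodelay must go into the server's hbase-site.xml).
>>>     conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
>>>     conf.setInt("hbase.client.ipc.pool.size", 10);    // value is an assumption
>>> 
>>>     // One connection + one thread pool shared by every HTable instance.
>>>     ExecutorService pool = Executors.newFixedThreadPool(60);
>>>     HConnection connection = HConnectionManager.createConnection(conf);
>>>     HTable table = new HTable(Bytes.toBytes("test_table"), connection, pool);
>>>     try {
>>>       Result r = table.get(new Get(Bytes.toBytes("row-0001")));
>>>       System.out.println("got " + r.size() + " KVs");
>>>     } finally {
>>>       table.close();        // closes only the table; connection/pool are shared
>>>       connection.close();   // close these once, at application shutdown
>>>       pool.shutdown();
>>>     }
>>>   }
>>> }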
>>> 
>>> -- Lars
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Vladimir Rodionov <vladrodionov@gmail.com>
>>> To: dev@hbase.apache.org
>>> Cc:
>>> Sent: Tuesday, July 30, 2013 1:30 PM
>>> Subject: Re: HBase read performance and HBase client
>>> 
>>> Does this hbase.ipc.client.tcpnodelay (default: false) explain the poor
>>> single-thread performance and high latency (0.8ms on a local network)?
>>> 
>>> 
>>> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
>>> <vladrodionov@gmail.com> wrote:
>>> 
>>>> One more observation: one Configuration instance per HTable gives a 50%
>>>> boost compared to a single Configuration object for all HTables - from
>>>> 20K to 30K.
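>>>> 
>>>> For context, the two patterns being compared are roughly the following
>>>> (0.94-era API; the table name is an assumption). Which one wins depends on
>>>> how the client maps Configuration instances to underlying connections:
>>>> 
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.hbase.client.HTable;
>>>> 
>>>> public class ConfigPerTableSketch {
>>>>   public static void main(String[] args) throws Exception {
>>>>     // Pattern A: one Configuration object shared by all HTable instances
>>>>     Configuration shared = HBaseConfiguration.create();
>>>>     HTable a1 = new HTable(shared, "test_table");
>>>>     HTable a2 = new HTable(shared, "test_table");
>>>> 
>>>>     // Pattern B: a fresh Configuration per HTable
>>>>     // (the variant reported above as ~50% faster, 20K -> 30K)
>>>>     HTable b1 = new HTable(HBaseConfiguration.create(), "test_table");
>>>>     HTable b2 = new HTable(HBaseConfiguration.create(), "test_table");
>>>> 
>>>>     a1.close(); a2.close(); b1.close(); b2.close();
>>>>   }
>>>> }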
>>>> 
>>>> 
>>>> On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
>>>> 
>>>>> This thread dump was taken while the client was sending 60 requests in
>>>>> parallel (at least, in theory). There are 50 server handler threads.
>>>>> 
>>>>> 
>>>>> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
>>>>> vladrodionov@gmail.com> wrote:
>>>>> 
>>>>>> Sure, here it is:
>>>>>> 
>>>>>> http://pastebin.com/8TjyrKRT
>>>>>> 
>>>>>> epoll is used not only to read/write HDFS but to connect/listen to
>>>>>> clients as well?
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>>>>>> jdcryans@apache.org> wrote:
>>>>>> 
>>>>>>> Can you show us what the thread dump looks like when the threads are
>>>>>>> BLOCKED? There aren't that many locks on the read path when reading
>>>>>>> out of the block cache, and epoll would only happen if you need to
>>>>>>> hit HDFS, which you're saying is not happening.
>>>>>>> 
>>>>>>> J-D
>>>>>>> 
>>>>>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>>>>>>> <vladrodionov@gmail.com> wrote:
>>>>>>>> I am hitting data in the block cache, of course. The data set is
>>>>>>>> small enough to fit comfortably into the block cache, and all
>>>>>>>> requests are directed to the same Region to guarantee single-RS
>>>>>>>> testing.
>>>>>>>> 
>>>>>>>> To Ted:
>>>>>>>> 
>>>>>>>> Yes, it's CDH 4.3. What's the difference between 0.94.10 and
>>>>>>>> 0.94.6 with respect to read performance?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> That's a tough one.
>>>>>>>>> 
>>>>>>>>> One thing that comes to mind is socket reuse. It used to come up
>>>>>>>>> more often, but this is an issue that people hit when doing loads
>>>>>>>>> of random reads. Try enabling tcp_tw_recycle, but I'm not
>>>>>>>>> guaranteeing anything :)
>>>>>>>>> 
>>>>>>>>> Also, if you _just_ want to saturate something, be it CPU or
>>>>>>>>> network, wouldn't it be better to hit data only in the block cache?
>>>>>>>>> This way it has the lowest overhead.
>>>>>>>>> 
>>>>>>>>> Last thing I wanted to mention is that yes, the client doesn't
>>>>>>>>> scale very well. I would suggest you give the asynchbase client
>>>>>>>>> a run.
>>>>>>>>> 
>>>>>>>>> J-D
>>>>>>>>> 
>>>>>>>>> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>>>>>>>>> <vrodionov@carrieriq.com> wrote:
>>>>>>>>>> I have been doing quite extensive testing of different read scenarios:
>>>>>>>>>> 
>>>>>>>>>> 1. blockcache disabled/enabled
>>>>>>>>>> 2. data is local/remote (no good hdfs locality)
>>>>>>>>>> 
>>>>>>>>>> and it turned out that I cannot saturate 1 RS using one
>>>>>>>>>> (comparable in CPU power and RAM) client host:
>>>>>>>>>> 
>>>>>>>>>> I am running a client app with 60 read threads active (with
>>>>>>>>>> multi-get) that all go to one particular RS, and this RS's load is
>>>>>>>>>> 100-150% (out of 3200% available) - it means the load is ~5%.
>>>>>>>>>> 
>>>>>>>>>> All threads in the RS are either in BLOCKED (wait) or IN_NATIVE
>>>>>>>>>> (epoll) states.
>>>>>>>>>> 
>>>>>>>>>> I attribute this to the HBase client implementation, which does not
>>>>>>>>>> seem to be scalable (I am going to dig into the client later today).
>>>>>>>>>> 
>>>>>>>>>> Some numbers: the maximum I could get from single Get (60 threads)
>>>>>>>>>> is 30K per sec. Multi-get gives ~75K (60 threads).
>>>>>>>>>> 
>>>>>>>>>> What are my options? I want to measure the limits, and I do not
>>>>>>>>>> want to run a cluster of clients against just ONE Region Server.
>>>>>>>>>> 
>>>>>>>>>> RS config: 96GB RAM, 16(32) CPU
>>>>>>>>>> Client     : 48GB RAM   8 (16) CPU
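>>>>>>>>>> 
>>>>>>>>>> A rough sketch of the kind of 60-thread single-Get load client
>>>>>>>>>> described above, assuming the 0.94 API; the table name, row-key
>>>>>>>>>> scheme and per-thread iteration count are assumptions:
>>>>>>>>>> 
>>>>>>>>>> import java.util.Random;
>>>>>>>>>> import java.util.concurrent.ExecutorService;
>>>>>>>>>> import java.util.concurrent.Executors;
>>>>>>>>>> import org.apache.hadoop.conf.Configuration;
>>>>>>>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>>>>>>> import org.apache.hadoop.hbase.client.Get;
>>>>>>>>>> import org.apache.hadoop.hbase.client.HTable;
>>>>>>>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>>>>>>>> 
>>>>>>>>>> public class ReadLoadSketch {
>>>>>>>>>>   public static void main(String[] args) throws Exception {
>>>>>>>>>>     final Configuration conf = HBaseConfiguration.create();
>>>>>>>>>>     ExecutorService pool = Executors.newFixedThreadPool(60); // 60 read threads, as above
>>>>>>>>>>     for (int t = 0; t < 60; t++) {
>>>>>>>>>>       pool.submit(new Runnable() {
>>>>>>>>>>         public void run() {
>>>>>>>>>>           try {
>>>>>>>>>>             HTable table = new HTable(conf, "test_table");  // one HTable per thread
>>>>>>>>>>             Random rnd = new Random();
>>>>>>>>>>             for (int i = 0; i < 100000; i++) {
>>>>>>>>>>               table.get(new Get(Bytes.toBytes("row-" + rnd.nextInt(1000000))));
>>>>>>>>>>             }
>>>>>>>>>>             table.close();
>>>>>>>>>>           } catch (Exception e) {
>>>>>>>>>>             e.printStackTrace();
>>>>>>>>>>           }
>>>>>>>>>>         }
>>>>>>>>>>       });
>>>>>>>>>>     }
>>>>>>>>>>     pool.shutdown();
>>>>>>>>>>   }
>>>>>>>>>> }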
>>>>>>>>>> 
>>>>>>>>>> Best regards,
>>>>>>>>>> Vladimir Rodionov
>>>>>>>>>> Principal Platform Engineer
>>>>>>>>>> Carrier IQ, www.carrieriq.com
>>>>>>>>>> e-mail: vrodionov@carrieriq.com
>>>>>>>>>> 
>>>>>>>>>> 
>> 
