hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-3382) Make HBase client work better under concurrent clients
Date Tue, 06 Jan 2015 16:54:34 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-3382.
    Resolution: Later
      Assignee:     (was: ryan rawson)

I'd argue the old YCSB client code was poor both in terms of YCSB's objective and in its management
of server load (stampeding via multiple threads flushing deep write buffers). We should redo the
analysis using LoadTestTool or the new YCSB client at https://github.com/apurtell/ycsb/tree/new_hbase_client.
Resolving as Later. Can reopen if someone wants to take it up. I'm guessing that probably
won't happen.

Thanks for the nudge [~clehene]. 

> Make HBase client work better under concurrent clients
> ------------------------------------------------------
>                 Key: HBASE-3382
>                 URL: https://issues.apache.org/jira/browse/HBASE-3382
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: ryan rawson
>              Labels: delete
>         Attachments: HBASE-3382-nio.txt, HBASE-3382.txt
> The HBase client uses 1 socket per regionserver for communication.  This is good for
> socket control but potentially bad for latency.  How bad?  I did a simple YCSB test that had
> this config:
>  readproportion=0
>  updateproportion=0
>  scanproportion=1
>  insertproportion=0
>  fieldlength=10
>  fieldcount=100
>  requestdistribution=zipfian
>  scanlength=300
>  scanlengthdistribution=zipfian
> I ran this with 1 and 10 threads.  The summary is as follows:
> 1 thread:
> [SCAN]	 Operations	1000
> [SCAN]	 AverageLatency(ms)	35.871
> 10 threads:
> [SCAN]	 Operations	1000
> [SCAN]	 AverageLatency(ms)	228.576
> We are taking roughly a 6.4x latency hit in our client (228.576 ms vs. 35.871 ms).  But why?
> The first step was to move the deserialization out of the Connection thread.  This seemed
> like it could be a big win: an analogous change on the server side got a 20% performance improvement
> (already committed as HBASE-2941).  I did this and got about a 20% improvement again, with
> that 228 ms number going to about 190 ms.
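The idea of taking deserialization off the connection thread can be sketched roughly as follows. This is a minimal illustration, not the HBase client or the attached patch: the class `OffloadedReader`, its methods, the `String` payload, and the frame layout `[int size][int id][payload]` (modelled on the `size -= 4` convention in the snippet quoted further down) are all assumptions made for the example. The reading side only copies bytes off the stream; decoding runs on a separate pool.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the reader pulls raw bytes off the stream and hands
// decoding to a worker pool, so a large response does not hold up the reader.
class OffloadedReader {
    private final Map<Integer, CompletableFuture<String>> calls = new ConcurrentHashMap<>();
    private final ExecutorService decoders;

    OffloadedReader(int decoderThreads) {
        this.decoders = Executors.newFixedThreadPool(decoderThreads);
    }

    // A caller registers interest in a call id before the response arrives.
    CompletableFuture<String> register(int id) {
        CompletableFuture<String> pending = new CompletableFuture<>();
        calls.put(id, pending);
        return pending;
    }

    // Assumed frame layout: [int size][int id][size - 4 payload bytes].
    void receiveOne(DataInputStream in) throws IOException {
        int size = in.readInt();
        int id = in.readInt();
        byte[] payload = new byte[size - 4]; // the id was already read
        in.readFully(payload);
        CompletableFuture<String> pending = calls.remove(id);
        if (pending == null) {
            return; // no caller is waiting for this id
        }
        // Decoding happens off the reading thread.
        decoders.execute(() -> {
            try {
                pending.complete(decode(payload));
            } catch (RuntimeException e) {
                pending.completeExceptionally(e);
            }
        });
    }

    // Stand-in for real response deserialization.
    static String decode(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    void shutdown() {
        decoders.shutdown();
    }

    // Builds one frame in memory, feeds it to the reader, and waits for the result.
    static String roundTrip(int id, String body) {
        OffloadedReader reader = new OffloadedReader(2);
        try {
            byte[] payload = body.getBytes(StandardCharsets.UTF_8);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(payload.length + 4);
            out.writeInt(id);
            out.write(payload);
            CompletableFuture<String> pending = reader.register(id);
            reader.receiveOne(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return pending.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            reader.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(7, "hello"));
    }
}
```

Note that this only moves CPU work; the bytes for every call still arrive over the one shared stream, so it does not by itself address time spent waiting on the network.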
> So I then wrote a high-performance, nanosecond-resolution tracing utility.  Clients can
> flag an API call, and we get tracing and numbers through the client pipeline.  What I found
> is that a lot of time is being spent receiving the response from the network.  The code
> block is like so:
>         NanoProfiler.split(id, "receiveResponse");
>         if (LOG.isDebugEnabled())
>           LOG.debug(getName() + " got value #" + id);
>         Call call = calls.get(id);
>         size -= 4;  // 4 byte off for id because we already read it.
>         ByteBuffer buf = ByteBuffer.allocate(size);
>         IOUtils.readFully(in, buf.array(), buf.arrayOffset(), size);
>         buf.limit(size);
>         buf.rewind();
>         NanoProfiler.split(id, "setResponse", "Data size: " + size);
> I came up with some numbers:
> 11726 (receiveResponse) split: 64991689 overall: 133562895 Data size: 4288937
> 12163 (receiveResponse) split: 32743954 overall: 103787420 Data size: 1606273
> 12561 (receiveResponse) split: 3517940 overall: 83346740 Data size: 4
> 12136 (receiveResponse) split: 64448701 overall: 203872573 Data size: 3570569
> The first number is the internal counter for keeping requests unique from HTable on down.
> The numbers are in ns; the data size is in bytes.
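The NanoProfiler source is not shown in this message, so the following is only a guess at how a split tracer could produce lines in the format above. The class `SplitTracer`, the injectable clock, and the rule that each `split` call reports the stage that just ended (with the note supplied by the current call) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of a per-request split timer; not the actual NanoProfiler.
class SplitTracer {
    private static final class Trace {
        final long start; // time of the first split for this id
        long last;        // time of the most recent split
        String stage;     // name of the stage currently in progress

        Trace(long now, String stage) {
            this.start = now;
            this.last = now;
            this.stage = stage;
        }
    }

    private final Map<Long, Trace> traces = new HashMap<>();
    private final LongSupplier clock;

    SplitTracer(LongSupplier clock) {
        this.clock = clock;
    }

    SplitTracer() {
        this(System::nanoTime);
    }

    // Ends the stage in progress for this id and starts the next one.
    // Returns a report line for the stage that just ended, or null on the first call.
    synchronized String split(long id, String stage, String note) {
        long now = clock.getAsLong();
        Trace t = traces.get(id);
        if (t == null) {
            traces.put(id, new Trace(now, stage));
            return null;
        }
        String line = id + " (" + t.stage + ") split: " + (now - t.last)
                + " overall: " + (now - t.start)
                + (note.isEmpty() ? "" : " " + note);
        t.last = now;
        t.stage = stage;
        return line;
    }

    // Drives the tracer with a fake clock so the result is deterministic.
    static String demo() {
        long[] fakeNanos = {0L};
        SplitTracer tracer = new SplitTracer(() -> fakeNanos[0]);
        tracer.split(1, "start", "");
        fakeNanos[0] = 100;
        tracer.split(1, "receiveResponse", "");
        fakeNanos[0] = 250;
        return tracer.split(1, "setResponse", "Data size: 4");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1 (receiveResponse) split: 150 overall: 250 Data size: 4
    }
}
```

Passing the clock in keeps the timing logic testable; in real use the no-argument constructor falls back to `System.nanoTime()`.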
> Doing some simple calculations, we see for the first line we were reading at about 31
> MB/sec.  The second one is even worse.  Other calls are like:
> 26 (receiveResponse) split: 7985400 overall: 21546226 Data size: 850429
> which is 107 MB/sec, which is pretty close to the maximum of GigE.  In my setup, the
> YCSB client ran on the master node and HAD to use the network to talk to regionservers.
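The rate arithmetic can be reproduced from the trace lines with a small helper. The class `TraceRate` is hypothetical, and it assumes decimal megabytes (10^6 bytes). The message does not say which time column each quoted rate was computed from, so the sketch prints both: with the split time, the last trace line works out to about 106.5 MB/sec, in line with the 107 MB/sec figure, while the first line gives roughly 66 MB/sec against the split time and roughly 32 MB/sec against the overall time.

```java
// Hypothetical helper for turning (bytes, nanoseconds) pairs into MB/sec.
class TraceRate {
    // bytes / seconds / 10^6 simplifies to bytes * 1000 / nanos.
    static double mbPerSec(long bytes, long nanos) {
        return bytes * 1000.0 / nanos;
    }

    public static void main(String[] args) {
        // {id, split ns, overall ns, data size in bytes}, copied from the trace lines.
        long[][] rows = {
            {11726, 64991689L, 133562895L, 4288937L},
            {12163, 32743954L, 103787420L, 1606273L},
            {12136, 64448701L, 203872573L, 3570569L},
            {26, 7985400L, 21546226L, 850429L},
        };
        for (long[] r : rows) {
            System.out.printf("%d: %.1f MB/sec by split, %.1f MB/sec by overall%n",
                    r[0], mbPerSec(r[3], r[1]), mbPerSec(r[3], r[2]));
        }
    }
}
```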
> Even at full line rate, we could still see unacceptable holdups of unrelated calls that
> just happen to need to talk to the same regionserver.
> This issue is about these findings, what to do, how to improve. 

