hbase-user mailing list archives

From James Taylor <jtay...@salesforce.com>
Subject Re: endpoint coprocessor performance
Date Tue, 05 Mar 2013 01:58:35 GMT
Check your logs for whether your end-point coprocessor is hitting 
zookeeper on every invocation to figure out the region start key. 
Unfortunately (at least last time I checked), the default way of 
invoking an end point coprocessor doesn't use the meta cache. You can go 
through a combination of the following instead:
     HRegionLocation regionLocation = retried ?
         connection.relocateRegion(tableName, tableKey) :
         connection.locateRegion(tableName, tableKey);
Then call HConnection.processExecs, passing in the regionKeys from 
You can trap the error case of the region being relocated and try again 
with retried = true; the call to relocateRegion then refreshes the meta 
cache.
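
To make the retry flow concrete, here is a minimal, self-contained sketch 
of the "cached lookup first, forced re-lookup on retry" pattern. It does 
not call HBase: Locator, Invoker, StaleLocationException and 
InMemoryLocator are hypothetical stand-ins for HConnection.locateRegion / 
relocateRegion, the endpoint invocation, and the "region moved" error. 
Treat it as an illustration of the control flow only, not as the actual 
Phoenix code.

```java
import java.util.HashMap;
import java.util.Map;

class CachedLocateRetry {
    /** Hypothetical stand-in for the two region lookups (not the HBase API). */
    interface Locator {
        String locateRegion(String table, String rowKey);   // may answer from the cache
        String relocateRegion(String table, String rowKey); // bypasses and refreshes the cache
    }

    /** Hypothetical stand-in for the endpoint call against a located region. */
    interface Invoker<R> {
        R invoke(String location) throws StaleLocationException;
    }

    /** Hypothetical error raised when the region is no longer at the given location. */
    static class StaleLocationException extends Exception {
        StaleLocationException(String message) { super(message); }
    }

    /** Toy locator: a cache in front of an "authoritative" map that counts slow lookups. */
    static class InMemoryLocator implements Locator {
        final Map<String, String> cache = new HashMap<>();
        final Map<String, String> authoritative = new HashMap<>();
        int authoritativeLookups = 0;

        public String locateRegion(String table, String rowKey) {
            String cached = cache.get(table + "/" + rowKey);
            return cached != null ? cached : relocateRegion(table, rowKey);
        }

        public String relocateRegion(String table, String rowKey) {
            authoritativeLookups++; // the expensive path we want to avoid per call
            String fresh = authoritative.get(table + "/" + rowKey);
            cache.put(table + "/" + rowKey, fresh);
            return fresh;
        }
    }

    /** Uses the cached location first; after a stale-location error, forces a re-lookup. */
    static <R> R callWithCachedLocation(Locator locator, Invoker<R> invoker,
            String table, String rowKey, int maxAttempts) throws StaleLocationException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        boolean retried = false;
        StaleLocationException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String location = retried
                    ? locator.relocateRegion(table, rowKey)
                    : locator.locateRegion(table, rowKey);
            try {
                return invoker.invoke(location);
            } catch (StaleLocationException e) {
                last = e;
                retried = true; // next attempt refreshes the cache
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        InMemoryLocator locator = new InMemoryLocator();
        locator.authoritative.put("t/row1", "serverB");
        locator.cache.put("t/row1", "serverA"); // stale entry: the region has moved
        String result = callWithCachedLocation(locator, location -> {
            if (!location.equals("serverB")) {
                throw new StaleLocationException("region not on " + location);
            }
            return "ok@" + location;
        }, "t", "row1", 2);
        System.out.println(result + " after " + locator.authoritativeLookups + " slow lookup(s)");
    }
}
```

The point of the pattern is that the slow lookup happens only after a 
failure, so steady-state calls are served from the cache.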

Once we made this change for Phoenix, our latencies went way down.



On 03/04/2013 05:43 PM, Andrew Purtell wrote:
> Do you have timing results for an Apache HBase release? Our last release
> was 0.94.5.
> On Tuesday, March 5, 2013, Kim Hamilton wrote:
>> Hi all,
>> I've been lurking here for a while, so thanks for all the valuable tips and
>> guidance you've given so far.
>> I'm running some experiments to understand where to use coprocessors. One
>> interesting scenario is computing distinct values. I ran performance tests
>> with two distinct value implementations: one using endpoint coprocessors,
>> and one using just scans (computing distinct values client side only). I
>> noticed that the endpoint coprocessor implementation averaged 80 ms slower
>> than the scan implementation. Details of that are below for anyone
>> interested.
>> To drill into the performance, I instrumented the code and ultimately
>> deployed a no-op endpoint coprocessor, to look at the overhead of simply
>> calling it. I'm measuring around 100ms for calling my empty, no-op endpoint
>> coprocessor.
>> I need to do more tests, but I believe my tests are leading me to similar
>> conclusions drawn here:
>> http://hbase-coprocessor-experiments.blogspot.com/2011/05/extending.html
>> I.e. if the query/scan is selective enough (I'll go out on a limb and
>> estimate 50-100 rows), then it's better to just perform a scan and compute
>> client side. Endpoint coprocessors will make sense for larger result sets
>> and/or scans that hit multiple regions.
>> Before going too far, I wanted to check if anyone in this group has
>> suggestions. I.e. perhaps there are just some configuration options I've
>> not uncovered. Does this 100ms latency sound correct?
>> Thanks,
>> Kim
>> *Detailed results of distinct value comparison, just FYI*
>> Using 0.92.1-cdh4.1.0
>> Scan result size ~50-100
>> Row size 1kb, but after filtering for only desired columns, 380 bytes
>> *with coprocessors*
>> AverageLatency(ms), 176.1353
>> MinLatency(ms), 42
>> MaxLatency(ms), 2368
>> 95thPercentileLatency(ms), 321
>> 99thPercentileLatency(ms), 422
>> *scan-only, compute distinct values client side*
>> AverageLatency(ms), 92.8165
>> MinLatency(ms), 4
>> MaxLatency(ms), 986
>> 95thPercentileLatency(ms), 253
>> 99thPercentileLatency(ms), 356
