hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Inconsistent scan performance
Date Fri, 25 Mar 2016 05:34:09 GMT
On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <james.johansville@gmail.com> wrote:

> Hello all,
> So, I wrote a Java application for HBase that does a partitioned
> full-table scan according to a set number of partitions. For example, if
> 20 partitions are specified, then 20 separate scans are launched, each
> covering an equal slice of the row identifier range.
> The rows are uniformly distributed throughout the RegionServers.
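
As a point of reference, a minimal sketch of the kind of partitioned scan
described above, assuming the HBase 1.x client API, numPartitions <= 256,
and a hypothetical single-byte key-prefix split (the table name is made up):

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;

    public class PartitionedScan {
        // Scan one equal slice of the row key range, as described in the
        // post. The single-byte prefix boundaries are an assumption for
        // illustration (assumes numPartitions <= 256).
        static long scanPartition(Connection conn, int partition, int numPartitions)
                throws IOException {
            Scan scan = new Scan();
            if (partition > 0) {
                scan.setStartRow(new byte[] { (byte) (partition * 256 / numPartitions) });
            }
            if (partition < numPartitions - 1) {
                scan.setStopRow(new byte[] { (byte) ((partition + 1) * 256 / numPartitions) });
            }
            scan.setCaching(5000);       // rows fetched per RPC round trip, as in the post
            scan.setCacheBlocks(false);  // avoid churning the block cache
            long rows = 0;
            try (Table table = conn.getTable(TableName.valueOf("my_table"));
                 ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    rows++;  // no per-row logic, just counting
                }
            }
            return rows;
        }
    }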

How many RegionServers? How many Regions? Are Regions evenly distributed
across the servers? If you put all partitions on one machine and then run
your client, do the timings even out?

The disparity seems really wide.
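
One quick way to answer the region-distribution question programmatically,
as a sketch against the HBase 1.x Admin API (connection setup is standard
boilerplate; nothing here is specific to the poster's cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.ClusterStatus;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class RegionBalance {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                ClusterStatus status = admin.getClusterStatus();
                // A skewed region count per server would explain uneven
                // per-partition timings.
                for (ServerName server : status.getServers()) {
                    System.out.println(server.getHostname() + ": "
                            + status.getLoad(server).getNumberOfRegions() + " regions");
                }
            }
        }
    }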


> I confirmed this through the hbase shell. I have only one column family,
> and each row has the same number of column qualifiers.
> My problem is that the individual scan performance is wildly inconsistent
> even though the scans each fetch roughly the same number of rows. The
> inconsistency appears to be random with respect to hosts, regionservers,
> partitions, and CPU cores. I am the only user of the fleet and am not
> running any other concurrent HBase operations.
> I started measuring from the beginning of the scan and stopped measuring
> after the scan was completed. I am not doing any logic with the results,
> just scanning them.
> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
> 100+ seconds. This seems a little too bouncy to me. Does anyone have any
> insight? By comparison, a similar utility I wrote to upsert to the
> regionservers was very consistent in ops/sec, and I had no issues with it.
> Using 13 partitions on a machine that has 32 CPU cores and a 16 GB heap, I
> see anywhere from 3K to 82K ops/sec. Here's an example of log output I
> saved from a run that used 130 partitions.
> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401 ops/sec:36358.38150289017
> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636 ops/sec:31176.91380349608
> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586 ops/sec:30772.08014764039
> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985 ops/sec:7051.235410034865
> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733 ops/sec:6046.3170939508955
> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479 ops/sec:4803.316900101075
> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911 ops/sec:4899.286583474505
> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281 ops/sec:4886.875901705258
> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083 ops/sec:4743.210480206996
> I use setCacheBlocks(false) and setCaching(5000). Does anyone have any
> insight into how I can make the read performance more consistent?
> Thanks!
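
For reference, the per-partition lines in the log above correspond to a
timing loop along these lines (a sketch; conn, partitionId, and
numPartitions are hypothetical driver variables, and scanPartition is the
helper from the earlier sketch):

    long start = System.nanoTime();
    long rows = scanPartition(conn, partitionId, numPartitions);
    double elapsedSec = (System.nanoTime() - start) / 1e9;
    // Prints in the same format as the log lines quoted above.
    System.out.println("total # partitions:" + numPartitions
            + "; partition id:" + partitionId
            + "; rows:" + rows
            + " elapsed_sec:" + elapsedSec
            + " ops/sec:" + (rows / elapsedSec));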
