hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Open Scanner Latency
Date Mon, 31 Jan 2011 21:44:46 GMT

The region location cache is held by soft references, so as long as
you don't have memory pressure it will never be invalidated just
because of time.
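To see why time alone doesn't evict the cache, here is a minimal, self-contained sketch of soft-reference behavior (not HBase code; the class name is made up for illustration). A softly-referenced object survives garbage collection until the JVM is close to running out of heap:

```java
import java.lang.ref.SoftReference;

public class SoftRefDemo {
    // Returns whether a softly-referenced object survives an explicit GC
    // when the heap is under no memory pressure.
    static boolean survivesGc() {
        SoftReference<byte[]> ref = new SoftReference<>(new byte[1024]);
        System.gc();
        // Soft references are cleared only when the JVM is about to run
        // out of memory, not merely because a GC runs or time passes.
        return ref.get() != null;
    }

    public static void main(String[] args) {
        System.out.println(survivesGc());
    }
}
```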

Another thing to consider: in HBase, the open-scanner code also seeks
to and reads the first block of the scan.  This may incur a disk read,
which might explain the hot vs. cold difference you are seeing below.
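As a rough illustration of the cron-style warm-up idea from the message below: scan .META. for the table once and push the region locations into the client cache. This is only a sketch, assuming the 0.90-era client API added by HBASE-2468 (`getRegionsInfo()` / `prewarmRegionCache()`); the table name is a placeholder, and it needs a running cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class PrewarmJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // "mytable" is a hypothetical table name for this sketch.
        HTable table = new HTable(conf, "mytable");
        // One full scan of .META. for this table; the returned
        // region -> server map is pushed into the client-side location
        // cache, so subsequent reads skip the per-miss .META. lookup.
        table.prewarmRegionCache(table.getRegionsInfo());
        table.close();
    }
}
```

Run hourly (e.g. from cron), this keeps the location cache hot without paying the full-meta-scan price on every read.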


On Mon, Jan 31, 2011 at 1:38 PM, Wayne <wav100@gmail.com> wrote:
> After running many tests (10k serialized scans), we see that on average opening
> the scanner takes 2/3 of the read time when the read is fresh
> (scannerOpenWithStop ~35ms, scannerGetList ~10ms). The second time around (1
> minute later) we assume the region cache is "hot", and the open scanner is
> much faster (scannerOpenWithStop ~8ms, scannerGetList ~10ms). After 1-2
> hours the cache is no longer "hot" and we are back to the initial numbers.
> We assume this is due to finding where the data is located in the cluster.
>
> We have cache turned off on our tables but keep 2% cache for HBase, and the
> .META. table's region server shows a 98% hit rate (.META. is served out of
> cache). How can we pre-warm the cache to speed up our reads? It does not
> seem right that 2/3 of our read time is spent finding where the data is
> located. We have tried various settings of prefetch.limit without much
> difference.
>
> Per the HBASE-2468 wording, "Clients could prewarm cache by doing a large scan of
> all the meta for the table instead of random reads for each miss". We
> definitely do not want to pay this price on every read, but would like to
> set up a cron job to refresh the cache once an hour for the tables that need
> it. It would be great to have a way to pin the region locations in memory,
> or at least a method to heat the cache up before a big read process kicks
> off. For our usage pattern, read latency should be driven primarily by disk
> I/O latency, not by looking up where the data lives in the cluster; given
> what we are seeing, adding SSD disks would do little to lower read latency.
>
> Any help or suggestions would be greatly appreciated.
> Thanks.
