hbase-user mailing list archives

From Wayne <wav...@gmail.com>
Subject Re: Open Scanner Latency
Date Mon, 31 Jan 2011 22:17:28 GMT
I assume BLOCKCACHE => 'false' would turn this off? We have turned off cache
on all tables.

On Mon, Jan 31, 2011 at 4:54 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> The Regionserver caches blocks, so a second read would benefit from
> the caching of the first read.  Over time blocks get evicted in a LRU
> manner, and things would get slow again.
>
> Does this make sense to you?
>
> On Mon, Jan 31, 2011 at 1:50 PM, Wayne <wav100@gmail.com> wrote:
> > We have heavy writes always going on so there is always memory pressure.
> >
> > If the open scanner reads the first block, maybe that explains the 8ms the
> > second time a test is run, but why is the first run averaging 35ms to open
> > when the same read requests, sent again, open in only 8ms? There is a
> > difference between read #1 and read #2 that I can only explain by region
> > location search. Our writes are so heavy I assume this region location
> > information is always flushed within 30-60 minutes.
> >
> >
> > On Mon, Jan 31, 2011 at 4:44 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >
> >> Hey,
> >>
> >> The region location cache is held by a soft reference, so as long as
> >> you don't have memory pressure, it will never get invalidated just
> >> because of time.
> >>
> >> Another thing to consider, in HBase, the open scanner code also seeks
> >> and reads the first block of the scan.  This may incur a read to disk
> >> and might explain the hot vs cold you are seeing below.
> >>
> >> -ryan
> >>
> >> On Mon, Jan 31, 2011 at 1:38 PM, Wayne <wav100@gmail.com> wrote:
> >> > After doing many tests (10k serialized scans) we see that on average
> >> > opening the scanner takes 2/3 of the read time if the read is fresh
> >> > (scannerOpenWithStop=~35ms, scannerGetList=~10ms). The second time
> >> > around (1 minute later) we assume the region cache is "hot" and the
> >> > open scanner is much faster (scannerOpenWithStop=~8ms,
> >> > scannerGetList=~10ms). After 1-2 hours the cache is no longer "hot"
> >> > and we are back to the initial numbers. We assume this is due to
> >> > finding where the data is located in the cluster. We have cache turned
> >> > off on our tables, but have 2% cache for hbase and the .META. table
> >> > region server is showing 98% hit rate (.META. is served out of cache).
> >> > How can we pre-warm the cache to speed up our reads? It does not seem
> >> > correct that 2/3 of our read time is always finding where the data is
> >> > located. We have played with the prefetch.limit with various different
> >> > settings without much difference. How can we warm up the cache? Per
> >> > the #2468 wording we need "Clients could prewarm cache by doing a
> >> > large scan of all the meta for the table instead of random reads for
> >> > each miss". We definitely do not want to pay this price on each read,
> >> > but would like to maybe set up a cron job to update once an hour for
> >> > the tables this is needed for. It would be great to have a way to pin
> >> > the region locations to memory or at least a method to heat it up
> >> > before a big read process gets kicked off. A read's latency for our
> >> > type of usage pattern should be based primarily on disk i/o latency
> >> > and not looking around for where the data is located in the cluster.
> >> > Adding SSD disks wouldn't help us much at all to lower read latency
> >> > given what we are seeing.
> >> >
> >> > Any help or suggestions would be greatly appreciated.
> >> >
> >> > Thanks.
> >> >
> >>
> >
>
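
The hot-versus-cold pattern Ryan describes in the quoted thread (a second read benefits from blocks cached by the first, and LRU eviction later makes things slow again) can be pictured with a toy cache. This is only a minimal sketch: the `LruBlockCache` class, its capacity, and the block names are hypothetical illustrations, not HBase's actual block cache, and it says nothing about how `BLOCKCACHE => 'false'` behaves.

```python
from collections import OrderedDict


class LruBlockCache:
    """Toy LRU cache keyed by block id (hypothetical, not HBase code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        """Return "hit" if the block is cached, otherwise "miss" and cache it."""
        if block_id in self.blocks:
            # Mark the block as most recently used.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return "hit"
        # Cold read: in a real system this is where the disk read would happen.
        self.misses += 1
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            # Evict the least recently used block.
            self.blocks.popitem(last=False)
        return "miss"


cache = LruBlockCache(capacity=3)
first = cache.read("block-A")    # cold read
second = cache.read("block-A")   # repeated shortly after: served from cache
for other in ("block-B", "block-C", "block-D"):
    cache.read(other)            # unrelated traffic pushes block-A out
third = cache.read("block-A")    # cold again after eviction
print(first, second, third)
```

With a small cache and steady unrelated traffic, the same request alternates between cold and warm depending on how recently it was last issued, which is the shape of the 35ms versus 8ms measurements discussed above.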
