Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of amit.mor.mail@gmail.com
 designates 209.85.223.178 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <391D65D0EBFC9B4B95E117F72A360F1A010452FF@SHSMSX101.ccr.corp.intel.com>
References: 
 <391D65D0EBFC9B4B95E117F72A360F1A01044C34@SHSMSX101.ccr.corp.intel.com>
	<CAAT7MkqkdHLcKG6xMZr5gOWmxToqbojafGxh21=pBw62EW=Rnw@mail.gmail.com>
	<CA+K-KJU36UTP9ty+RNoFsGRsWG4A4J11ZGLgb+DKwUh4gQL21Q@mail.gmail.com>
	<391D65D0EBFC9B4B95E117F72A360F1A010452FF@SHSMSX101.ccr.corp.intel.com>
Date: Tue, 4 Jun 2013 14:24:51 +0300
Message-ID: 
 <CA+K-KJXjA9coyYU+Jj9C0hZnD5ghU=PaEyuegCC9AE90tk8d_g@mail.gmail.com>
Subject: Re: what's the typical scan latency?
From: Amit Mor <amit.mor.mail@gmail.com>
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=001a11c1fb426c230f04de525410

--001a11c1fb426c230f04de525410
Content-Type: text/plain; charset=ISO-8859-1

What's your blockCacheHitCachingRatio? It tells you what fraction of the
block reads that asked for caching (the default) were actually served from
the block cache. You can get that from the RS web UI. What you are seeing
could be caused by almost anything. For example: is scanner caching (client
side) enabled? If so, how many rows are cached (that is, how many rows are
returned by each scanner.next RPC call)? What's your HFile block size, block
cache % of total RS heap, max number of RPCs per RS for client connections,
tcpnodelay, your network topology and jitter, number of NICs? Are you using
an HTableInterface connection pool? The HBase client is synchronous, so how
do you achieve concurrency? What about your percentiles? Is 5ms the mean?
The median? Is 20ms only at the 99th percentile? And so on. I am far from
considering myself an expert on the general topic of HBase, so take my tips
with a pinch of salt. These are just factors I've considered when trying to
optimize my read latency. Hope that helps.
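
To make the percentile question concrete, here is a rough sketch in plain
Java with no HBase dependency. The class name, the method names and the
sample latencies are all made up for illustration, and it uses the simple
nearest-rank definition of a percentile, which is only one of several
definitions in use.

```java
import java.util.Arrays;

public class LatencyPercentiles {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static double mean(double[] samplesMs) {
        double sum = 0;
        for (double s : samplesMs) {
            sum += s;
        }
        return sum / samplesMs.length;
    }

    public static void main(String[] args) {
        // Hypothetical per-scan latencies in milliseconds, not measured data.
        double[] samplesMs = {3, 4, 3, 5, 4, 3, 4, 20, 3, 4};
        System.out.printf("mean=%.1f median=%.1f p99=%.1f%n",
                mean(samplesMs),
                percentile(samplesMs, 50),
                percentile(samplesMs, 99));
    }
}
```

A single slow scan pulls the mean up while the median stays put, which is
why "average latency" on its own says little about what is going on.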

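On the concurrency question, one common approach with a blocking client is
to run each call on a worker thread from a fixed-size pool and time every
call separately. The sketch below is plain Java; blockingCall is a
hypothetical stand-in for whatever performs one scan, and the class and
method names are invented for this example. With a real HBase client keep
in mind that HTable instances are not thread-safe, so each worker needs its
own table handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentScans {

    // Runs `count` copies of a blocking call on `threads` worker threads
    // and returns the latency of each call in milliseconds.
    public static List<Double> timeConcurrently(
            Callable<?> blockingCall, int count, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                tasks.add(() -> {
                    long start = System.nanoTime();
                    blockingCall.call(); // stand-in for one synchronous scan
                    return (System.nanoTime() - start) / 1e6;
                });
            }
            List<Double> latenciesMs = new ArrayList<>();
            // invokeAll blocks until every task has finished.
            for (Future<Double> f : pool.invokeAll(tasks)) {
                latenciesMs.add(f.get());
            }
            return latenciesMs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake "scan" that just sleeps; replace with a real client call.
        Callable<Void> fakeScan = () -> {
            Thread.sleep(5);
            return null;
        };
        System.out.println(timeConcurrently(fakeScan, 12, 4));
    }
}
```

The timer here only covers the call itself. Time spent waiting in the pool's
queue is not included, so if the end-to-end numbers look worse than these,
the client side is a likely suspect.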

On Tue, Jun 4, 2013 at 4:02 AM, Liu, Raymond <raymond.liu@intel.com> wrote:

> Thanks Amit
>
> In my environment, I run a dozen clients that each read about 5-20K of
> data per scan concurrently, and the average read latency for cached data
> is around 5-20ms.
> So it seems there must be something wrong with my cluster env or
> application. Or did you run that with multiple clients?
>
>
> > Depends on so many environment-related variables, and on the data as
> > well. But to give you a number after all:
> > One of our clusters is on EC2, 6 RS, on m1.xlarge machines (network
> > performance 'high' according to AWS). We do reads 90% of the time; our
> > avg data size is 2K, block cache at 20K, 100 rows per scan avg, bloom
> > filters 'on' at the 'ROW' level, 40% of heap dedicated to block cache
> > (note that it contains several other bits and pieces), and I would say
> > our average latency for cached data (~97% blockCacheHitCachingRatio) is
> > 3-4ms. File system access is much, much more painful, especially on EC2
> > m1.xlarge, where you really can't tell what's going on, as far as I can
> > tell. To tell you the truth as I see it, this is an abuse (for our use
> > case) of the HBase store, and for cache-like behavior I would recommend
> > going to something like Redis.
>
>
> On Mon, Jun 3, 2013 at 12:13 PM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > What is it that you are observing now?
> >
> > Regards
> > Ram
> >
> >
> > On Mon, Jun 3, 2013 at 2:00 PM, Liu, Raymond <raymond.liu@intel.com>
> > wrote:
> >
> > > Hi
> > >
> > >         If all the data is already in the RS block cache, what's the
> > > typical latency for scanning a few rows from a table of, say, several
> > > GB (with dozens of regions) on a small cluster with, say, 4 RS?
> > >
> > >         A few ms? Tens of ms? Or more?
> > >
> > > Best Regards,
> > > Raymond Liu
> > >
> >
>

--001a11c1fb426c230f04de525410--