hbase-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: blockcache 101
Date Tue, 15 Apr 2014 05:12:29 GMT
On Wed, Apr 9, 2014 at 10:24 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> > the trend lines drawn on the graphs seem to be based on some assumption
> > that there is an exponential scaling pattern.
> Which charts are you specifically referring to? Indeed, the trend lines
> were generated rather casually with Excel and may be misleading. Perhaps a
> more responsible representation would be to simply connect each data point
> with a line to aid visibility.

Was referring to these graphs:

And yep, I think straight lines between the points (or just the points
themselves) might be more accurate.

> > In practice I would think it would be sigmoid [...] As soon as it starts to
> > be larger than the cache capacity [...] as the dataset gets larger, the
> > latency will level out as a flat line, not continue to grow as your trend
> > lines are showing.
> When decoupling cache size from database size, you're presumably correct. I
> believe that's what's shown in the figures in perfeval_blockcache_v1.pdf,
> especially as total memory increases. The plateau effect is suggested in
> the 20G and 50G charts in that book. This is why I included the second set
> of charts in perfeval_blockcache_v2.pdf. The intention is to couple the
> cache size to dataset size and demonstrate how an implementation performs
> as the absolute values increase. That is, assuming hit/eviction rates remain
> roughly constant, how well does an implementation "scale up" to a larger
> memory footprint?

Hmm... in "v2.pdf" here you're looking at different ratios of DB size to
cache size, but there's also the secondary cache on the system (the OS
block cache), right? So when you say only 20GB "memory under management",
in fact you're still probably getting a 100% in-memory hit rate in the case
where the DB is bigger than the cache but still fits in RAM, right?

I guess I just find it a little hard to tell what the graphs are trying to
demonstrate. Maybe it would be better to have each graph show the different
cache implementations overlaid, rather than the different ratios? That would
better differentiate the scaling behavior of the implementations relative to
each other. As you've got it, the results seem somewhat obvious ("as the hit
ratio gets worse, it gets slower").
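
For what it's worth, the plateau from the quoted sigmoid argument falls out of a trivial cost model (latencies below are invented round numbers, and uniform random reads are assumed):

```python
# Toy model of mean read latency vs. dataset size for a fixed-size cache.
# Once the DB exceeds cache capacity, hit rate ~ cache/db, so mean latency
# approaches the flat miss cost rather than growing without bound.
CACHE_LAT_US = 1.0    # assumed cost of a block cache hit (microseconds)
MISS_LAT_US = 100.0   # assumed cost of a cache miss (disk/OS read)
CACHE_GB = 20.0       # fixed cache capacity

def mean_latency(db_gb):
    # Uniform random reads: hit rate is min(1, cache_size / db_size).
    hit = min(1.0, CACHE_GB / db_gb)
    return hit * CACHE_LAT_US + (1.0 - hit) * MISS_LAT_US

for db in (5, 20, 50, 200, 1000):
    print(db, round(mean_latency(db), 2))
```

Latency sits at the hit cost while the DB fits in cache, then climbs and levels off just under the ~100µs miss cost, i.e. the flat line rather than the exponential trend.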

Todd Lipcon
Software Engineer, Cloudera
