hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Major compactions and OS cache
Date Wed, 16 Feb 2011 18:03:41 GMT
Hi Otis,

Excellent reflection; unfortunately, I don't think anyone has
benchmarked it, so I can't give a definitive answer.

One thing I'm sure of is that, worse than trashing the OS cache, a
major compaction also trashes the block cache! But this is the price
to pay to clean up old versions and merge all store files into one. If
you're not deleting a whole lot, or updating the same fields a ton,
then maybe you should explore setting a larger window between major
compactions (the current default being once every 24h). I know some
people just plain disable major compactions because they are never
overwriting

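For anyone who wants to try the larger window mentioned above, here is a
minimal sketch of the hbase-site.xml setting involved. It assumes the
hbase.hregion.majorcompaction property, whose value is the interval in
milliseconds between time-based major compactions. The 7-day figure is
only an illustrative value, not something recommended in this thread.

```xml
<!-- hbase-site.xml: sketch only; the 7-day interval is an assumed example value -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- Interval in milliseconds: 7 * 24 * 60 * 60 * 1000 = 604800000.
       The 24h default mentioned above corresponds to 86400000. -->
  <value>604800000</value>
</property>
```

Setting the value to 0 turns off time-based major compactions. They can
then be triggered by hand during off-peak hours, for example with
major_compact 'mytable' from the HBase shell (the table name here is
hypothetical).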

On Wed, Feb 16, 2011 at 4:30 AM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> Hi,
> Over on http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html I saw
> this bit:
> "The most important factor is that HBase is not restarted frequently and that it
> performs house keeping on a regular basis. These so called compactions rewrite
> files as new data is added over time. All files in HDFS once written are
> immutable (for all sorts of reasons). Because of that, data is written into
> new files and as their number grows HBase compacts them into another set of
> new, consolidated files. And here is the kicker: HDFS is smart enough to put
> the data where it is needed!"
> ... and I always wondered what this does to the OS cache.  In some applications
> (non-HBase stuff, say full-text search), the OS cache plays a crucial role in
> how the system performs.  If you have to hit the disk too much, you're in
> trouble, so one of the things you avoid is making big changes to index files on
> disk in order to avoid invalidating data that's been nicely cached by the OS.
> However, with HBase, and especially major compactions, what happens with the OS
> cache?  All gone, right?
> Do people find this problematic?
> Or does the OS cache simply not play such a significant role in systems running
> HBase simply because the data it holds and that needs to be accessed is much
> bigger than the OS cache could ever be, so even with the OS cache full and hot,
> other data would still have to be read from disk anyway?
> Thanks,
> Otis
