hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Smart Managed Major Compactions
Date Thu, 19 Jul 2012 00:52:19 GMT
On Wed, Jul 18, 2012 at 7:26 PM, Bryan Beaudreault
<bbeaudreault@gmail.com> wrote:
> I am looking into managing major compactions ourselves, but there don't appear to be
> any mechanisms I can hook in to determine which tables need compacting.  Ideally, each time
> my cron job runs it would compact the table with the longest time since its last compaction,
> but I can't find a way to access this metric.

I'd suggest you take a region view rather than a table view.

Internally, we look at the HDFS modification time when deciding
whether to compact.  If the age of the oldest store file exceeds the
major compaction interval configured for the particular column
family, we'll do a major compaction.
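That internal check can be sketched roughly like this.  This is a simplified model, not HBase's actual code; `interval_ms` stands in for the per-family major compaction interval (the `hbase.hregion.majorcompaction` setting):

```python
import time

def is_major_compaction_due(file_mtimes_ms, interval_ms, now_ms=None):
    """Rough sketch of the age check: a store is due for a major
    compaction when its oldest file's modification time (epoch
    millis) is older than the configured interval."""
    if not file_mtimes_ms:
        return False  # no store files, nothing to compact
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    oldest = min(file_mtimes_ms)
    return (now_ms - oldest) > interval_ms
```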

Running an external script, you could look at each region in turn on
occasion.  Look at its files and check their modification times (and
perhaps how many files there are under the region's column family);
if the oldest is older than whatever threshold you want, run a major
compaction on that region.

Try to balance how many you'd have running at a time.
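One hedged sketch of that balancing step, assuming you have already collected each region's oldest store-file modification time (all names here are illustrative, not an existing API):

```python
from datetime import datetime, timedelta

def pick_regions_to_compact(oldest_mtime_by_region, interval,
                            max_concurrent, now=None):
    """Return up to max_concurrent regions whose oldest store file
    is older than `interval`, most overdue first, so you never kick
    off too many major compactions at once."""
    now = now or datetime.utcnow()
    overdue = [(mtime, region)
               for region, mtime in oldest_mtime_by_region.items()
               if now - mtime > interval]
    overdue.sort()  # oldest modification time first
    return [region for _, region in overdue[:max_concurrent]]
```

You could then issue `major_compact '<region name>'` through the HBase shell for each region returned.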

> The default major compaction algorithm seems to be able to get the oldest modified time
> for all store files in a region to determine when it was last major compacted.  I know
> this is not ideal, but it seems good enough.  Unfortunately, I don't see an easy way to
> get this.

It's in the stats data structure for an HDFS file.  Scripting it,
you could parse it from an HDFS listing.
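A hypothetical parser for that listing idea: feed it the captured output of something like `hadoop fs -lsr /hbase/<table>` and get back each region's oldest store-file modification time.  The column order of the listing and the `/hbase/<table>/<region>/<family>/<file>` path layout are assumptions; adjust both for your Hadoop and HBase versions.

```python
import re
from datetime import datetime

LS_LINE = re.compile(
    r'^\S+\s+\d+\s+\S+\s+\S+\s+\d+\s+'            # perms, repl, owner, group, size
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+(\S+)$')  # mtime, path

def oldest_mtime_per_region(ls_output):
    """Map region name -> oldest store-file modification time,
    parsed from a recursive HDFS file listing."""
    oldest = {}
    for line in ls_output.splitlines():
        m = LS_LINE.match(line.strip())
        if not m:  # skip directory lines, headers, blanks
            continue
        mtime = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M')
        parts = m.group(2).strip('/').split('/')
        if len(parts) < 5:  # not deep enough to be a store file
            continue
        region = parts[2]  # assumes /hbase/<table>/<region>/...
        if region not in oldest or mtime < oldest[region]:
            oldest[region] = mtime
    return oldest
```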

