hbase-user mailing list archives

From TuX RaceR <tuxrace...@gmail.com>
Subject Re: Additional disk space required for Hbase compactions..
Date Tue, 18 May 2010 08:06:45 GMT
Thank you, Jonathan, for raising the JIRA and attaching a patch.

I was looking for more information on how major and minor compactions 
work, and Google found me this page:


After reading the wiki page and the Google Bigtable paper, it seems to me 
that there is a difference between Google 'minor compactions' and HBase 
'minor compactions'.

In Google's design, a minor compaction is (from the paper):
"5.4 Compactions
As write operations execute, the size of the memtable increases. When 
the memtable size reaches a threshold, the memtable is frozen, a new 
memtable is created, and the frozen memtable is converted to an SSTable 
and written to GFS. This minor compaction process has two goals:
it shrinks the memory usage of the tablet server, and it reduces the 
amount of data that has to be read from the commit log during recovery 
if this server dies. Incoming read and write operations can continue 
while compactions occur.
Every minor compaction creates a new SSTable. If this behavior continued 
unchecked, read operations might need to merge updates from an arbitrary 
number of SSTables."

On the other hand, the HBase wiki says:
"Compactions: When the number of MapFiles exceeds a configurable 
threshold, a minor compaction is performed which consolidates the most 
recently written MapFiles."

So it seems that:
1) Google minor compactions are equivalent to HBase cache flushes
2) Google major compactions are equivalent to HBase major compactions
3) there is no equivalent of HBase minor compactions in the Google design.

Can somebody confirm this?
As my data is almost immutable (i.e. there are few deleted rows, so there 
is little space to reclaim), I am wondering whether the compactions do 
more harm than good.
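
To make the three HBase operations compared above concrete, here is a 
minimal toy sketch. It is not HBase or Bigtable code: the names Store, 
TOMBSTONE and merge are hypothetical, store files are plain dicts, and 
the only behaviour modelled is that a flush writes a new file, a minor 
compaction merges a few recent files but keeps delete markers, and a 
major compaction merges everything and drops them.

```python
# Toy model only; all names here are made up for illustration.
TOMBSTONE = object()  # marker written in place of a deleted value


def merge(files, drop_tombstones):
    """Merge store files given oldest first; newer entries win."""
    merged = {}
    for store_file in files:
        merged.update(store_file)
    if drop_tombstones:
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return merged


class Store:
    def __init__(self):
        self.memstore = {}  # in-memory writes not yet on disk
        self.files = []     # immutable store files, oldest first

    def put(self, key, value):
        self.memstore[key] = value

    def delete(self, key):
        self.memstore[key] = TOMBSTONE

    def flush(self):
        # HBase cache flush (what the Bigtable paper calls a minor compaction).
        if self.memstore:
            self.files.append(self.memstore)
            self.memstore = {}

    def minor_compact(self, count):
        # Merge the `count` most recent files into one. Delete markers are
        # kept because older files may still hold the deleted keys.
        if count < 2 or len(self.files) < count:
            return
        self.files[-count:] = [merge(self.files[-count:], drop_tombstones=False)]

    def major_compact(self):
        # Merge all files into one and drop deleted entries for good.
        if self.files:
            self.files = [merge(self.files, drop_tombstones=True)]

    def get(self, key):
        # Newest data first: memstore, then files from newest to oldest.
        for layer in [self.memstore] + self.files[::-1]:
            if key in layer:
                value = layer[key]
                return None if value is TOMBSTONE else value
        return None
```

In this model a store with almost no deletes gains little space from a 
major compaction; the remaining benefit is fewer files to consult per 
read.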


On 17/05/10 23:12, Jonathan Gray wrote:
> No there isn't.
> I just opened a JIRA to make it so it can be set to 0 to disable.
> https://issues.apache.org/jira/browse/HBASE-2559
> Will put up a patch for trunk/0.21.
> JG
>> -----Original Message-----
>> From: TuX RaceR [mailto:tuxracer69@gmail.com]
>> Sent: Monday, May 17, 2010 1:47 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: Additional disk space required for Hbase compactions..
>> Hello List,
>> On 17/05/10 20:26, Jonathan Gray wrote:
>>> Same with major compactions (you would definitely need to turn them
>>> off and control them manually if you need them at all).
>> How would you turn the major compaction off?
>> The only major compaction related parameter is this one:
>> <property>
>> <name>hbase.hregion.majorcompaction</name>
>> <value>86400000</value>
>> <description>The time (in milliseconds) between 'major' compactions of all
>>       HStoreFiles in a region.  Default: 1 day.
>> </description>
>> </property>
>> Is there a cleaner way to turn it off than putting a ridiculously large
>> value?
>> Thanks
>> TuX
