hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Additional disk space required for Hbase compactions..
Date Tue, 18 May 2010 17:27:42 GMT
The equivalent of HBase minor compactions would be Bigtable's merging
compaction (minus the part where it also reads from memtable).
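
To illustrate the idea only, here is a minimal Python sketch of a merging step over already-flushed files. It is not HBase or Bigtable code: `merge_compact` and the `(row, timestamp, value)` tuple layout are assumptions made up for this example, and the sketch keeps every entry rather than dropping deletes or old versions.

```python
import heapq

def merge_compact(store_files):
    """Merge several sorted runs into one sorted run.

    Each run is a list of (row, timestamp, value) tuples, assumed to be
    sorted by row ascending and, within a row, newest timestamp first.
    Nothing is discarded: the step only consolidates files.
    """
    # heapq.merge streams the inputs in order without re-sorting them.
    return list(heapq.merge(*store_files, key=lambda e: (e[0], -e[1])))

older = [("r1", 5, "x"), ("r3", 2, "z")]
newer = [("r1", 7, "y"), ("r2", 1, "w")]
merged = merge_compact([older, newer])
```

After the merge, a read has one file to consult instead of two, which is the point of consolidating recently written files.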

As for your space problem, the recommended practice is to keep at
least 20% of your disk space free; otherwise you can run into all
sorts of problems.
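
A rough way to check that headroom on a single data directory, sketched in Python. The helper names and the path you pass in are assumptions for the example; the 20% default is just the figure above. A compaction writes its output file before the input files are removed, so free space dips temporarily while it runs.

```python
import shutil

def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def has_headroom(path, min_free=0.20):
    """True if at least `min_free` of the filesystem is still free."""
    return free_fraction(path) >= min_free

if __name__ == "__main__":
    # "." stands in for whatever local data directory you want to check.
    print("free: {:.1%}".format(free_fraction(".")))
    print("ok" if has_headroom(".") else "below 20% free")
```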

J-D

On Tue, May 18, 2010 at 4:06 AM, TuX RaceR <tuxracer69@gmail.com> wrote:
> Thank you Jonathan for raising the Jira and attaching a patch
>
> I was looking for more info on how major compactions and minor compactions
> work and google found me this page:
>
> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
>
> After reading the wiki page and Google Bigtable paper, it seems to me that
> there is a difference between Google 'minor compactions' and Hbase 'minor
> compactions'.
>
> In google, a minor compaction is (from the paper):
> "5.4 Compactions
> As write operations execute, the size of the memtable increases. When the
> memtable size reaches a threshold, the memtable is frozen, a new memtable is
> created, and the frozen memtable is converted to an SSTable and written to
> GFS. This minor compaction process has two goals:
> it shrinks the memory usage of the tablet server, and it reduces the amount
> of data that has to be read from the commit log during recovery if this
> server dies. Incoming read and write operations can continue while
> compactions occur.
> Every minor compaction creates a new SSTable. If this behavior continued
> unchecked, read operations might need to merge updates from an arbitrary
> number of SSTables."
>
> On the other hand the Hbase wiki:
> "Compactions: When the number of MapFiles exceeds a configurable threshold,
> a minor compaction is performed which consolidates the most recently written
> MapFiles."
>
> So it seems that:
> 1) google minor compactions are equivalent to Hbase cache flushes
> 2) google major compactions are equivalent to Hbase major compactions
> 3) there is no equivalent of Hbase minor compactions in the google design.
>
> can somebody confirm this?
> Since my data is almost immutable (i.e. there is little space to reclaim
> from deleted rows, as there are few of them), I am wondering whether the
> compactions do more harm than good.
>
> Thanks
> TuX
>
>
>
> On 17/05/10 23:12, Jonathan Gray wrote:
>>
>> No there isn't.
>>
>> I just opened a JIRA to make it so it can be set to 0 to disable.
>>
>> https://issues.apache.org/jira/browse/HBASE-2559
>>
>> Will put up a patch for trunk/0.21.
>>
>> JG
>>
>>
>>>
>>> -----Original Message-----
>>> From: TuX RaceR [mailto:tuxracer69@gmail.com]
>>> Sent: Monday, May 17, 2010 1:47 PM
>>> To: hbase-user@hadoop.apache.org
>>> Subject: Re: Additional disk space required for Hbase compactions..
>>>
>>> Hello List,
>>>
>>>
>>> On 17/05/10 20:26, Jonathan Gray wrote:
>>>>
>>>> Same with major compactions (you would definitely need to turn them
>>>> off and control them manually if you need them at all).
>>>
>>> How would you turn the major compaction off?
>>> The only major compaction related parameter is this one:
>>>
>>> <property>
>>> <name>hbase.hregion.majorcompaction</name>
>>> <value>86400000</value>
>>> <description>The time (in milliseconds) between 'major' compactions of all
>>>      HStoreFiles in a region.  Default: 1 day.
>>> </description>
>>> </property>
>>>
>>> Is there a cleaner way to turn it off than putting a ridiculously large
>>> value?
>>>
>>> Thanks
>>> TuX
>>>
>
>
