hbase-user mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Re: When does compaction actually occur?
Date Sun, 03 Jun 2012 08:57:28 GMT
What Amandeep says, and also keep in mind that with the current selection process HBase holds
O(log N) files for N data. So for 2GB region sizes, say, you get 2-3 files. This means it is
compacting files very "aggressively", and most of these are "all files included" ones... which
are then promoted to major compactions implicitly. That way your predicate deletes should take
effect, and you will only need scheduled major compactions every so often.
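The promotion rule described above can be sketched in a few lines. This is a hypothetical illustration, not HBase code: a compaction that happens to select every file in the store is treated as a major compaction, and only a major compaction physically drops deleted or TTL-expired cells (the function, names, and cell layout below are all made up for the example).

```python
# Hypothetical sketch of the "all files included" promotion rule.
# A store file is modelled as a list of (row_key, timestamp, value) cells.

TTL_SECONDS = 3600  # the column family TTL from the thread


def compact(selected_files, all_store_files, now):
    """Merge the selected files into one. If the selection covers every
    file in the store, promote to a major compaction, which is the only
    kind that physically removes expired cells."""
    is_major = len(selected_files) == len(all_store_files)  # promotion rule
    merged = []
    for store_file in selected_files:
        for (row_key, ts, value) in store_file:
            expired = (now - ts) > TTL_SECONDS
            if is_major and expired:
                continue  # major compaction drops the expired cell
            merged.append((row_key, ts, value))  # minor compactions keep it
    return merged, is_major
```

A minor compaction over a subset of files rewrites expired cells untouched, which is why a short TTL only "takes effect" on disk once a major compaction runs.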


On Jun 2, 2012, at 1:04 AM, Amandeep Khurana wrote:

> Tom,
>
> Old cells will get deleted as part of the next major compaction, which is typically
> recommended to be done once a day, when the load on the system is at its lowest.
>
> FWIW… To have a TTL of 3600 take effect, you'll have to do a major compaction every
> hour, which is an expensive operation, especially at scale. Chances are that your I/O load
> will shoot up and latencies will spike for operations to HBase. Can you tell us why a TTL
> of 3600s is of interest? What are your access patterns?
>
> -Amandeep
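[For reference: TTL is set per column family, and a major compaction can be triggered by hand from the HBase shell. A minimal sketch, with placeholder table and family names; note that on 0.92 online schema changes were not enabled by default, so the table may need to be disabled before the `alter`.]

```
# set a one-hour TTL on column family 'cf' of table 't' (placeholder names)
alter 't', NAME => 'cf', TTL => 3600

# force the expensive full rewrite that physically drops expired cells
major_compact 't'
```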
> On Friday, June 1, 2012 at 3:59 PM, Tom Brown wrote:
>> I have a table that holds rotating data. It has a TTL of 3600. For
>> some reason, when I scan the table I still get old cells that are much
>> older than that TTL.
>> I have tried issuing a compaction request via the web UI, but that
>> didn't seem to do anything.
>> Am I misunderstanding the data model used by HBase? Is there anything
>> else I can check to verify the functionality of my integration?
>> I am using HBase 0.92 with Hadoop 1.0.2.
>> Thanks in advance!
>> --Tom  
