hbase-user mailing list archives

From David Swift <davidswiftm...@charter.net>
Subject Re: Delete Range Of Rows In HBase or How To Age Out Old Data
Date Mon, 29 Mar 2010 21:36:28 GMT


The TimeToLive works exactly as you described.  It's perfect for our needs.

However, I aged out several hundred thousand rows, waited about 10 minutes,
and then ran a compact from the HBase shell. Throughout that period I
periodically ran du on the Hadoop data directory. A few minutes after the
2nd or 3rd compact request, disk usage actually went up by 40 blocks and
stayed there for an hour. Perhaps this is reasonable and by design, but is
there a page somewhere describing minor and major compactions and how they
affect the actual local file system disk usage of Hadoop?


Andrew Purtell-2 wrote:
> Hi David,
> What about setting time to lives on column families? You can add or change
> the 'TTL' attribute on a column family in the shell, or specify a time to
> live when creating a table. See javadoc for HColumnDescriptor. A time to
> live is an integer value (unit is seconds) associated with the column
> family. When a value's timestamp + ttl < current time, the value will no
> longer be returned in results for gets and scans, and will be garbage
> collected upon the next major compaction. 
> In a past project I used TTLs to age out content retrieved as part of web
> crawling after 30 days, and also to age out various metadata over shorter
> time frames depending on the type of information. In fact I contributed
> the TTL feature to enable this use case. 
> Hope that helps,
>    - Andy
>> From: David Swift
>> Subject: Delete Range Of Rows In HBase or How To Age Out Old Data
>> We're evaluating HBase and we have a case where we would
>> want to drop about 3 billion of the oldest records out of
>> about 500 billion at once. We would take measures to ensure
>> that there are no new inserts into that old age range during
>> the deletion. We would know the low and the high row IDs in this
>> scenario.
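The TTL expiry rule from the quoted reply can be sketched as a small Java check. This is a minimal illustration, not HBase's internal code: the class `TtlCheck` and method `isExpired` are hypothetical names, and the sketch assumes cell timestamps in milliseconds and a column-family TTL in seconds, treating a cell as expired once its timestamp plus the TTL falls strictly before the current time (the exact boundary handling inside HBase is an assumption here).

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of the column-family TTL rule; not HBase's implementation. */
public final class TtlCheck {
    private TtlCheck() {}

    /**
     * Returns true when a cell written at timestampMs has outlived a TTL of
     * ttlSeconds as of nowMs. Assumes millisecond timestamps and a TTL in seconds.
     */
    public static boolean isExpired(long timestampMs, int ttlSeconds, long nowMs) {
        long ttlMs = TimeUnit.SECONDS.toMillis(ttlSeconds); // convert the TTL to the timestamp unit
        return timestampMs + ttlMs < nowMs;                 // strict boundary is an assumption
    }

    public static void main(String[] args) {
        int thirtyDays = (int) TimeUnit.DAYS.toSeconds(30); // 2,592,000 seconds
        long now = System.currentTimeMillis();
        long written31DaysAgo = now - TimeUnit.DAYS.toMillis(31);
        long written1DayAgo = now - TimeUnit.DAYS.toMillis(1);
        System.out.println(isExpired(written31DaysAgo, thirtyDays, now)); // true
        System.out.println(isExpired(written1DayAgo, thirtyDays, now));   // false
    }
}
```

With the 30-day TTL from the web-crawling example, a cell written 31 days ago would no longer be returned by gets and scans, while one written a day ago still would; per the reply above, the expired cells are only garbage collected at the next major compaction.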

View this message in context: http://old.nabble.com/Delete-Range-Of-Rows-In-HBase-or-How-To-Age-Out-Old-Data-tp28073228p28075513.html
Sent from the HBase User mailing list archive at Nabble.com.
