hbase-user mailing list archives

From James Estes <james.es...@gmail.com>
Subject Re: Configuring tombstone purge independent of deleted cell purge
Date Tue, 23 Sep 2014 16:57:07 GMT
Hah. Indeed it does. Thanks for the help.

James

On Sep 23, 2014, at 10:54 AM, Dan Di Spaltro <dan.dispaltro@gmail.com> wrote:

> Simple question: did you copy and paste that snippet? It has two name
> stanzas.
> 
> On Tue, Sep 23, 2014 at 9:42 AM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
> 
>> Hi James,
>> 
>> Is it possible that you are impacted by
>> https://issues.apache.org/jira/browse/HBASE-10118 ? Any chance to test
>> with a release where HBASE-10118 is available?
>> 
>> JM
>> 
>> 2014-09-23 12:10 GMT-04:00 James Estes <james.estes@gmail.com>:
>> 
>>> It does sound like what I'd want (that's why I was trying to use it :) ),
>>> but it isn't working as described. Maybe it is a bug?
>>> 
>>> The behavior I'm seeing is that the delete markers are removed on major
>>> compaction, regardless of having a hbase.hstore.time.to.purge.deletes set
>>> in hbase-site.xml:
>>> https://gist.github.com/housejester/2b8fbba0d05c6abbe784
>>> 
>>> I think I've found the issue now. You mentioned the setting could be
>>> applied per CF...so I tested that way, and it works as expected. My
>>> hbase-site.xml had:
>>> 
>>> <property>
>>>  <name>hbase.hstore.time.to.purge.deletes</name>
>>>  <name>600000</name>
>>> </property>
>>> 
>>> But that doesn't seem to be applied (even with restarts, etc). Setting
>>> hbase.hstore.time.to.purge.deletes directly on the column family does
>>> work though:
>>> https://gist.github.com/housejester/a81274bf74a8666fba32
>>> 
>>> Not sure why it isn't picking up from my hbase-site.xml, but I'll just
>>> configure it on the CFs. This is on hbase-0.98.6.1-hadoop2 and
>>> hbase-0.96.0-hadoop2 running in local mode.
>>> 
>>> Thanks Lars,
>>> James
>>> 
>>> On Mon, Sep 22, 2014 at 11:04 PM, lars hofhansl <larsh@apache.org> wrote:
>>> 
>>>> You can use the hbase.hstore.time.to.purge.deletes config option.
>>>> You can set it globally or per Column Family.
>>>> 
>>>> This is the description in hbase-default.xml:
>>>>  <property>
>>>>    <name>hbase.hstore.time.to.purge.deletes</name>
>>>>    <value>0</value>
>>>>    <description>The amount of time to delay purging of delete markers
>>>>      with future timestamps. If unset, or set to 0, all delete markers,
>>>>      including those with future timestamps, are purged during the next
>>>>      major compaction. Otherwise, a delete marker is kept until the major
>>>>      compaction which occurs after the marker's timestamp plus the value
>>>>      of this setting, in milliseconds.
>>>>    </description>
>>>>  </property>
>>>> 
>>>> That seems to be exactly what you want.
>>>> 
>>>> -- Lars
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>> From: James Estes <james.estes@gmail.com>
>>>> To: user@hbase.apache.org
>>>> Cc:
>>>> Sent: Monday, September 22, 2014 10:39 AM
>>>> Subject: Configuring tombstone purge independent of deleted cell purge
>>>> 
>>>> Could tombstone purges be independent of purging deleted cells and
>>>> KEEP_DELETED_CELLS setting? In my use case, I do not want to keep deleted
>>>> cells, but I do need to keep the tombstones around. Without the tombstones,
>>>> I'm not able to do incremental backups (custom, we do timerange raw scans
>>>> ourselves for this).
>>>> 
>>>> As a rough example, if I have the following timeline for the same row key
>>>> (where t# is time):
>>>> t0 - full backup (using a time range up to t0)
>>>> t1 - PUT v1
>>>> t2 - incremental backup #1 (time range t0 up to t2)
>>>> t3 - DELETE
>>>> t4 - flush and major compaction happens
>>>> t5 - incremental backup #2 (time range t2 up to t5)
>>>> t6 - full system crash
>>>> t7 - data restored from full backup + incrementals #1 and #2
>>>> 
>>>> When the restore completes, the row will have been un-deleted. This is
>>>> because the incremental backup in #2 will not have the tombstone, since it
>>>> gets compacted out.
>>>> 
>>>> So in our case, I do NOT want to keep deleted cells (because I do not want
>>>> the cells to show up in time range scans users may do), but I DO want to
>>>> keep the tombstones for a configurable amount of time (much larger than our
>>>> planned incremental backup schedule) so they are captured during backup.
>>>> This would allow for the custom incremental backups to be independent of
>>>> major compactions. Without it, the backup schedule would have to manually
>>>> handle compactions and would always have to do a FULL Backup after a major
>>>> compaction (otherwise there can be loss because when any major compaction
>>>> happens, any tombstone that came in after the last incremental will be
>>>> lost).
>>>> 
>>>> It seems like there could be another setting for when to purge tombstones.
>>>> Currently, there is hbase.hstore.time.to.purge.deletes for when to purge
>>>> deleted cells, but ONLY if KEEP_DELETED_CELLS is configured (which makes
>>>> sense). I'd like to propose a hbase.hstore.time.to.purge.tombstones that
>>>> could default to the same value as hbase.hstore.time.to.purge.deletes, but
>>>> would take effect regardless of the KEEP_DELETED_CELLS setting. It should
>>>> have a constraint so that hbase.hstore.time.to.purge.deletes <
>>>> hbase.hstore.time.to.purge.tombstones (b/c we don't want tombstones
>>>> disappearing before the deleted cells).
>>>> 
>>>> Does this seem reasonable? Is there another approach I might take?
>>>> 
>>>> Thanks,
>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Dan Di Spaltro

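The problem Dan spotted in the quoted hbase-site.xml snippet (a second <name> element where a <value> was intended) is easy to miss by eye. A rough sketch of a check for it follows, using only the Python standard library. The function name malformed_properties is made up for this illustration, and the <configuration> wrapper is added here because the snippet in the thread shows only the <property> stanza. This is not an HBase tool, just one way such a check might look.

```python
import xml.etree.ElementTree as ET

# The stanza as quoted in the thread, wrapped in <configuration> (added here).
# Note the second element is <name>, not <value>.
BROKEN = """<configuration>
<property>
 <name>hbase.hstore.time.to.purge.deletes</name>
 <name>600000</name>
</property>
</configuration>"""

# Same stanza with the second element corrected to <value>.
FIXED = BROKEN.replace("<name>600000</name>", "<value>600000</value>")


def malformed_properties(xml_text):
    """Describe every <property> that lacks exactly one <name> and one <value>.

    Hypothetical helper for illustration only.
    """
    problems = []
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        names = [(e.text or "").strip() for e in prop.findall("name")]
        values = prop.findall("value")
        if len(names) != 1 or len(values) != 1:
            problems.append(f"{names}: {len(names)} <name>, {len(values)} <value>")
    return problems


if __name__ == "__main__":
    print(malformed_properties(BROKEN))  # one problem reported
    print(malformed_properties(FIXED))   # []
```

Run against the stanza as posted, the check reports one property with two <name> elements and no <value>, which would fit James's observation that the site-wide setting was ignored while the per-column-family setting worked.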

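The backup timeline in James's original message (t0 through t7) can also be walked through in a few lines of Python. This is a toy model, not HBase code: cells are (kind, timestamp) pairs, timestamps are the abstract t# values from the thread rather than milliseconds, deleted cells are always dropped at major compaction (KEEP_DELETED_CELLS off), and a delete marker survives a major compaction only while now <= timestamp + purge_delay, which is a simplified reading of the hbase-default.xml description Lars quoted. All function names are invented for the sketch.

```python
def major_compact(cells, now, purge_delay):
    """Toy major compaction with KEEP_DELETED_CELLS off (simplified assumption)."""
    deletes = [ts for kind, ts in cells if kind == "delete"]
    kept = []
    for kind, ts in cells:
        if kind == "put":
            # A put masked by a delete marker at an equal or later timestamp is dropped.
            if not any(d >= ts for d in deletes):
                kept.append((kind, ts))
        elif purge_delay > 0 and now <= ts + purge_delay:
            # The marker is still inside its retention window, so it survives.
            kept.append((kind, ts))
    return kept


def raw_scan(cells, start, end):
    """Raw time-range scan over [start, end): returns puts and delete markers alike."""
    return [(kind, ts) for kind, ts in cells if start <= ts < end]


def row_visible(cells):
    """True if some put is not masked by a delete marker at an equal or later timestamp."""
    deletes = [ts for kind, ts in cells if kind == "delete"]
    return any(kind == "put" and not any(d >= ts for d in deletes)
               for kind, ts in cells)


def restored_row_visible(purge_delay):
    """Replay the timeline from the thread and report whether the row reappears."""
    store = [("put", 1)]                       # t1 - PUT v1
    inc1 = raw_scan(store, 0, 2)               # t2 - incremental backup #1
    store.append(("delete", 3))                # t3 - DELETE
    store = major_compact(store, now=4, purge_delay=purge_delay)  # t4
    inc2 = raw_scan(store, 2, 5)               # t5 - incremental backup #2
    # t7 - restore; the full backup at t0 held nothing for this row.
    return row_visible(inc1 + inc2)


if __name__ == "__main__":
    print(restored_row_visible(0))    # True: marker purged at t4, row is un-deleted
    print(restored_row_visible(100))  # False: marker captured by incremental #2
```

With a purge delay of 0 the tombstone is gone before incremental #2 runs, so the restore brings the row back. With a delay longer than the gap between incrementals, the tombstone is captured and the restored row stays deleted, which is the behaviour James was after.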