hbase-user mailing list archives

From Dan Di Spaltro <dan.dispal...@gmail.com>
Subject Re: Configuring tombstone purge independent of deleted cell purge
Date Tue, 23 Sep 2014 16:54:34 GMT
Simple question: did you copy and paste that snippet as-is? It has two
<name> elements.
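
If it is a verbatim copy, the second <name> line was presumably meant to be
a <value>. A minimal sketch of what I would expect the hbase-site.xml entry
to look like, assuming the 600000 ms (10 minutes) from your snippet is the
delay you want:

```xml
<!-- Sketch only: assumes the second <name> in the quoted snippet was meant
     to be <value>. 600000 ms = 10 minutes, taken from the quoted snippet. -->
<property>
  <name>hbase.hstore.time.to.purge.deletes</name>
  <value>600000</value>
</property>
```

With no <value> element the property would most likely never be set, which
could explain why only the per-CF setting took effect.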

On Tue, Sep 23, 2014 at 9:42 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi James,
>
> Is it possible that you are impacted by
> https://issues.apache.org/jira/browse/HBASE-10118 ? Any chance to test
> with a release where HBASE-10118 is available?
>
> JM
>
> 2014-09-23 12:10 GMT-04:00 James Estes <james.estes@gmail.com>:
>
> > It does sound like what I'd want (that's why I was trying to use it :) ),
> > but it isn't working as described. Maybe it is a bug?
> >
> > The behavior I'm seeing is that the delete markers are removed on major
> > compaction, regardless of having a hbase.hstore.time.to.purge.deletes set
> > in hbase-site.xml:
> > https://gist.github.com/housejester/2b8fbba0d05c6abbe784
> >
> > I think I've found the issue now. You mentioned the setting could be
> > applied per CF...so I tested that way, and it works as expected. My
> > hbase-site.xml had:
> >
> > <property>
> >   <name>hbase.hstore.time.to.purge.deletes</name>
> >   <name>600000</name>
> > </property>
> >
> > But that doesn't seem to be applied (even with restarts, etc). Setting
> > hbase.hstore.time.to.purge.deletes directly on the column family does
> > work though:
> > https://gist.github.com/housejester/a81274bf74a8666fba32
> >
> > Not sure why it isn't picking up from my hbase-site.xml, but I'll just
> > configure it on the CFs. This is on hbase-0.98.6.1-hadoop2 and
> > hbase-0.96.0-hadoop2 running in local mode.
> >
> > Thanks Lars,
> > James
> >
> > On Mon, Sep 22, 2014 at 11:04 PM, lars hofhansl <larsh@apache.org> wrote:
> >
> > > You can use the hbase.hstore.time.to.purge.deletes config option.
> > > You can set it globally or per Column Family.
> > >
> > > This is the description in hbase-default.xml:
> > >   <property>
> > >     <name>hbase.hstore.time.to.purge.deletes</name>
> > >     <value>0</value>
> > >     <description>The amount of time to delay purging of delete markers
> > >       with future timestamps. If unset, or set to 0, all delete markers,
> > >       including those with future timestamps, are purged during the next
> > >       major compaction. Otherwise, a delete marker is kept until the major
> > >       compaction which occurs after the marker's timestamp plus the value
> > >       of this setting, in milliseconds.
> > >     </description>
> > >   </property>
> > >
> > > That seems to be exactly what you want.
> > >
> > > -- Lars
> > >
> > >
> > > ----- Original Message -----
> > > From: James Estes <james.estes@gmail.com>
> > > To: user@hbase.apache.org
> > > Cc:
> > > Sent: Monday, September 22, 2014 10:39 AM
> > > Subject: Configuring tombstone purge independent of deleted cell purge
> > >
> > > Could tombstone purges be independent of purging deleted cells and
> > > KEEP_DELETED_CELLS setting? In my use case, I do not want to keep
> > > deleted cells, but I do need to keep the tombstones around. Without the
> > > tombstones, I'm not able to do incremental backups (custom, we do
> > > timerange raw scans ourselves for this).
> > >
> > > As a rough example, if I have the following timeline for the same row
> > > key (where t# is time):
> > > t0 - full backup (using a time range up to t0)
> > > t1 - PUT v1
> > > t2 - incremental backup #1 (time range t0 up to t2)
> > > t3 - DELETE
> > > t4 - flush and major compaction happens
> > > t5 - incremental backup #2 (time range t2 up to t5)
> > > t6 - full system crash
> > > t7 - data restored from full backup + incrementals #1 and #2
> > >
> > > When the restore completes, the row will have been un-deleted. This is
> > > because the incremental backup in #2 will not have the tombstone, since
> > > it gets compacted out.
> > >
> > > So in our case, I do NOT want to keep deleted cells (because I do not
> > > want the cells to show up in time range scans users may do), but I DO
> > > want to keep the tombstones for a configurable amount of time (much
> > > larger than our planned incremental backup schedule) so they are
> > > captured during backup. This would allow for the custom incremental
> > > backups to be independent of major compactions. Without it, the backup
> > > schedule would have to manually handle compactions and would always
> > > have to do a FULL Backup after a major compaction (otherwise there can
> > > be loss because when any major compaction happens, any tombstone that
> > > came in after the last incremental will be lost).
> > >
> > > It seems like there could be another setting for when to purge
> > > tombstones. Currently, there is hbase.hstore.time.to.purge.deletes for
> > > when to purge deleted cells, but ONLY if KEEP_DELETED_CELLS is
> > > configured (which makes sense). I'd like to propose a
> > > hbase.hstore.time.to.purge.tombstones that could default to the same
> > > value as hbase.hstore.time.to.purge.deletes, but would take effect
> > > regardless of the KEEP_DELETED_CELLS setting. It should have a
> > > constraint so that hbase.hstore.time.to.purge.deletes <
> > > hbase.hstore.time.to.purge.tombstones (b/c we don't want tombstones
> > > disappearing before the deleted cells).
> > >
> > > Does this seem reasonable? Is there another approach I might take?
> > >
> > > Thanks,
> > >
> > >
> >
>



-- 
Dan Di Spaltro
