Date: Mon, 22 Sep 2014 11:39:45 -0600
Subject: Configuring tombstone purge independent of deleted cell purge
From: James Estes
To: user@hbase.apache.org

Could tombstone purges be independent of purging deleted cells and of the KEEP_DELETED_CELLS setting?

In my use case, I do not want to keep deleted cells, but I do need to keep the tombstones around. Without the tombstones, I'm not able to do incremental backups (these are custom; we do time-range raw scans ourselves for this).

As a rough example, suppose I have the following timeline for the same row key (where t# is time):

t0 - full backup (using a time range up to t0)
t1 - PUT v1
t2 - incremental backup #1 (time range t0 up to t2)
t3 - DELETE
t4 - flush and major compaction happens
t5 - incremental backup #2 (time range t2 up to t5)
t6 - full system crash
t7 - data restored from full backup + incrementals #1 and #2

When the restore completes, the row will have been un-deleted. This is because incremental backup #2 will not contain the tombstone, since it was compacted out at t4.

So in our case, I do NOT want to keep deleted cells (because I do not want the cells to show up in time-range scans users may do), but I DO want to keep the tombstones for a configurable amount of time (much larger than our planned incremental backup interval) so they are captured during backup. This would allow the custom incremental backups to be independent of major compactions.
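
To make the timeline concrete, here is a minimal sketch in plain Python that replays t0 through t7 against a toy in-memory store. It is not HBase code and does not use the HBase API. The names Cell, raw_scan, major_compact, visible_value, run_timeline, and the tombstone_ttl parameter are all hypothetical, and the delete-marker model is simplified to one marker type that masks every older put on the same row.

```python
from dataclasses import dataclass
from typing import List, Optional

PUT, DELETE = "PUT", "DELETE"


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int
    kind: str                     # PUT, or DELETE for a tombstone
    value: Optional[str] = None


def raw_scan(cells: List[Cell], start: int, end: int) -> List[Cell]:
    """Raw time-range scan over [start, end): returns puts AND tombstones."""
    return [c for c in cells if start <= c.ts < end]


def major_compact(cells: List[Cell], now: int, tombstone_ttl: int) -> List[Cell]:
    """Toy major compaction with deleted cells NOT kept.

    Puts masked by a tombstone are always dropped. A tombstone survives only
    while it is younger than tombstone_ttl (hypothetical setting); a value of
    0 purges every tombstone, as in the scenario described above.
    """
    tombstones = [c for c in cells if c.kind == DELETE]

    def masked(put: Cell) -> bool:
        return any(t.row == put.row and t.ts >= put.ts for t in tombstones)

    kept = [c for c in cells if c.kind == PUT and not masked(c)]
    kept += [t for t in tombstones if now - t.ts < tombstone_ttl]
    return kept


def visible_value(cells: List[Cell], row: str) -> Optional[str]:
    """Value a normal (non-raw) read would see, or None if the row is deleted."""
    latest_delete = max(
        (c.ts for c in cells if c.row == row and c.kind == DELETE), default=-1
    )
    live = [c for c in cells if c.row == row and c.kind == PUT and c.ts > latest_delete]
    return max(live, key=lambda c: c.ts).value if live else None


def run_timeline(tombstone_ttl: int) -> Optional[str]:
    store: List[Cell] = []
    full = raw_scan(store, 0, 0)                      # t0: full backup
    store.append(Cell("row", 1, PUT, "v1"))           # t1: PUT v1
    inc1 = raw_scan(store, 0, 2)                      # t2: incremental #1
    store.append(Cell("row", 3, DELETE))              # t3: DELETE
    store = major_compact(store, 4, tombstone_ttl)    # t4: major compaction
    inc2 = raw_scan(store, 2, 5)                      # t5: incremental #2
    restored = full + inc1 + inc2                     # t6 crash, t7 restore
    return visible_value(restored, "row")


print(run_timeline(tombstone_ttl=0))    # v1   (row is un-deleted after restore)
print(run_timeline(tombstone_ttl=10))   # None (tombstone was backed up, row stays deleted)
```

With tombstone_ttl=0 the compaction at t4 removes both the put and the tombstone, so incremental #2 is empty and the restore replays only the PUT from incremental #1. With a retention longer than the backup interval, the tombstone is still present at t5 and is carried into the restore.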
Without such a setting, the backup schedule would have to handle compactions manually and would always have to do a FULL backup after a major compaction. Otherwise deletes can be lost, because whenever a major compaction happens, any tombstone that came in after the last incremental backup is purged.

It seems like there could be another setting for when to purge tombstones. Currently, there is hbase.hstore.time.to.purge.deletes for when to purge deleted cells, but ONLY if KEEP_DELETED_CELLS is configured (which makes sense). I'd like to propose a hbase.hstore.time.to.purge.tombstones setting that could default to the same value as hbase.hstore.time.to.purge.deletes, but would take effect regardless of the KEEP_DELETED_CELLS setting. It should have a constraint so that hbase.hstore.time.to.purge.deletes <= hbase.hstore.time.to.purge.tombstones (because we don't want tombstones disappearing before the deleted cells).

Does this seem reasonable? Is there another approach I might take?

Thanks,