From: Jean-Daniel Cryans
To: "dev@hbase.apache.org"
Date: Wed, 19 Jun 2013 14:46:35 +0200
Subject: Re: Efficiently wiping out random data?

That sounds like a very effective way for developers to kill clusters
with compactions :)

J-D

On Wed, Jun 19, 2013 at 2:39 PM, Kevin O'dell wrote:

> JD,
>
> What about adding a flag for the delete, something like -full or
> -true (it is early). Once we issue the delete to the proper row/region we
> run a flush, then execute a single region major compaction. That way, if
> it is a single record, or a subset of data the impact is minimal. If the
> delete happens to hit every region we will compact every region (not ideal).
> Another thought would be an overwrite, but with versions this logic
> becomes more complicated.
>
>
> On Wed, Jun 19, 2013 at 8:31 AM, Jean-Daniel Cryans wrote:
>
>> Hey devs,
>>
>> I was presenting at GOTO Amsterdam yesterday and I got a question
>> about a scenario that I've never thought about before. I'm wondering
>> what others think.
>>
>> How do you efficiently wipe out random data in HBase?
>>
>> For example, you have a website and a user asks you to close their
>> account and get rid of the data.
>>
>> Would you say "sure can do, lemme just issue a couple of Deletes!" and
>> call it a day? What if you really have to delete the data, not just
>> mask it, because of contractual obligations or local laws?
>>
>> Major compacting is the obvious solution but it seems really
>> inefficient. Let's say you've got some truly random data to delete and
>> it happens so that you have at least one row per region to get rid
>> of... then you need to basically rewrite the whole table?
>>
>> My answer was such, and I told the attendee that it's not an easy use
>> case to manage in HBase.
>>
>> Thoughts?
>>
>> J-D
>>
>
>
>
> --
> Kevin O'Dell
> Systems Engineer, Cloudera
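
To make the trade-off in this thread concrete, here is a rough toy model of the "delete, flush, then major-compact only the affected regions" idea. It is not HBase code and uses no HBase API: `Region`, `make_table`, `wipe`, and `ROWS_PER_REGION` are all hypothetical names invented for this sketch, rows are plain integers split into fixed-size ranges, and write order stands in for timestamps. It only illustrates that a delete masks data until a compaction rewrites the file, and that the rewrite cost grows with the number of regions touched.

```python
from dataclasses import dataclass, field

ROWS_PER_REGION = 100  # assumption: fixed-size key ranges, for illustration only


@dataclass
class Region:
    """Toy stand-in for one region: an in-memory buffer plus immutable files."""
    memstore: list = field(default_factory=list)
    store_files: list = field(default_factory=list)

    def put(self, row, value):
        self.memstore.append((row, "put", value))

    def delete(self, row):
        # A delete only appends a tombstone marker; the old value is untouched.
        self.memstore.append((row, "delete", None))

    def flush(self):
        # Persist the buffered cells as a new immutable file.
        if self.memstore:
            self.store_files.append(self.memstore)
            self.memstore = []

    def major_compact(self):
        """Rewrite all files into one, dropping deleted rows and tombstones.

        Returns the number of cells written to the new file.
        """
        live = {}
        for store_file in self.store_files:  # oldest file first
            for row, kind, value in store_file:
                if kind == "put":
                    live[row] = value
                else:
                    live.pop(row, None)
        self.store_files = [[(r, "put", v) for r, v in sorted(live.items())]]
        return len(live)

    def on_disk(self, row):
        """True if any file still physically holds a value for this row."""
        return any(r == row and kind == "put"
                   for store_file in self.store_files
                   for r, kind, _ in store_file)


def make_table(n_regions):
    """Build a table of n_regions regions, each fully populated and flushed."""
    regions = [Region() for _ in range(n_regions)]
    for row in range(n_regions * ROWS_PER_REGION):
        regions[row // ROWS_PER_REGION].put(row, f"value-{row}")
    for region in regions:
        region.flush()
    return regions


def wipe(regions, rows):
    """Delete rows, then flush and major-compact only the regions they hit.

    Returns the total number of cells rewritten by those compactions.
    """
    touched = set()
    for row in rows:
        idx = row // ROWS_PER_REGION
        regions[idx].delete(row)
        touched.add(idx)
    rewritten = 0
    for idx in sorted(touched):
        regions[idx].flush()
        rewritten += regions[idx].major_compact()
    return rewritten


if __name__ == "__main__":
    print("one row:", wipe(make_table(4), [5]), "cells rewritten")
    print("one row per region:", wipe(make_table(4), [1, 101, 201, 301]),
          "cells rewritten")
```

In this model, wiping a single row rewrites only that row's region, while wiping one row in every region rewrites essentially the whole table, which is the worst case described in the original question.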