From: Bill Graham
Date: Fri, 21 Jan 2011 15:41:55 -0800
Subject: Re: delete using server's timestamp
To: Ryan Rawson
Cc: user@hbase.apache.org

Thanks Ryan, that clears it up.

On Fri, Jan 21, 2011 at 3:29 PM, Ryan Rawson wrote:
> No, the storage model does not work like that.  The storage model
> revolves around the KeyValue, which is roughly:
>
> rowid/family/qualifier/timestamp/data
>
> and we store sequences of these in sorted order in HFiles.
>
> Note, we store the row key with every single version of every column/cell.
>
> Therefore there is no such thing as "removing the bytes that represent
> the actual row key", they are part of every cell, and once those cells
> go away, then so does the row key.
>
> I hope this helps,
> -ryan
>
> On Fri, Jan 21, 2011 at 3:26 PM, Bill Graham wrote:
>> I follow the tombstone/compact/delete cycle of the column values, but
>> I'm still unclear about the row key life cycle.
>>
>> Is it that the bytes that represent the actual row key are associated
>> with and removed with each column value? Or are they removed upon
>> compaction when no column values exist for a given row key?
>>
>>
>>
>> On Fri, Jan 21, 2011 at 2:26 PM, Ryan Rawson wrote:
>>> Any of the deletes merely insert a 'tombstone' which doesn't delete the
>>> data immediately but does mark it so queries no longer return it.
>>>
>>> During the compactions we prune these delete values and they disappear
>>> for good.  (Barring other backups of course)
>>>
>>> Because of our variable length storage model, we don't store rows in
>>> particular blocks and rewrite said blocks, so notions of rows
>>> 'existing' or not don't even apply to HBase as they do to RDBMS
>>> systems.
>>>
>>> -ryan
>>>
>>> On Fri, Jan 21, 2011 at 2:21 PM, Bill Graham wrote:
>>>> If you use some combination of delete requests and leave a row without
>>>> any column data will the row/rowkey still exist? I'm thinking of the
>>>> use case where you want to prune all old data, including row keys,
>>>> from a table.
>>>>
>>>>
>>>> On Fri, Jan 21, 2011 at 2:04 PM, Ryan Rawson wrote:
>>>>> There are 3 kinds of deletes (with a 4th for win):
>>>>>
>>>>> - Delete.deleteFamily(byte[] family, [long])
>>>>> -- This removes all data from the given family before the given
>>>>> timestamp, or if none is given, System.currentTimeMillis()
>>>>> - Delete.deleteColumns(byte[] family, byte[] qualifier, [long])
>>>>> -- This removes all data from the given qualifier, before the given
>>>>> timestamp, or if none is given, System.currentTimeMillis()
>>>>> - Delete.deleteColumn(byte[] family, byte[] qualifier, [long])
>>>>> -- This removes A SINGLE VERSION at the given time, or if none is
>>>>> given, the most recent version is Get'ed and deleted.
>>>>> - Delete()
>>>>> -- Calls deleteFamily() on server side on every family.
>>>>>
>>>>> Stack is talking about the third delete form, deleteColumn().
>>>>>
>>>>> I think what you want is probably deleteColumns() (plural!), or
>>>>> perhaps deleteFamily().
>>>>>
>>>>> One rarely wants to call deleteColumn(), since it removes just a
>>>>> single version, thus exposing older versions, which MAY be what you
>>>>> want, but I'm guessing probably isn't.
>>>>>
>>>>> Only the third form (deleteColumn (singular!)) calls Get, the rest do
>>>>> not call Get and are very fast.
>>>>>
>>>>> -ryan
>>>>>
>>>>> On Fri, Jan 21, 2011 at 1:51 PM, Stack wrote:
>>>>>> On Fri, Jan 21, 2011 at 12:30 PM, Matt Corgan wrote:
>>>>>>> Is there a way to issue a delete using the server's current timestamp?  I
>>>>>>> see methods using HConstants.LATEST_TIMESTAMP which is extremely expensive
>>>>>>> since it triggers a Get call.
>>>>>>
>>>>>> Yes.  Deleting latest version involves a Get to figure the most
>>>>>> recent timestamp.  And yes, in src code it says this is 'expensive'.
>>>>>> Seems like it does this lookup anytime LATEST_TIMESTAMP is passed
>>>>>> whether column, columns, or family only to ensure the delete goes in
>>>>>> ahead of whatever is currently in the Store.
>>>>>>
>>>>>> St.Ack
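
The storage model and delete forms discussed in this thread can be sketched with a toy in-memory model. This is an illustration only, not HBase code: ToyStore, Cell, Type and every method name below are hypothetical. The sketch assumes that family-wide and column-wide tombstones cover versions at or before their timestamp, while a single-version tombstone covers exactly one timestamp. It tries to show three points from the thread: a delete only adds a tombstone, deleting a single version exposes the older one, and a row key is gone once compaction has removed every cell that carries it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Toy model only. None of these names come from the HBase API.
class ToyStore {
    enum Type { PUT, DELETE_FAMILY, DELETE_COLUMNS, DELETE_COLUMN }

    // One stored cell. The row key travels with every version of every column.
    static final class Cell {
        final String row, family, qualifier, value;
        final long ts;
        final Type type;

        Cell(String row, String family, String qualifier, long ts, Type type, String value) {
            this.row = row; this.family = family; this.qualifier = qualifier;
            this.ts = ts; this.type = type; this.value = value;
        }
    }

    // Sort by row, family, qualifier, then newest timestamp first; type breaks ties.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = Long.compare(b.ts, a.ts);
        if (c == 0) c = a.type.compareTo(b.type);
        return c;
    };

    private final TreeSet<Cell> cells = new TreeSet<>(ORDER);

    void put(String row, String family, String qualifier, long ts, String value) {
        cells.add(new Cell(row, family, qualifier, ts, Type.PUT, value));
    }

    // A delete only inserts a tombstone; nothing is removed yet.
    void delete(Type type, String row, String family, String qualifier, long ts) {
        cells.add(new Cell(row, family, qualifier, ts, type, null));
    }

    // True if some tombstone covers this put (assumed masking rules, see lead-in).
    private boolean masked(Cell put) {
        for (Cell t : cells) {
            if (t.type == Type.PUT || !t.row.equals(put.row) || !t.family.equals(put.family)) continue;
            boolean sameColumn = t.qualifier.equals(put.qualifier);
            if (t.type == Type.DELETE_FAMILY && put.ts <= t.ts) return true;
            if (t.type == Type.DELETE_COLUMNS && sameColumn && put.ts <= t.ts) return true;
            if (t.type == Type.DELETE_COLUMN && sameColumn && put.ts == t.ts) return true;
        }
        return false;
    }

    // Newest visible value for a column, or null if every version is masked.
    String get(String row, String family, String qualifier) {
        for (Cell c : cells) { // iteration follows ORDER, so newer versions come first
            if (c.type == Type.PUT && c.row.equals(row) && c.family.equals(family)
                    && c.qualifier.equals(qualifier) && !masked(c)) {
                return c.value;
            }
        }
        return null;
    }

    // Number of stored cells (puts and tombstones) that still carry this row key.
    int cellsCarrying(String row) {
        int n = 0;
        for (Cell c : cells) if (c.row.equals(row)) n++;
        return n;
    }

    // Compaction-like pass: keep only unmasked puts, dropping masked puts and tombstones.
    void compact() {
        List<Cell> keep = new ArrayList<>();
        for (Cell c : cells) if (c.type == Type.PUT && !masked(c)) keep.add(c);
        cells.clear();
        cells.addAll(keep);
    }

    public static void main(String[] args) {
        ToyStore s = new ToyStore();
        s.put("r1", "f", "q", 1L, "old");
        s.put("r1", "f", "q", 2L, "new");

        s.delete(Type.DELETE_COLUMN, "r1", "f", "q", 2L);  // single version only
        System.out.println(s.get("r1", "f", "q"));         // the older version is exposed

        s.delete(Type.DELETE_COLUMNS, "r1", "f", "q", 2L); // all versions at or before ts 2
        System.out.println(s.get("r1", "f", "q"));         // nothing visible any more
        System.out.println(s.cellsCarrying("r1"));         // cells remain stored until compaction

        s.compact();
        System.out.println(s.cellsCarrying("r1"));         // no cells left, so no row key left
    }
}
```

In this sketch there is no separate record for a row: the only place "r1" lives is inside the cells themselves, which is why nothing row-specific needs cleaning up after the compaction pass.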