Subject: Re: How to config hbase0.94.2 to retain deleted data
From: Michael Segel
Date: Tue, 23 Oct 2012 06:41:52 -0500
To: lars hofhansl
CC: "user@hbase.apache.org"

"Deleted cells are still subject to TTL and there will never be more than "maximum number of versions" deleted cells. A new "raw" scan options returns all deleted rows and the delete markers."

This is different from the idea suggested by the OP. Here deleted cells still get deleted. Just that when the compaction flag comes along, it's told to ignore them.

So if I say a column can have 3 versions (cells), then if I insert another value for that row:column key, I push that deleted cell down the stack. Enough times, it's gone.

In theory, this feature would be useful if I wanted an OLTP implementation on top of HBase. It would allow the transaction to bridge a compaction cycle. However, that's pretty much it.

This feature doesn't translate well beyond this.

It also raises the following questions: how do I handle long (OLTP) transactions, timeouts, and isolation levels?

If you look at this at the row level... definitely not a good idea. Think of fat clogging an artery.

On Oct 23, 2012, at 12:22 AM, lars hofhansl wrote:

> http://hbase.apache.org/book/cf.keep.deleted.html
>
> Without it you cannot do correct as-of-time queries when it comes to deletes.
>
> -- Lars
>
> From: Michael Segel
> To: user@hbase.apache.org; lars hofhansl
> Sent: Monday, October 22, 2012 9:18 PM
> Subject: Re: How to config hbase0.94.2 to retain deleted data
>
> > Curious, why do you think this is better than using the keep-deleted-cells feature?
> > (It might well be, just curious)
>
> Ok... so what exactly does this feature mean?
>
> Suppose I have 500 rows within a region. I set this feature to be true.
> I do a massive delete and there are only 50 rows left standing.
>
> So if I do a count of the number of rows in the region, I see only 50, yet if I compact the table, it's still full.
>
> Granted I'm talking about rows and not cells, but the idea is the same. IMHO you're asking for more headaches than you solve.
>
> KISS would suggest that moving deleted data into a different table would yield better performance in the long run.
>
> On Oct 21, 2012, at 7:23 PM, lars hofhansl wrote:
>
> > That'd work too. Requires the regionservers to make remote updates to other regionservers, though. And you have to trap each and every change (Put, Delete, Increment, Append, RowMutations, etc)
> >
> > Curious, why do you think this is better than using the keep-deleted-cells feature?
> > (It might well be, just curious)
> >
> > -- Lars
> >
> > ----- Original Message -----
> > From: Michael Segel
> > To: user@hbase.apache.org
> > Cc:
> > Sent: Sunday, October 21, 2012 4:34 PM
> > Subject: Re: How to config hbase0.94.2 to retain deleted data
> >
> > I would suggest that you use your coprocessor to copy the data to a 'backup' table when you mark them for delete.
> > Then as major compaction hits, the rows are deleted from the main table, but still reside undeleted in your delete table.
> > Call it a history table.
> >
> > On Oct 21, 2012, at 3:53 PM, yun peng wrote:
> >
> >> Hi, All,
> >> I want to retain all deleted key-value pairs in hbase. I have tried to
> >> configure HColumnDescriptor as follows to make it return deleted data.
> >>
> >> public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
> >>   HTableDescriptor htd = e.getEnvironment().getRegion().getTableDesc();
> >>   HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("cf"));
> >>   hcd.setKeepDeletedCells(true);
> >>   hcd.setBlockCacheEnabled(false);
> >> }
> >>
> >> However, it does not work for me: when I issue a delete and then query
> >> by an older timestamp, the old data does not show up.
> >>
> >> hbase(main):119:0> put 'usertable', "key1", 'cf:c1', "v1", 99
> >> hbase(main):120:0> put 'usertable', "key1", 'cf:c1', "v2", 101
> >> hbase(main):121:0> delete 'usertable', "key1", 'cf:c1', 100
> >> hbase(main):122:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP => 99, VERSIONS => 4}
> >> COLUMN                CELL
> >>
> >> 0 row(s) in 0.0040 seconds
> >>
> >> hbase(main):123:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP => 100, VERSIONS => 4}
> >> COLUMN                CELL
> >>
> >> 0 row(s) in 0.0050 seconds
> >>
> >> hbase(main):124:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP => 101, VERSIONS => 4}
> >> COLUMN                CELL
> >>
> >> cf:c1                 timestamp=101, value=v2
> >>
> >> 1 row(s) in 0.0050 seconds
> >>
> >> Note this is a new feature in 0.94.2 (HBASE-4536).
> >> I did not find much sample code online, so... does anyone here have
> >> experience using HBASE-4536? How should one configure
> >> hbase to enable this feature?
> >>
> >> Thanks
> >> Yun
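
For anyone reading this thread in the archive, the as-of-time behaviour being argued about can be sketched with a small toy model. This is plain Java and uses no HBase API; the class KeepDeletedCellsModel and its methods are hypothetical names made up for illustration. It assumes that the shell delete at timestamp 100 masks every version of the column at or below 100, and that with keep-deleted-cells enabled a read as of time T ignores delete markers newer than T. It replays the put/put/delete sequence from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a single row:column. NOT HBase code; names are hypothetical.
class KeepDeletedCellsModel {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    private final List<Long> deleteMarkers = new ArrayList<>();
    private final boolean keepDeletedCells;

    KeepDeletedCellsModel(boolean keepDeletedCells) {
        this.keepDeletedCells = keepDeletedCells;
    }

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Assumption: a delete at ts masks every version with timestamp <= ts.
    void delete(long ts) {
        deleteMarkers.add(ts);
    }

    // Returns the newest value visible as of the given timestamp, or null.
    String getAsOf(long asOf) {
        long mask = Long.MIN_VALUE;
        for (long marker : deleteMarkers) {
            // Assumption: with keepDeletedCells, markers newer than the
            // query time are ignored; without it, every marker applies.
            boolean applies = !keepDeletedCells || marker <= asOf;
            if (applies) {
                mask = Math.max(mask, marker);
            }
        }
        Map.Entry<Long, String> newest = versions.floorEntry(asOf);
        if (newest == null || newest.getKey() <= mask) {
            return null; // nothing at or before asOf, or it is masked
        }
        return newest.getValue();
    }

    public static void main(String[] args) {
        for (boolean keep : new boolean[] {false, true}) {
            KeepDeletedCellsModel col = new KeepDeletedCellsModel(keep);
            col.put(99, "v1");
            col.put(101, "v2");
            col.delete(100);
            System.out.println("keepDeletedCells=" + keep
                + " asOf 99 -> " + col.getAsOf(99)
                + ", asOf 100 -> " + col.getAsOf(100)
                + ", asOf 101 -> " + col.getAsOf(101));
        }
    }
}
```

Under these assumptions the read as of timestamp 99 returns v1 only when deleted cells are kept, which is the result the original question expected. The sketch says nothing about how to enable the setting on a live table, and it leaves out TTL and the maximum-versions limit mentioned above.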