Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 37A279A88 for ; Sat, 15 Oct 2011 04:00:41 +0000 (UTC) Received: (qmail 76307 invoked by uid 500); 15 Oct 2011 04:00:41 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 76133 invoked by uid 500); 15 Oct 2011 04:00:39 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 75734 invoked by uid 99); 15 Oct 2011 04:00:36 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 15 Oct 2011 04:00:36 +0000 X-ASF-Spam-Status: No, hits=-2000.5 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 15 Oct 2011 04:00:34 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id E7AB43080B3 for ; Sat, 15 Oct 2011 04:00:13 +0000 (UTC) Date: Sat, 15 Oct 2011 04:00:13 +0000 (UTC) From: "jiraposter@reviews.apache.org (Commented) (JIRA)" To: issues@hbase.apache.org Message-ID: <759252983.16581.1318651213950.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <77903496.5372.1317683553967.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128054#comment-13128054 ] jiraposter@reviews.apache.org commented on HBASE-4536: ------------------------------------------------------ ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2178/#review2609 ----------------------------------------------------------- Great stuff. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java Javadoc what the new param means http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java So you made this change to make it so user doesn't have condition where they are scratching their head trying to figure why something is not being let go though TTL is four hours? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java Is this value read often? If so you might want to cache it. Profiling these things can sometimes show up as costing. Not important but if you get to make new patch do if (test) block or if (test) { block } rather than the if (test) block you have. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java Let this be the convention from here on out. Maybe even two leading underscores and no trailing underscore... but whatever, let this be the convention for system configs in attribute maps going forward. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java Good. There is a real old feature request in hbase that asked for this.. let me find the issue.... I can't find it.... user wanted to see deletes too in a scan. Then with Benoit a week or so back we had to look in hfile because we didn't trust a delete.... we thought there stuff behind it. We could have used this facility to show all that was in hbase (if we'd enabled this new feature of course). I think this facility should be on by default. Whats the downsides? We don't let go of stuff if < maxversions? How you going to do versions now? If I ask for a scan beyond a delete will I see maxversions only because thats all we keep between deletes? Will it be the case that I will end up having: maxversions... delete marker .... maxversions ... delete marker .... maxversions... and so on if this facility is enabled? I won't have: as many versions as was written but we only return maxversions......delete marker...... as many versions as were written...... delete marker...... Is that right? (If you follow me). Say 'garbage collected' in the comment rather than just 'collected' http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java Bad name for a param. This is name you'd give the getter for a param named 'delete'. What is this param doing? Is this the raw attribute or is it from HCD saying keep deletes? Make it clear in javadoc. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java good http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java good http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java This stuff needs to make it out into the hbase book. Good stuff. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java So when I am doing a check for max versions = 3 and I have for a single cell: put ... put....delete....put If I keep delete markers and do a raw scan, I'll get: p p d If I have keep delete markers but not a raw scan I get: p p If I do not have keep delete markers and its not an ordinary scan: p p Something like that? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java I'm not sure I get what this one is about. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java An 'of' should be an 'if' here? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java These params on an SQM are starting to build up. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java I don't understand whats up here. Why we changing timerange if seePastDeleteMarker is set? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java Is 'keepDeletedRows' a bad name for this variable? Should it be keepDeletedCells? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java So, this is a delete kv at this stage of the processing? So, if its older than any put in any storefile on major compactions, we can let it go since has no effect? Thats good. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java This makes me nervous. All old tests pass on this code? You have a unit test to test that we are doing the right thing in here both for the old and the new handling? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java Again, bad name for param. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java Is this right? We should count it as a version if keepdeletes is on? Unless this flag is saying 'keepdeletes' and not 'this is a delete kv' as I'm interpreting it). http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java No biggie... previous formatting was better. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java This belongs in HCD rather than up here in this global HConstants? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java Why we move this to the end? http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb Very nice. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java Good test http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java White space - Michael On 2011-10-12 04:55:53, Lars Hofhansl wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2178/ bq. ----------------------------------------------------------- bq. bq. (Updated 2011-10-12 04:55:53) bq. bq. bq. Review request for hbase, Ted Yu and Jonathan Gray. bq. bq. bq. Summary bq. ------- bq. bq. HBase timerange Gets and Scans allow to do "timetravel" in HBase. I.e. look at the state of the data at any point in the past, provided the data is still around. bq. This did not work for deletes, however. Deletes would always mask all puts in the past. bq. This change adds a flag that can be on HColumnDescriptor to enable retention of deleted rows. bq. These rows are still subject to TTL and/or VERSIONS. bq. bq. This changes the following: bq. 1. There is a new flag on HColumnDescriptor enabling that behavior. bq. 2. Allow gets/scans with a timerange to retrieve rows hidden by a delete marker, if the timerange does not include the delete marker. bq. 3. Do not unconditionally collect all deleted rows during a compaction. bq. 4. Allow a "raw" Scan, which retrieves all delete markers and deleted rows. bq. bq. The change is small'ish, but the logic is intricate, so please review carefully. bq. bq. bq. This addresses bug HBASE-4536. bq. https://issues.apache.org/jira/browse/HBASE-4536 bq. bq. bq. Diffs bq. ----- bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1181616 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1181616 bq. bq. Diff: https://reviews.apache.org/r/2178/diff bq. bq. bq. Testing bq. ------- bq. bq. All tests pass now. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. > Allow CF to retain deleted rows > ------------------------------- > > Key: HBASE-4536 > URL: https://issues.apache.org/jira/browse/HBASE-4536 > Project: HBase > Issue Type: New Feature > Components: regionserver > Affects Versions: 0.92.0 > Reporter: Lars Hofhansl > Assignee: Lars Hofhansl > Fix For: 0.94.0 > > > Parent allows for a cluster to retain rows for a TTL or keep a minimum number of versions. > However, if a client deletes a row all version older than the delete tomb stone will be remove at the next major compaction (and even at memstore flush - see HBASE-4241). > There should be a way to retain those version to guard against software error. > I see two options here: > 1. Add a new flag HColumnDescriptor. Something like "RETAIN_DELETED". > 2. Folds this into the parent change. I.e. keep minimum-number-of-versions of versions even past the delete marker. > #1 would allow for more flexibility. #2 comes somewhat naturally with parent (from a user viewpoint) > Comments? Any other options? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira