hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
Date Sat, 01 Nov 2014 08:49:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193026#comment-14193026 ]

Hadoop QA commented on HBASE-12363:

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment against trunk revision .
  ATTACHMENT ID: 12678670

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 27 new or modified tests.
    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:red}-1 checkstyle{color}.  The applied patch generated 3784 checkstyle errors (more than the trunk's current 3781 errors).

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
2.0.3) warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 1 release audit warning (more than the trunk's current 0 warnings).

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines longer than
    +    return setValue(KEEP_DELETED_CELLS, (keepDeletedCells ? KeepDeletedCells.TRUE : KeepDeletedCells.FALSE).toString());
    +    this.keepDeletedCells = scan.isRaw() ? KeepDeletedCells.TRUE : isUserScan ? KeepDeletedCells.FALSE : scanInfo.getKeepDeletedCells();
    +    this.seePastDeleteMarkers = scanInfo.getKeepDeletedCells() != KeepDeletedCells.FALSE && isUserScan;
    +    ScanInfo scanInfo = new ScanInfo(null, 0, 1, HConstants.LATEST_TIMESTAMP, KeepDeletedCells.FALSE,
    +      family.setKeepDeletedCells(org.apache.hadoop.hbase.KeepDeletedCells.valueOf(arg.delete(org.apache.hadoop.hbase.HColumnDescriptor::KEEP_DELETED_CELLS).to_s.upcase)) if arg.include?(org.apache.hadoop.hbase.HColumnDescriptor::KEEP_DELETED_CELLS)
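The quoted diff lines suggest the patch replaces the old boolean KEEP_DELETED_CELLS flag with a tri-state KeepDeletedCells enum, and selects the effective value per scan type. A minimal sketch of that selection logic, with names taken from the diff but otherwise hypothetical (not the actual HBase source):

```java
public class KeepDeletedCellsDemo {
    // Tri-state setting implied by the diff: FALSE, TRUE, and (per the
    // issue's TTL-based discussion) a TTL-bounded mode.
    enum KeepDeletedCells { FALSE, TRUE, TTL }

    // Mirrors the quoted ScanQueryMatcher-style expression: raw scans always
    // see deleted cells, user scans never do, and internal scans (flush or
    // compaction) fall back to the column family's configured setting.
    static KeepDeletedCells effective(boolean isRawScan, boolean isUserScan,
                                      KeepDeletedCells familySetting) {
        return isRawScan ? KeepDeletedCells.TRUE
             : isUserScan ? KeepDeletedCells.FALSE
             : familySetting;
    }

    public static void main(String[] args) {
        System.out.println(effective(true, true, KeepDeletedCells.FALSE));  // TRUE
        System.out.println(effective(false, true, KeepDeletedCells.TTL));   // FALSE
        System.out.println(effective(false, false, KeepDeletedCells.TTL));  // TTL
    }
}
```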

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/patchReleaseAuditWarnings.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/checkstyle-aggregate.html

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//console

This message is automatically generated.

> KEEP_DELETED_CELLS considered harmful?
> --------------------------------------
>                 Key: HBASE-12363
>                 URL: https://issues.apache.org/jira/browse/HBASE-12363
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>              Labels: Phoenix
>         Attachments: 12363-master.txt, 12363-test.txt
> Brainstorming...
> This morning in the train (of all places) I realized a fundamental issue in how KEEP_DELETED_CELLS is implemented.
> The problem is around knowing when it is safe to remove a delete marker (we cannot remove it unless all cells affected by it are otherwise removed).
> This was particularly hard for family markers, since they sort before all cells of a row, and hence, scanning forward through an HFile, you cannot know whether the family markers are still needed until at least the entire row is scanned.
> My solution was to keep the TS of the oldest put in any given HFile, and only remove delete markers older than that TS.
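The rule just described can be sketched in a few lines (hypothetical names, not the actual HBase compaction code): a delete marker is safe to drop only if it is older than every put in the file, i.e. older than the tracked earliest put timestamp.

```java
public class DeleteMarkerRule {
    // A marker older than the oldest put in the file cannot affect any
    // cell in that file, so it can be dropped at compaction time.
    static boolean canDropDeleteMarker(long markerTs, long earliestPutTsInFile) {
        return markerTs < earliestPutTsInFile;
    }

    public static void main(String[] args) {
        // The pathology described below: ROW 1 written at ts=10 and never
        // touched again pins earliestPutTs at 10, so a much newer delete
        // marker (ts=1000) for some other row is retained forever.
        System.out.println(canDropDeleteMarker(1000, 10)); // false
    }
}
```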
> That sounds good on the face of it... But now imagine you wrote a version of ROW 1 and then never update it again. Then later you write a billion other rows and delete them all. Since the TS of the cells in ROW 1 is older than all the delete markers for the other billion rows, these will never be collected... At least not for the region that hosts ROW 1, even after a major compaction.
> Note, in a sense that is what HBase is supposed to do when keeping deleted cells: Keep them until they would be removed by some other means (for example TTL, or MAX_VERSIONS when new versions are inserted).
> The specific problem here is that even as all KVs affected by a delete marker are expired this way, the marker would not be removed if there is just one older KV in the HStore.
> I don't see a good way out of this. In the parent issue I outlined these four options:
> # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would not apply to deleted rows or delete-marker rows (we wouldn't know how long to keep family deletes in that case). (MAX_)VERSIONS would still be enforced on all row types except for family delete markers.
> # Translate family delete markers to column delete markers at (major) compaction time.
> # Change HFileWriterV* to keep track of the earliest put TS in a store and write it to the file metadata. Use that to expire delete markers that are older and hence can't affect any puts in the file.
> # Have Store.java keep track of the earliest put in internalFlushCache and compactStore and then append it to the file metadata. That way HFileWriterV* would not need to know about it.
> And I implemented #4.
> I'd love to get input on ideas.
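Option #4 above can be sketched as a writer-side tracker: while cells are appended during a flush or compaction, remember the smallest put timestamp seen and stash it in the file's metadata on close, so later compactions can expire older delete markers. The class, the EARLIEST_PUT_TS key, and the writer shape below are illustrative assumptions, not HBase's actual API:

```java
import java.util.HashMap;
import java.util.Map;

public class EarliestPutTracker {
    // Hypothetical metadata key; HBase's real key name may differ.
    static final String EARLIEST_PUT_TS = "EARLIEST_PUT_TS";

    private long earliestPutTs = Long.MAX_VALUE;
    private final Map<String, Long> fileMetadata = new HashMap<>();

    // Called for every put cell written to the file.
    void appendPut(long cellTimestamp) {
        earliestPutTs = Math.min(earliestPutTs, cellTimestamp);
    }

    // On close, persist the tracked minimum into the file metadata so the
    // HFile writer itself never needs to understand delete-marker semantics.
    void close() {
        fileMetadata.put(EARLIEST_PUT_TS, earliestPutTs);
    }

    long metadata() {
        return fileMetadata.get(EARLIEST_PUT_TS);
    }

    public static void main(String[] args) {
        EarliestPutTracker w = new EarliestPutTracker();
        w.appendPut(500);
        w.appendPut(10);
        w.appendPut(200);
        w.close();
        System.out.println(w.metadata()); // 10
    }
}
```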

This message was sent by Atlassian JIRA
