hbase-issues mailing list archives

From "Feng Honghua (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8721) Deletes can mask puts that happen after the delete
Date Tue, 18 Jun 2013 16:04:22 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686862#comment-13686862 ]

Feng Honghua commented on HBASE-8721:

Another benefit of the "delete can't mask puts that happened after it" behaviour (in essence,
mvcc also participates in delete handling): 'delete latest version' (deleteColumn() without
a timestamp) can perform better, because the RS no longer needs the read operation that gets
the timestamp of the latest version and sets it on the delete.

Below is the update process for 'delete latest version' (under the "delete can't mask puts
that happened after it" behaviour):
  1. The client issues deleteColumn() (without a timestamp). Its timestamp is set to an
'invalid' value (0 or -1 is a good candidate) to indicate 'delete the latest version'. The RS
just writes this Delete-type kv like any other type of delete, without a read operation.

  2. On Get/Scan, timestamp=0/-1 tells us that this delete targets the latest version, so we
check each kv it sees. The first kv with mvcc < 'mvcc of this delete' is the 'latest' version
as of the moment the delete entered the RS. After masking this first kv (with mvcc checked),
the 'delete latest version' delete must also be removed from the ScanDeleteTracker, so that
it masks no further versions.

  That's all.
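
  The two steps above can be sketched in Java as follows. This is a minimal, hypothetical
model and not HBase code: the class DeleteLatestSketch, its Cell type, the LATEST_MARKER
sentinel and the visibleValues() method are all names made up for illustration, and the
real read path (ScanDeleteTracker and friends) is structured differently. The sketch only
shows the rule "mask the first kv, in descending timestamp order, whose mvcc is smaller
than the delete's mvcc, then retire the marker".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one column's cells; none of these types are HBase classes.
final class DeleteLatestSketch {
    // Assumed sentinel timestamp meaning "delete the latest version" (step 1).
    static final long LATEST_MARKER = -1L;

    enum Type { PUT, DELETE_LATEST }

    static final class Cell {
        final Type type;
        final long timestamp;
        final long mvcc;      // order in which the operation entered the RS
        final String value;

        private Cell(Type type, long timestamp, long mvcc, String value) {
            this.type = type;
            this.timestamp = timestamp;
            this.mvcc = mvcc;
            this.value = value;
        }

        public static Cell put(long timestamp, long mvcc, String value) {
            return new Cell(Type.PUT, timestamp, mvcc, value);
        }

        // Step 1: written without any read; only the sentinel timestamp is stored.
        public static Cell deleteLatest(long mvcc) {
            return new Cell(Type.DELETE_LATEST, LATEST_MARKER, mvcc, null);
        }
    }

    // Step 2: resolve the markers at read time and return visible values, newest first.
    public static List<String> visibleValues(List<Cell> cells) {
        List<Cell> puts = new ArrayList<>();
        List<Cell> deletes = new ArrayList<>();
        for (Cell c : cells) {
            if (c.type == Type.PUT) {
                puts.add(c);
            } else {
                deletes.add(c);
            }
        }
        puts.sort(Comparator.comparingLong((Cell c) -> c.timestamp).reversed());
        deletes.sort(Comparator.comparingLong((Cell c) -> c.mvcc));

        Set<Cell> masked = new HashSet<>(); // identity-based, Cell does not override equals
        for (Cell d : deletes) {
            for (Cell p : puts) {
                // First not-yet-masked put that existed before the delete entered the RS.
                if (p.mvcc < d.mvcc && !masked.contains(p)) {
                    masked.add(p);
                    break; // the marker is retired after masking exactly one version
                }
            }
        }

        List<String> visible = new ArrayList<>();
        for (Cell p : puts) {
            if (!masked.contains(p)) {
                visible.add(p.value);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(Cell.put(100L, 1L, "v100"));
        cells.add(Cell.put(200L, 2L, "v200"));
        cells.add(Cell.deleteLatest(3L));       // targets v200, the latest at that moment
        cells.add(Cell.put(150L, 4L, "v150"));  // arrives after the delete, stays visible
        cells.add(Cell.put(300L, 5L, "v300"));  // arrives after the delete, stays visible
        System.out.println(visibleValues(cells));
    }
}
```

  In this example only v200 is masked: v300 has a bigger timestamp but a bigger mvcc than
the delete, so it is skipped rather than mistaken for the 'latest' version.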

  Then why can't we achieve such a light-weight (read-free) 'delete latest version' delete
today? The root cause is the 'delete can mask puts that happen after it' behaviour, which
doesn't use mvcc in delete handling.

  When 'delete latest version' (deleteColumn() without a timestamp) is issued, the real
semantics are 'delete the latest of all the currently EXISTING versions'. EXISTING means the
versions written BEFORE the delete enters the RS, and BEFORE is a notion of operation order
(indicated by mvcc), which a timestamp can't represent.

  Then why can't we handle 'delete latest version' without a read, as in the process above?
Because a newer version can be put later with a bigger timestamp than the version that was
latest when the delete entered the RS. Under the 'delete can mask puts that happened after
the delete' behaviour (in essence, whether a kv is masked by a delete is decided only by
comparing their timestamps), a 'delete latest version' delete can't tell whether the first
version it sees was the latest version when the delete itself hit the RS (mvcc could provide
this information, but it isn't used).
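
  The difference between the two masking rules can be illustrated with a small Java sketch.
The class MaskingSketch and both method names are hypothetical and only model the
comparison itself; they are not taken from HBase or from the attached patch.

```java
// Hypothetical illustration of the two masking rules; not HBase code.
final class MaskingSketch {
    // Timestamp-only rule: a delete at deleteTs masks every put with a timestamp <= deleteTs,
    // regardless of whether the put arrived before or after the delete.
    public static boolean maskedByTimestampOnly(long putTs, long deleteTs) {
        return putTs <= deleteTs;
    }

    // mvcc-aware rule: the put must also have entered the RS before the delete did.
    public static boolean maskedWithMvcc(long putTs, long putMvcc, long deleteTs, long deleteMvcc) {
        return putTs <= deleteTs && putMvcc < deleteMvcc;
    }

    public static void main(String[] args) {
        long deleteTs = 100L;
        long deleteMvcc = 7L;
        // A put with timestamp 90 that arrives AFTER the delete (mvcc 8 > 7).
        System.out.println(maskedByTimestampOnly(90L, deleteTs));
        System.out.println(maskedWithMvcc(90L, 8L, deleteTs, deleteMvcc));
    }
}
```

  With timestamps alone the later put is indistinguishable from one that existed before
the delete; the mvcc comparison is what carries the operation order.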

  Certainly we could use mvcc only for 'delete latest version' to get the (remarkable)
performance gain of removing the read operation, but that would be inconsistent, since
deletes would be handled internally in different ways (one uses mvcc, the others don't).

> Deletes can mask puts that happen after the delete
> --------------------------------------------------
>                 Key: HBASE-8721
>                 URL: https://issues.apache.org/jira/browse/HBASE-8721
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Feng Honghua
>         Attachments: HBASE-8721-0.94-V0.patch
> This fix addresses the bug mentioned in http://hbase.apache.org/book.html:
> "Deletes mask puts, even puts that happened after the delete was entered. Remember that
> a delete writes a tombstone, which only disappears after the next major compaction has run.
> Suppose you do a delete of everything <= T. After this you do a new put with a timestamp
> <= T. This put, even if it happened after the delete, will be masked by the delete tombstone.
> Performing the put will not fail, but when you do a get you will notice the put did have no
> effect. It will start working again after the major compaction has run. These issues should
> not be a problem if you use always-increasing versions for new puts to a row. But they can
> occur even if you do not care about time: just do delete and put immediately after each other,
> and there is some chance they happen within the same millisecond."

