hbase-issues mailing list archives

From "Jesse Yates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96
Date Thu, 31 May 2012 19:29:24 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286861#comment-13286861 ]

Jesse Yates commented on HBASE-6055:
------------------------------------

I've recently had an existential crisis, of sorts, over snapshots. Triggered by both Jon's
questions and some from Ian Varley, I've started to rethink the goal of snapshots. Initially,
it was to take a globally consistent view of a single table. The question that Ian raised
is, "Why are we enforcing stricter guarantees for a snapshot than for a scan?" In fact, a
globally consistent view is something HBase explicitly doesn't support (if you do a put to
two different tables, you have no real, system-level guarantees of consistency).

So does it really matter if we have an actual point in time? Everything in HBase is timestamped,
and the timestamp is considered the source of truth for a given Mutation. If we are doing a scan
for the state of the table as of 12:15:05, we don't know if RS1 is 2 seconds behind RS2 - as far
as we care, it's just the state at 12:15:05.
 
This starts to break down a little bit when doing a Get for the latest version on a table.
If RS1 is two seconds behind RS2 and we snapshot at 12:15:05, then we actually might not see
all the changes to RS1 in the snapshot. However, this doesn't really matter because you still
wouldn't see that edit when looking at that "time". Things are happening so fast in HBase
that the best we really need is just a "fuzzy" view of the state of the table.
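
To make the "state as of a timestamp" idea concrete, here is a minimal sketch in plain Python
(not the HBase API). The Cell class, the view_as_of function, and the sample data are all
hypothetical names made up for illustration. It assumes the view is defined purely by cell
timestamps, regardless of which region server wrote the cell or what its wall clock said:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a timestamped edit; not an HBase class."""
    row: str
    timestamp: int  # cell timestamp in ms, treated as the source of truth
    value: str


def view_as_of(cells, snapshot_ts):
    """Return the newest value per row among cells with timestamp <= snapshot_ts."""
    latest = {}
    for cell in cells:
        if cell.timestamp > snapshot_ts:
            continue  # edit is "after" the snapshot, by its own timestamp
        current = latest.get(cell.row)
        if current is None or cell.timestamp > current.timestamp:
            latest[cell.row] = cell
    return {row: cell.value for row, cell in latest.items()}


# Made-up edits hosted on two region servers whose clocks may disagree.
rs1_cells = [Cell("a", 1000, "a1"), Cell("a", 4000, "a2")]
rs2_cells = [Cell("b", 3000, "b1"), Cell("b", 6000, "b2")]

snapshot = view_as_of(rs1_cells + rs2_cells, snapshot_ts=5000)
print(snapshot)  # {'a': 'a2', 'b': 'b1'}
```

The edit to row "b" stamped 6000 is excluded no matter when it physically arrived, which is
the same answer a time-bounded read at 5000 would give.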

The upside to this is we can do the snapshot _without taking any downtime_ on the table being
snapshotted. I already discussed how to do this generally in the document, but it will have
to be rewritten from the perspective of timestamp-based snapshots (I'll move it to a Google
doc until we get a more finalized version).

The only problem that has jumped out in multiple discussions of the timestamp-based approach
is that if you are using the timestamp for something other than the time (a la Facebook Messages)
you might not be able to make use of snapshots. At Salesforce, I was planning on abusing timestamps
as well, so that consideration will be made in the implementation (I'll go over how in another
post).

TL;DR: global consistency doesn't matter for HBase since the timestamp is the source of truth;
the only question is whether you believe the timestamp or not. I would posit that based
on the design of HBase it has to be considered a source of truth.

I'll respond in a bit with a more detailed design of how timestamp-based snapshots differ
from the point-in-time design, but in everything except how to deal with the memstore and
WAL, it is _exactly the same_. The way to handle the memstore was suggested by Ian Varley:
we basically use the memstore snapshot stuff with some rejiggering to wait a certain
amount of time; for the WAL we can just use the meta edits that Jon recommends and that I've
at least talked about IRL (if not in text).
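
One possible reading of the "wait a certain amount of time" idea for the memstore, sketched in
plain Python (not HBase code). Everything here is an assumption for illustration: the function
names, the tuple layout of an edit, and the rule that a server waits until its local clock has
passed the snapshot timestamp by some margin before splitting the memstore on timestamp. The
detailed design is still to come, so treat this only as a rough picture:

```python
def ready_to_snapshot(local_clock_ms, snapshot_ts, wait_ms):
    """Hypothetical check: has this server's clock passed the snapshot time plus a margin?"""
    return local_clock_ms >= snapshot_ts + wait_ms


def split_memstore(memstore, snapshot_ts):
    """Partition (timestamp, key, value) edits by the snapshot timestamp.

    Edits with timestamp <= snapshot_ts belong to the snapshot; later edits stay behind.
    """
    in_snapshot = [edit for edit in memstore if edit[0] <= snapshot_ts]
    after = [edit for edit in memstore if edit[0] > snapshot_ts]
    return in_snapshot, after


# Made-up in-memory edits: (timestamp in ms, row key, value)
memstore = [(4000, "a", "a2"), (5000, "c", "c1"), (6000, "b", "b2")]

if ready_to_snapshot(local_clock_ms=7000, snapshot_ts=5000, wait_ms=2000):
    in_snapshot, after = split_memstore(memstore, snapshot_ts=5000)
```

Writes keep flowing the whole time in this picture; the server only decides which in-memory
edits fall on which side of the snapshot timestamp.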
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically
> changed, opening as a new ticket.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
