hbase-issues mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-2376) Add special SnapshotScanner which presents view of all data at some time in the past
Date Wed, 21 Nov 2012 07:43:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501743#comment-13501743 ]

Kannan Muthukkaruppan commented on HBASE-2376:
----------------------------------------------

Lars wrote: <<<Flashback queries only makes sense with TTL>>>. This is not
true. A simple CF with VERSIONS=1 & no TTL (i.e. TTL of infinity) can also benefit from
the ability to run flashback queries. Flashback is simply the ability to query the DB as of
a previous point in time. Why should we overload that functionality with versions, TTL, etc.?

I think it is useful to think of FlashBack as completely independent of other settings like
TTL, MAXVERSIONS, MINVERSIONS, etc. The latter should be picked at schema design time based
on the application requirements. For example, you may have many tables in your system with
different TTL, VERSIONS requirements. Maybe you have different CFs within a table, with differing
TTL & VERSION requirements. 

But on top of all those, suppose across all my tables I want to be able to query the entire
DB as of a previous point in time. From a user's point of view, the only setting they need
to worry about is the "time period" (back in time) up to which flash back queries are supported.

For example, you might have one CF, with VERSIONS=1, where you keep hourly rollup data
that you want to retain for 1 month (TTL), and another CF, also with VERSIONS=1, where you
keep daily rollup data that you want to retain for 3 years. Separately, I want to be able
to do flashback queries up to, say, 7 days back. This "7 days" should be a completely
different setting, and there seems to be no reason to confuse it with TTL & Versions.

Now, API wise, we need the ability to say that we are doing a flashback query i.e. "Scan @
T" instead of regular "Scan". In Oracle DB too, for instance, flash back queries have this
special syntax:

SELECT * FROM employee 
  AS OF TIMESTAMP <TS>
  WHERE name = 'JOHN';

Regarding <<< So the snapshot scanner is special in that only through this specific
scanner you can look further back than the TTL.>>>: I think that is by design. Note:
Scan @ T (flashback query) is different from doing a Scan with setTimeRange(0, T). A delete
of a key done at T+1 is immaterial for a Scan @ T query; whereas for a Scan with setTimeRange(0,
T), you will still see the effect of the delete done at T+1.
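
To make the difference concrete, here is a minimal, self-contained sketch in plain Java. It
does not use the HBase client API; FlashbackDemo, Mutation, scanAsOf and scanTimeRange are
hypothetical names invented for this illustration. It models a single key's history and
assumes that a delete marker masks every put at or before its own timestamp. The time bound
is treated as inclusive to keep the comparison simple, which may not match the exact bounds
of a real setTimeRange call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FlashbackDemo {

    /** One mutation of a single key: a put, or a delete marker (value == null). */
    public static final class Mutation {
        final long ts;
        final String value;

        private Mutation(long ts, String value) { this.ts = ts; this.value = value; }

        public static Mutation put(long ts, String value) { return new Mutation(ts, value); }
        public static Mutation delete(long ts) { return new Mutation(ts, null); }

        boolean isDelete() { return value == null; }
    }

    /** "Scan @ T": replay only mutations with ts <= t; later puts and deletes are ignored. */
    public static Optional<String> scanAsOf(List<Mutation> history, long t) {
        Mutation latest = null;
        for (Mutation m : history) {
            if (m.ts <= t && (latest == null || m.ts > latest.ts)) {
                latest = m;
            }
        }
        return (latest == null || latest.isDelete()) ? Optional.empty() : Optional.of(latest.value);
    }

    /** Time-range style read: puts are filtered to ts <= t, but any delete still masks older puts. */
    public static Optional<String> scanTimeRange(List<Mutation> history, long t) {
        // Assumption: the newest delete marker hides all puts at or before its timestamp,
        // even when the marker itself lies outside the requested time range.
        long newestDelete = Long.MIN_VALUE;
        for (Mutation m : history) {
            if (m.isDelete()) {
                newestDelete = Math.max(newestDelete, m.ts);
            }
        }
        Mutation latest = null;
        for (Mutation m : history) {
            boolean visible = !m.isDelete() && m.ts <= t && m.ts > newestDelete;
            if (visible && (latest == null || m.ts > latest.ts)) {
                latest = m;
            }
        }
        return latest == null ? Optional.empty() : Optional.of(latest.value);
    }

    public static void main(String[] args) {
        long t = 200;
        // A put at ts=100 followed by a delete at T+1.
        List<Mutation> history = Arrays.asList(Mutation.put(100, "v1"), Mutation.delete(t + 1));
        System.out.println("Scan @ T:           " + scanAsOf(history, t));
        System.out.println("setTimeRange(0, T): " + scanTimeRange(history, t));
    }
}
```

With a put at 100 and a delete at T+1, the flashback read at T still returns the value, while
the time-range read returns nothing because the later delete masks the put.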

----

In summary, we should not confuse our users by forcing them to change their schema design
(i.e. choice of VERSIONS, TTL, etc.) to support flashback queries. Flashback support should
be configured using a simple extra knob that can be set at the system, table, or CF level.
We should NOT overload that knob with TTL and Versions.

----


 
                
> Add special SnapshotScanner which presents view of all data at some time in the past
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-2376
>                 URL: https://issues.apache.org/jira/browse/HBASE-2376
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, regionserver
>    Affects Versions: 0.20.3
>            Reporter: Jonathan Gray
>            Assignee: Pritam Damania
>
> In order to support a particular kind of database "snapshot" feature which doesn't require
> copying data, we came up with the idea for a special SnapshotScanner that would present a
> view of your data at some point in the past.  The primary use case for this would be to be
> able to recover particular data/rows (but not all data, like a global rollback) should they
> have somehow been messed up (application fault, application bug, user error, etc.).

