hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2893) Table metacolumns
Date Sun, 01 Aug 2010 15:19:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894387#action_12894387 ]

Jonathan Gray commented on HBASE-2893:
--------------------------------------

This sounds really interesting, Andy.  I'm a little concerned that this would be rather disruptive to the code but used by only a very small portion of users.

So the default behavior would be to always create the metacolumn family, and the read path would always have these checks in it?  Maybe this feature itself should be a table-level setting, and we should try to get all the logic related to it into new classes with just a hook or two into the existing read-time checks.
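
To make the table-level opt-in concrete, here is a minimal sketch that piggybacks on the existing HTableDescriptor key/value metadata.  The property name "hbase.table.metacolumn.enabled" is made up for illustration, not an existing setting:

{code}
// Minimal sketch of a per-table opt-in via HTableDescriptor metadata.
// The property name below is invented for illustration only.
import org.apache.hadoop.hbase.HTableDescriptor;

public class MetacolumnFlagSketch {
  static final String METACOLUMN_FLAG = "hbase.table.metacolumn.enabled";

  public static void main(String[] args) {
    HTableDescriptor htd = new HTableDescriptor("mytable");
    htd.setValue(METACOLUMN_FLAG, "true");   // operator opts the table in

    // The read-time hook would only engage when the flag is present.
    boolean enabled = "true".equals(htd.getValue(METACOLUMN_FLAG));
    System.out.println("metacolumn checks enabled: " + enabled);
  }
}
{code}

That way tables that never opt in would pay nothing extra on the read path.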

The current QueryMatcher/Tracker code paths are starting to get a little messy, and I'm a little worried about adding a bunch of new checks to every KV for this or any other feature (there's some work going into the seek/reseek optimizations and it's hard to move it forward because adding another couple of row checks can be significant if done on every KV).
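
To illustrate the cost concern, a toy sketch (none of these types are the real QueryMatcher/Tracker classes): the question is whether the metacolumn consult happens once per row or once per KV.

{code}
// Toy illustration only; OverrideStore and RowOverrides are invented names,
// not real HBase classes, and this is not the actual matcher code path.
import java.util.List;

class ReadPathCostSketch {
  interface OverrideStore { RowOverrides lookup(byte[] rowKey); }
  static class RowOverrides { long ttlMs; String acl; }

  // Hoisting the metacolumn consult to row granularity: one lookup per row,
  // with only cheap comparisons left inside the per-cell loop.
  static int visibleCells(byte[] rowKey, List<Long> cellTimestamps,
                          OverrideStore meta, long now, long familyTtlMs) {
    RowOverrides o = meta.lookup(rowKey);                       // once per row
    long ttlMs = (o != null && o.ttlMs > 0) ? o.ttlMs : familyTtlMs;
    int visible = 0;
    for (long ts : cellTimestamps) {
      // Per-cell work stays a simple comparison; calling meta.lookup() in
      // here instead would repeat the lookup for every KV in the row.
      if (now - ts < ttlMs) {
        visible++;
      }
    }
    return visible;
  }
}
{code}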

In addition, this would break the pattern of each family being processed in isolation.  Now, reading each family will require an additional scanner against the metacolumn family.  So, if reading from a 5-family table (+1 for meta), you'd end up reading the metacolumn 5 times, once for each user family?  Things like the bloom filter check would have to happen during the read, so at a different level than where it's currently done.
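
For reference, the bloom shortcut mentioned in the description might look something like this in spirit (again, invented names; this is not existing HBase plumbing):

{code}
// Hedged sketch of "a bloom filter to shortcut lookups for rows with no
// meta entries".  Bloom and OverrideReader are invented for illustration.
class MetaBloomShortcutSketch {
  interface Bloom { boolean mightContain(byte[] rowKey); }
  interface OverrideReader { Object lookup(byte[] rowKey); }

  static Object lookupOverrides(byte[] rowKey, Bloom metaBloom, OverrideReader reader) {
    // Rows with no meta entries (the common case) never touch the
    // metacolumn scanner at all.
    if (!metaBloom.mightContain(rowKey)) {
      return null;
    }
    // Only the rare rows with overrides pay for the extra read.
    return reader.lookup(rowKey);
  }
}
{code}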

Would this check be first, last, or scattered throughout the read checks?  I would guess first, but I'm not sure if there are other things desired besides TTL and ACLs that might require some of the existing checks first.  I'm not quite sure I understand the TTL use case; it seems extremely rare that you'd have TTLs applied at row granularity.  I suppose this kind of fine-grained policy setting is desirable, but it's less clear why you couldn't break things up into separate tables for varied TTLs or multi-tenancy.  Or, if you have very specific and fine-grained settings like variable TTL, you could implement them in your application.
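
For reference, the per-family TTL knob that exists today, and the "separate tables for varied TTLs" alternative, look roughly like this with the current (0.90-era) client API; the table and family names are made up:

{code}
// Sketch of the existing per-column-family TTL configuration, using one
// table per retention policy.  Table/family names are invented.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class PerTableTtlSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Short-retention data goes in one table whose family expires after a day...
    HTableDescriptor shortLived = new HTableDescriptor("events_short");
    HColumnDescriptor shortCf = new HColumnDescriptor("d");
    shortCf.setTimeToLive(24 * 60 * 60);            // TTL is per family, in seconds
    shortLived.addFamily(shortCf);
    admin.createTable(shortLived);

    // ...and long-retention data in another table kept for 30 days.
    HTableDescriptor longLived = new HTableDescriptor("events_long");
    HColumnDescriptor longCf = new HColumnDescriptor("d");
    longCf.setTimeToLive(30 * 24 * 60 * 60);
    longLived.addFamily(longCf);
    admin.createTable(longLived);
  }
}
{code}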

When do you set this stuff?  Would inserts be augmented?  Would there be special types of KVs that you could write at the same time you insert the actual data?  The description above addresses where it is stored and when it is looked up, but not how it is set.  Would Put be extended with per-row setTTL and setACL methods?
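
In other words, would the client API grow something along these lines?  Purely hypothetical: setTTL/setACL don't exist on Put, and the "_meta_" family name and value encoding are invented here; it just uses the add() call from the current client API.

{code}
// Hypothetical sketch of an augmented write; none of this exists today.
// The idea: the extra KVs land in the metacolumn family of the same row,
// in the same atomic mutation as the data.
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

class MetaPut extends Put {
  private static final byte[] METACOLUMN = Bytes.toBytes("_meta_");  // invented name

  MetaPut(byte[] row) { super(row); }

  MetaPut setTTL(long ttlSeconds) {
    add(METACOLUMN, Bytes.toBytes("ttl"), Bytes.toBytes(ttlSeconds));
    return this;
  }

  MetaPut setACL(String aclSpec) {
    add(METACOLUMN, Bytes.toBytes("acl"), Bytes.toBytes(aclSpec));
    return this;
  }
}
{code}

Usage would be something like new MetaPut(row).setTTL(86400).setACL("tenantA:rw"), followed by normal add() calls for the data and a single table.put().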

Out of curiosity, which BT-like systems support per-value ACLs?  I don't think I've seen this
in any DBs I've worked with.

> Table metacolumns
> -----------------
>
>                 Key: HBASE-2893
>                 URL: https://issues.apache.org/jira/browse/HBASE-2893
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>
> Some features like TTLs or access control lists have use cases that call for per-value configurability.
> Currently in HBase TTLs are set per column family. This leads to potentially awkward "bucketing" of values into column families set up to accommodate the common desired TTLs for all values within -- an unnecessarily wide schema, with resulting unnecessary reduction in I/O locality in access patterns, more store files than otherwise, and so on.
> Over in HBASE-1697 we're considering setting ACLs on column families. However, we are aware of other BT-like systems which support per-value ACLs. This allows for multitenancy in a single table as opposed to really requiring tables for each customer (or, at least, column families). The scale-out properties for a single table are better than alternatives. I think supporting per-row ACLs would be generally sufficient: customer ID could be part of the row key. We can still plan to maintain column-family level ACLs. We would therefore not have to bloat the store with per-row ACLs for the normal case -- but it would be highly useful to support overrides for particular rows. So how to do that?
> I propose to introduce _metacolumns_. 
> A _metacolumn_ would be a column family intrinsic to every table, created by the system at table create time. It would be accessible like any other column family, but we expect a default ACL that only allows access by the system and operator principals, and administrative actions such as renaming or deletion would not be allowed. Into the metacolumn would be stored per-row overrides for such things as ACLs and TTLs. The metacolumn therefore would be as sparse as possible; no storage would be required for any overrides if a value is committed with defaults. A reasonably sparse metacolumn for a region may fit entirely within blockcache. It may be possible for all metacolumns on a RS to fit within blockcache without undue pressure on other users. We can aim design effort at this target.
> The scope of changes required to support this is:
> - Introduce metacolumn concept in the code and into the security model (default ACL): a flag in HCD, a default ACL, and a few additional checks for rejecting disallowed administrative actions.
> - Automatically create metacolumns at table create time.
> - Consult the metacolumn as part of processing reads or mutations, perhaps using a bloom filter to shortcut lookups for rows with no meta entries, and apply configuration or security policy overrides if found.
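
A minimal sketch of the second item in the quoted scope list above (automatic creation at table create time), assuming the create-table path calls a helper like this; the "_meta_" family name and the in-memory hint are assumptions for illustration, not part of the proposal text:

{code}
// Hedged sketch of automatically adding the reserved metacolumn family when
// a table is created.  The family name "_meta_" is invented, and the
// in-memory hint is just one way to bias the sparse family toward blockcache.
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.util.Bytes;

class MetacolumnCreationSketch {
  static final byte[] METACOLUMN = Bytes.toBytes("_meta_");

  // Something the create-table path could call before regions are created.
  static void ensureMetacolumn(HTableDescriptor htd) {
    if (!htd.hasFamily(METACOLUMN)) {
      HColumnDescriptor meta = new HColumnDescriptor(METACOLUMN);
      meta.setInMemory(true);
      htd.addFamily(meta);
    }
  }
}
{code}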

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

