From: "Andrew Purtell (JIRA)"
To: issues@hbase.apache.org
Date: Sun, 1 Aug 2010 17:50:16 -0400 (EDT)
Subject: [jira] Commented: (HBASE-2893) Table metacolumns
Message-ID: <11868939.109711280699416862.JavaMail.jira@thor>
In-Reply-To: <15823365.103661280600956009.JavaMail.jira@thor>
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/HBASE-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894424#action_12894424 ]

Andrew Purtell commented on HBASE-2893:
---------------------------------------

bq.
I would be for trying to get this and stuff like it into a coprocessor-style implementation

I do like that idea too, if HBASE-1697 is not a core concern. It sounds like that is your opinion, that HBASE-1697 is not, correct?

bq. It just seems significantly more disruptive to implement DAC via metacolumns than just through family meta data.

We would implement DAC both via metacolumns and via family metadata, so that the metacolumn can be as sparse as possible, empty in the normal case, while at least making per-row overrides available to the designer.

bq. Would the plan be to do the DAC/ACL stuff without this and then add it? Or would this be a required piece of any implementation?

Not required if we maintain ACLs on column families only. Not required if we keep the current situation with per-column-family TTLs. Something like this would be necessary for per-row granularity, I think.

bq. I doubt they will have 1M users

Not just 1M users; I can envision probable applications with 100M+ users, actually. We can't have 100M tables, and we can't have 100M column families.

> Table metacolumns
> -----------------
>
>                 Key: HBASE-2893
>                 URL: https://issues.apache.org/jira/browse/HBASE-2893
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>
> Some features like TTLs or access control lists have use cases that call for per-value configurability.
>
> Currently in HBase, TTLs are set per column family. This leads to potentially awkward "bucketing" of values into column families set up to accommodate the common desired TTLs for all values within: an unnecessarily wide schema, with a resulting unnecessary reduction in I/O locality in access patterns, more store files than otherwise, and so on.
>
> Over in HBASE-1697 we're considering setting ACLs on column families. However, we are aware of other BT-like systems which support per-value ACLs. This allows for multitenancy in a single table, as opposed to requiring a table (or at least a column family) for each customer.
> The scale-out properties of a single table are better than those of the alternatives. I think supporting per-row ACLs would be generally sufficient: the customer ID could be part of the row key. We can still plan to maintain column-family-level ACLs. We would therefore not have to bloat the store with per-row ACLs for the normal case, but it would be highly useful to support overrides for particular rows. So how to do that?
>
> I propose to introduce _metacolumns_.
>
> A _metacolumn_ would be a column family intrinsic to every table, created by the system at table create time. It would be accessible and would function like any other column family, with two exceptions: we expect a default ACL that only allows access by the system and operator principals, and administrative actions such as renaming or deletion would not be allowed. Per-row overrides for such things as ACLs and TTLs would be stored in the metacolumn. The metacolumn would therefore be as sparse as possible; no storage would be required for overrides if a value is committed with defaults. A reasonably sparse metacolumn for a region may fit entirely within the blockcache. It may be possible for all metacolumns on a RS to fit within the blockcache without undue pressure on other users. We can aim design effort at this target.
>
> The scope of changes required to support this is:
> - Introduce the metacolumn concept in the code and into the security model (default ACL): a flag in HCD, a default ACL, and a few additional checks for rejecting disallowed administrative actions.
> - Automatically create metacolumns at table create time.
> - Consult the metacolumn as part of processing reads or mutations, perhaps using a bloom filter to shortcut lookups for rows with no metaentries, and apply configuration or security policy overrides if found.
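
The last item in the scope list (consult the metacolumn, with a Bloom filter to skip rows that have no metaentries, and fall back to the column-family default otherwise) can be illustrated with a small in-memory sketch. This is not HBase code and does not use any HBase API: the class `MetacolumnSketch`, the nested `RowBloom`, and every method name are hypothetical, the override store is a plain `HashMap` standing in for the metacolumn, and only a TTL override is modeled. It is meant only to show the lookup order under those assumptions.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in HBase. */
final class MetacolumnSketch {

    /** Tiny Bloom filter over row keys: false positives possible, false negatives not. */
    static final class RowBloom {
        private final BitSet bits;
        private final int size;

        RowBloom(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        // Two cheap, illustrative hash functions derived from String.hashCode().
        private int h1(String row) { return Math.floorMod(row.hashCode(), size); }
        private int h2(String row) { return Math.floorMod(row.hashCode() * 31 + 17, size); }

        void add(String row) { bits.set(h1(row)); bits.set(h2(row)); }

        boolean mightContain(String row) { return bits.get(h1(row)) && bits.get(h2(row)); }
    }

    private final long familyDefaultTtlSeconds;                     // column-family-level default
    private final Map<String, Long> ttlOverrides = new HashMap<>(); // stand-in for the metacolumn
    private final RowBloom bloom = new RowBloom(1 << 16);
    private int metacolumnLookups = 0;

    MetacolumnSketch(long familyDefaultTtlSeconds) {
        this.familyDefaultTtlSeconds = familyDefaultTtlSeconds;
    }

    /** Store a per-row override; rows without one consume no metacolumn storage. */
    void putTtlOverride(String row, long ttlSeconds) {
        ttlOverrides.put(row, ttlSeconds);
        bloom.add(row);
    }

    /** Resolve the TTL for a row: the per-row override if present, else the family default. */
    long effectiveTtl(String row) {
        if (!bloom.mightContain(row)) {
            return familyDefaultTtlSeconds;   // shortcut: definitely no metaentry for this row
        }
        metacolumnLookups++;                  // only now pay for the metacolumn read
        Long override = ttlOverrides.get(row);
        return override != null ? override : familyDefaultTtlSeconds;
    }

    int lookups() { return metacolumnLookups; }

    public static void main(String[] args) {
        MetacolumnSketch table = new MetacolumnSketch(86_400L);
        table.putTtlOverride("customer42/row1", 3_600L);
        System.out.println(table.effectiveTtl("customer42/row1"));
        System.out.println(table.effectiveTtl("customer7/row9"));
    }
}
```

A Bloom filter false positive only costs one extra lookup in the override store; the result is still the family default, so correctness does not depend on the filter. The same shape would apply to an ACL override, with the family-level ACL as the fallback.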