From: Christopher
To: user@accumulo.apache.org
Date: Mon, 16 Feb 2015 17:56:42 -0500
Subject: Re: User authorizations in accumulo

I think part of your question pertains to the differences between ABAC (attribute-based access controls) and RBAC (role-based access controls).

In both A1 and A2, you're thinking in terms of RBAC. The only real difference is whether you want to add one additional role or repurpose the existing ones. However, Accumulo's data visibilities are more like ABAC. Of course, you can use whatever method works for you, but the intent is more ABAC than RBAC.

The main pitfall with RBAC is that roles and users change, while the data is large and complex, and you don't want to rewrite it every time they do. Attributes, by contrast, are properties of the data itself, upon which you can make access decisions. Ideally, these attributes are things that don't change; they are inherent to the data.

To think in terms of ABAC, the main question to ask is "What properties of this data element will determine who can access it?" For example, does it contain personal information or medical history? Does it contain usernames and email addresses? What is it about this data that makes it worth protecting? Does it need to be protected at all? I think that's mainly what John Vines' talk was about (the differences between RBAC and ABAC).

If RBAC is more appropriate for your data, I'd probably go with A1, because it's easier to implement and maintain. The biggest drawback is the additional storage space needed to store the extra role in each visibility. Because of some internal optimizations, if you go this route, I'd recommend making this role a prefix rather than a suffix: "SUPERUSER|(restOfVisibility)" rather than "(restOfVisibility)|SUPERUSER".

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Mon, Feb 16, 2015 at 5:39 PM, Srikanth Viswanathan wrote:
> Hello,
>
> I'm using Accumulo to store raw and value-added data and expose this
> data to a small number of end users. During ingestion, the system will
> connect to accumulo as a single accumulo user called, say, "ingestor".
> This user will first store data, and then later in the ingestion
> pipeline read the same data back to add value and write the
> value-added data back. End-users will connect as themselves (i.e.,
> individual accumulo accounts) to read the data.
>
> The questions I am facing are:
> Q1. How to manage the read authorizations for the ingestor?
> Q2. How to ensure data in accumulo is never orphaned due to current
> users lacking authorizations to read certain columns?
>
> It seems to me that I have two options, both of which will solve both
> my problems above:
> A1. Grant the ingestor a single authorization and store the data with
> labels that allow the ingestor access via this label. e.g.,
> "ingestor|(foo_end_user_group|bar_end_user_group)". By doing this, I
> don't have to maintain special authorization logic for the ingestor,
> and I can also fall back on it to read data that might otherwise be
> orphaned.
> A2. Store only the end user groups in the visibility labels
> ("foo_end_user_group|bar_end_user_group"), and
> force the ingestion user to obtain all group authorizations needed in
> order to read the data. This will require special logic to update the
> ingestor's authorizations when a new authorization is added to the
> system.
>
> A1 seems simpler to me, but I heard John Vines discourage this in his
> talk at the 2014 Accumulo Summit. Doesn't the user in either case see
> the same set of data (i.e., "everything")? What then are the potential
> pitfalls of A1 compared to A2?
>
> Thank you!
>
> Srikanth Viswanathan
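
To make the A1/A2 comparison in this thread a bit more concrete, here is a minimal Python sketch. It is not Accumulo's API: can_read, a1_label, and a2_label are hypothetical helpers, and the evaluator models only a simplified subset of visibility expressions (terms, "|", "&", and parentheses). The personal_info and medical_history labels are made-up attribute names used to hint at the ABAC style. It shows only who can read what, and does not attempt to model the internal optimizations mentioned in the reply.

```python
import re

# Simplified model only: terms are [A-Za-z0-9_]+, operators are | and &.
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[|&()])")


def tokenize(expr):
    """Split a visibility expression into terms, operators, and parentheses."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def can_read(expr, auths):
    """Return True if a user holding `auths` satisfies the expression `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an empty label is readable by everyone
    pos = 0

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("|", "&", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a bare term is satisfied by holding that authorization

    def parse_expr():
        nonlocal pos
        value = parse_atom()
        op = None
        while pos < len(tokens) and tokens[pos] in ("|", "&"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_atom()
            value = (value or rhs) if op == "|" else (value and rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


def a2_label(groups):
    """A2: only the end-user groups appear in the label."""
    return "|".join(groups)


def a1_label(groups, role="ingestor"):
    """A1: the extra role goes first, as a prefix, per the reply's recommendation."""
    return f"{role}|({a2_label(groups)})"


groups = ["foo_end_user_group", "bar_end_user_group"]
print(a1_label(groups))
print(can_read(a1_label(groups), {"ingestor"}))  # one authorization is enough under A1
print(can_read(a2_label(groups), {"ingestor"}))  # under A2 the ingestor needs the group auths
print(can_read("personal_info&medical_history", {"personal_info"}))  # ABAC-style label
```

Under A1 the ingestor keeps a single authorization no matter how many groups are added later, at the cost of the extra term stored in every label. Under A2 the labels stay shorter, but the ingestor's authorization set has to be updated whenever a new group authorization appears.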