accumulo-user mailing list archives

From Srikanth Viswanathan <srikant...@gmail.com>
Subject User authorizations in accumulo
Date Mon, 16 Feb 2015 22:39:45 GMT
Hello,

I'm using Accumulo to store raw and value-added data and expose this
data to a small number of end users. During ingestion, the system will
connect to Accumulo as a single Accumulo user called, say, "ingestor".
This user will first store data, and then later in the ingestion
pipeline read the same data back to add value and write the
value-added data back. End-users will connect as themselves (i.e.,
individual Accumulo accounts) to read the data.

The questions I am facing are:
Q1. How to manage the read authorizations for the ingestor?
Q2. How to ensure data in Accumulo is never orphaned because no current
user holds the authorizations needed to read certain columns?

It seems to me that I have two options, both of which will solve both
my problems above:
A1. Grant the ingestor a single authorization and store the data with
labels that allow the ingestor access via this label. e.g.,
"ingestor|(foo_end_user_group|bar_end_user_group)". By doing this, I
don't have to maintain special authorization logic for the ingestor,
and I can also fall back on it to read data that might otherwise be
orphaned.
A2. Store only the end user groups in the visibility labels
("foo_end_user_group|bar_end_user_group"), and force the ingestion user
to obtain all group authorizations needed in order to read the data.
This will require special logic to update the ingestor's authorizations
when a new authorization is added to the system.
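
A minimal sketch of how the two labeling schemes behave, assuming a
simplified stand-in for Accumulo's visibility evaluation (this is not
the Accumulo API; can_read and tokenize are hypothetical helpers, and
"baz_end_user_group" is a made-up group used only to show what happens
when a new authorization appears):

```python
import re

# Terms are restricted to letters, digits, and underscores for simplicity.
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[|&()])")


def tokenize(label):
    """Split a visibility label into terms, operators, and parentheses."""
    label = label.strip()
    tokens, pos = [], 0
    while pos < len(label):
        m = TOKEN.match(label, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos} in {label!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def can_read(label, auths):
    """Return True if a user holding `auths` satisfies `label`.

    Simplification: '&' binds tighter than '|' here, whereas Accumulo
    requires parentheses when the two operators are mixed.
    """
    tokens = tokenize(label)
    if not tokens:
        return True  # an empty label is visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, then combine
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("|", "&", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in label")
    return result


A1_LABEL = "ingestor|(foo_end_user_group|bar_end_user_group)"
A2_LABEL = "foo_end_user_group|bar_end_user_group"

# A1: the ingestor holds only the "ingestor" authorization.
a1_ingestor = can_read(A1_LABEL, {"ingestor"})
# A2: the same single authorization is not enough.
a2_ingestor_single = can_read(A2_LABEL, {"ingestor"})
# A2: the ingestor must hold the group authorizations itself.
a2_ingestor_groups = can_read(
    A2_LABEL, {"foo_end_user_group", "bar_end_user_group"}
)
# A2 with a new (hypothetical) group: unreadable until the ingestor's
# authorizations are updated.
a2_new_group = can_read(
    "baz_end_user_group", {"foo_end_user_group", "bar_end_user_group"}
)

print(a1_ingestor, a2_ingestor_single, a2_ingestor_groups, a2_new_group)
```

In this sketch, the last check is the case that A2's "special logic"
would have to handle: the ingestor cannot read data labeled with a new
group until it has been granted that group's authorization.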

A1 seems simpler to me, but I heard John Vines discourage this in his
talk at the 2014 Accumulo Summit. Doesn't the ingestor see the same set
of data in either case (i.e., "everything")? What, then, are the
potential pitfalls of A1 compared to A2?

Thank you!

Srikanth Viswanathan
