accumulo-dev mailing list archives

From Aaron Cordova <aa...@cordovas.org>
Subject Re: "NOT" operator in visibility string
Date Sat, 15 Mar 2014 21:08:13 GMT
It’d be nice to perhaps have lots of use cases documented. I suspect there are more cases
that could be added to this list.

How to label data has always been very flexible, but as a result it has proven somewhat difficult
to decide on a labeling scheme in some cases. The original design and intent of security labels
was to allow users to express certain attributes of the data, such as sensitivity or source.

These attributes are not the only thing that needs to be considered when deciding to grant
read access. Other things, like the roles and responsibilities of users, are handled outside
the security labeling system, for example in an external application that maps users to roles.
These aspects of security can change over time, and as such they are typically managed in an
external mutable system and are not stored in the security label.

I think users can get into trouble when they begin to push attributes that aren’t specific
to the data into the security label. Attributes of the data are not likely to change over
time, and so make sense to store in a system of immutable but versioned data like Accumulo.
Attributes that are not specific to the data do change, and every such change means relabeling.
‘Rewriting’ data labels (i.e. writing a new version of all the data) when there is a lot of
data is non-trivial.


As for #3 and #4 below ...

On Mar 10, 2014, at 2:07 PM, Mike Drob <madrob@cloudera.com> wrote:

> ...
> 
> Use Case #3: Users have general access to the system in line with their
> overall authorization level, but are excluded from a few categories for
> general reasons.
> 
> Example: There are records marked as { (secret | topsecret) & !probation }
> which are available to all users with "secret" or "topsecret" access, but
> not those on "probation." Once a user is removed from the probation role,
> then her access should be granted to this data without further action.
> (From the HBase docs).

I believe this case can be handled today (i.e. without NOT) by removing the assignment of
secret and topsecret for the user on probation in whatever external system manages those
assignments. Whether a user is on probation is temporary and not specific to the data, and
so is not appropriate information to store in a security label.
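
A rough sketch of that approach in Python, assuming a hypothetical external role store (the
USER_ROLES table and both helper functions are made up for illustration and are not part of
Accumulo). The label on the data stays secret|topsecret; only the tokens handed to the user
change:

```python
# Hypothetical external, mutable role store; not part of Accumulo.
USER_ROLES = {
    "alice": {"secret"},
    "bob": {"secret", "topsecret", "probation"},
}

# Tokens withheld from a user for as long as she holds the probation role.
WITHHELD_ON_PROBATION = {"secret", "topsecret"}


def authorizations_for(user):
    """Return the tokens to pass along with this user's scans."""
    roles = set(USER_ROLES.get(user, set()))
    if "probation" in roles:
        roles -= WITHHELD_ON_PROBATION
    # "probation" describes the user, not the data, so it never becomes a token.
    roles.discard("probation")
    return roles


def can_read_secret_or_topsecret(tokens):
    """Evaluate the label secret|topsecret, which needs no NOT."""
    return bool(tokens & {"secret", "topsecret"})
```

Removing "probation" from the user's entry in the role store restores access on the next
lookup, with no change to any stored label.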

> Use Case #4: Users have general access, with specific exclusions that are
> unique to each user.
> 
> Example: An employee review database where all employees have access to all
> records of every team they have worked on, except for their own files.
> Generalization: an employee has to everybody else's records at the company,
> except their own. Intent is to create a system similar to the review system
> at Valve.

This case is somewhat strange since users doubtless have access to their own files at other
times and for other purposes (i.e. when creating the files), but just not for the purposes
of review. Such access controls are also not a function of the data, but rather of the purpose
of the application.

In this case, the external system managing assignments of users to security tokens could simply
be configured to grant each user the tokens for everyone else’s files, but not the token for
their own. This is not hard because the number of users is known and is available
in the external token assignment system.

In other applications, such as those in which users create and manage their own files, the
external assignment system would simply assign to users the tokens to access their own files.
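
As a minimal sketch in Python, assuming one token per employee's files (the file_<name> token
naming and both function names are hypothetical, chosen only for illustration), the two
applications would differ only in which tokens they hand out:

```python
def review_tokens(user, all_users):
    """Tokens for the review application: every file token except the user's own."""
    return {f"file_{other}" for other in all_users if other != user}


def authoring_tokens(user):
    """Tokens for an application in which users manage their own files."""
    return {f"file_{user}"}
```

The labels on the records stay the same in both applications; only the assignment changes.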

I don’t see either of these cases making an argument for a NOT operator in the security
label.

———

Here is a difficult use case, known as the ‘ethical wall’ (among other names), which is
intended to prevent conflicts of interest:

	Users can access any one type of data, but once they have accessed one type they cannot access
any other type.

For example, once you have access to the audit logs, you cannot access the primary data set,
and vice versa. Or say a researcher can see one company’s information, but once she has
seen that information, she can’t see any other companies’ information, or else risk a
conflict of interest.

If for some weird reason users were the ones to pick which one data set they got to see, they
would have to be excluded from all other data sets after that point.

Again, this information - which data sets a user has ever seen - is not specific to the data
and so should not be stored in the security label. Rather, an application can be written to
keep track of which data sets a user has seen, if any. If the user has never done a query, she
can query any one data set. After that, she can only issue queries to that same data set.
The application can simply keep track of which data set each user chose and assign security
tokens appropriately.
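
A sketch of such an application-side tracker in Python, assuming one security token per data
set and one data set per query (the EthicalWall class and its authorize method are hypothetical
names, and the in-memory dict stands in for whatever durable store a real application would
use):

```python
class EthicalWall:
    """Remembers the first data set each user queried and pins her to it."""

    def __init__(self, datasets):
        self._datasets = set(datasets)
        self._chosen = {}  # user -> the one data set she is allowed to query

    def authorize(self, user, dataset):
        """Return the tokens for this query, or raise if it crosses the wall."""
        if dataset not in self._datasets:
            raise KeyError(dataset)
        # The first query fixes the user's choice; later queries must match it.
        chosen = self._chosen.setdefault(user, dataset)
        if chosen != dataset:
            raise PermissionError(f"{user} is limited to {chosen}")
        return {dataset}
```

The history lives entirely in the application, so the labels on the data never have to mention
users at all.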

———

Another tough use case is around combining data. Some types of data sensitivities change when
combined with other data. This is hard to capture in a security label since each label applies
to exactly one key-value pair.

In this case you’d need perhaps an iterator or logic in an application that tracks how many
elements are being accessed at once and that can apply rules for increasing or decreasing
the sensitivity of the entire result set accordingly. It would be interesting to use an iterator
to do this since you could have one iterator adjusting security levels based on the combined
data, and then reapply the same old security filtering logic to the newly derived aggregate
labels.

For example, users might be able to do point queries (i.e. specify an entire key so that only
a single value is returned) and labels could be used as always.

But then a user might fetch a whole row, say using the WholeRowIterator, and the row as a
whole might have a higher or lower sensitivity level than any one data element. It would be
nice to be able to present a single key-value pair containing the entire row, and to have a
security label that described the sensitivity of that whole row.
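
One conservative rule for deriving such a row label, sketched in Python. This is an assumption
on my part rather than anything Accumulo does today: the derived label is the AND of every
distinct cell label, optionally with one extra token required for the combination. The
derived_row_label function and its aggregate_token parameter are hypothetical, and the sketch
only covers the case where combining raises sensitivity:

```python
def derived_row_label(cell_labels, aggregate_token=None):
    """Combine per-cell labels into one label for the whole row."""
    parts = []
    # dict.fromkeys drops duplicate labels while keeping their original order.
    for label in dict.fromkeys(cell_labels):
        if not label:
            continue  # an empty label adds no restriction
        # Parenthesize labels containing "|" so that & and | are never mixed bare.
        parts.append(f"({label})" if "|" in label else label)
    if aggregate_token and aggregate_token not in parts:
        parts.append(aggregate_token)
    return "&".join(parts)
```

The existing filtering logic could then be applied to the derived label unchanged. A rule that
lowers sensitivity for summarized data would need different logic.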

Similarly, some iterators summarize data which might result in the data being more or less
sensitive.

———

In none of the previous four cases do I see a need for NOT to be implemented. But I’d like
to hear what other use cases people are looking at.


Aaron


