accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Security and data design advice on structuring data on accumulo
Date Fri, 10 Aug 2012 16:28:07 GMT
The underlying issue I'm poking at is this:

The pluggable authorization systems I've seen attached to Accumulo in the
past have operated in the following fashion: a single superuser in
Accumulo holds all of the authorizations for data stored in Accumulo. The
authorization system determines the correct Accumulo Authorizations for
the current user and intersects them with the superuser's Authorizations
(read as: all Authorizations) to perform a scan over Accumulo at the
desired level. Thus, end users don't have accounts on Accumulo; user
queries run as the superuser.
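
As a rough sketch of that intersection step, in plain Python rather than the Accumulo client API: the function name and the example labels are hypothetical, and the "doctor" set stands in for whatever an external authorization service would return.

```python
def effective_authorizations(user_auths, superuser_auths):
    """Return the labels a scan should be run with on behalf of one end user.

    The scan itself runs as the superuser account, but it is restricted to
    the labels the external authorization system granted this user.
    """
    return frozenset(user_auths) & frozenset(superuser_auths)


# Hypothetical labels: the superuser holds every label in the system.
superuser = {"adamsHealthTeam", "joshsHealthTeam", "populationStudy"}
# Hypothetical answer from the external authorization service for one doctor.
# "billing" is not a label the superuser holds, so the intersection drops it.
doctor = {"adamsHealthTeam", "billing"}

scan_auths = effective_authorizations(doctor, superuser)
```

The resulting set is what would be handed to the scan; the end user never needs an Accumulo account of their own.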

Back to the current example: as you said, the number of "groups" should
grow roughly linearly with the number of users; however, this now requires
that every user have an Accumulo account. The difference is that a doctor
will be in many users' groups (e.g. you and I could share a doctor). To
my understanding, all of this user/authorization information is stored
in ZooKeeper. It seems less than ideal to me to store user accounts for
every patient and every doctor, where every doctor has many "roles", but
it also appears intractable to have a single superuser with all auths
(as previously outlined).
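
To make the growth concrete, here is a small sketch in plain Python. The data, the names, and the "<patient>HealthTeam" naming scheme are all made up for illustration; nothing here touches Accumulo or ZooKeeper.

```python
from collections import defaultdict


def authorizations_by_user(care_teams):
    """Map each user to the set of per-patient team labels they must hold.

    care_teams maps a patient name to the doctors on that patient's team.
    Each patient gets exactly one label, so labels grow linearly with
    patients, while a doctor accumulates one label per patient they treat.
    """
    auths = defaultdict(set)
    for patient, doctors in care_teams.items():
        label = f"{patient}HealthTeam"  # hypothetical naming scheme
        auths[patient].add(label)       # patients can read their own records
        for doctor in doctors:
            auths[doctor].add(label)
    return dict(auths)


# Hypothetical data: two patients who share one doctor.
teams = {"adam": ["drSmith"], "josh": ["drSmith", "drJones"]}
auths = authorizations_by_user(teams)
all_labels = set().union(*auths.values())
```

With two patients there are only two labels in total, but the shared doctor already holds both of them, and every patient and doctor needs an account to hang those labels on.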

I'm sure a user-roles approach could work up to a point, but I feel like
there is potential for a much more elegant solution. I'm curious whether
others have had thoughts about this.
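
For reference, the combined role-and-reason label quoted further down the thread can be checked against a set of authorizations with a few lines of plain Python. This is only a sketch of the boolean logic, not Accumulo's own implementation; the function name is hypothetical, and rejecting unparenthesized mixes of & and | reflects my understanding of how Accumulo treats visibility expressions.

```python
import re

_TOKEN = re.compile(r"[A-Za-z0-9_]+|[&|()]")


def is_visible(expression, auths):
    """Return True if the authorizations satisfy the label expression."""
    tokens = _TOKEN.findall(expression)
    if "".join(tokens) != expression:
        raise ValueError("unexpected character in expression")
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a bare label is true if the user holds it

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()  # always parse the right side, then combine
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


expr = ("(adamsHealthTeam&(regularCheckup|illnessEvaluation))"
        "|(massStateResearcher&populationStudy)")
on_checkup = is_visible(expr, {"adamsHealthTeam", "regularCheckup"})
team_only = is_visible(expr, {"adamsHealthTeam"})
```

Holding the team label alone is not enough here; the user also needs one of the access-reason labels, which is the point of combining the two kinds of label.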

On 8/10/12 12:05 PM, Adam Fuchs wrote:
>
> But that's not really n*m, since it only specifies me by name. This 
> should be roughly linear with users, no?
>
> There is definitely a reliance on some external service managing the 
> roles that docs are in, but this should be tractable.
>
> Adam
>
> On Aug 10, 2012 11:56 AM, "Josh Elser" <josh.elser@gmail.com 
> <mailto:josh.elser@gmail.com>> wrote:
>
>     That's what I meant, user*doctors.
>
>     It's not enough to say "healthteam", you have to qualify it by
>     user too: "adamhealthteam".
>
>     On 8/10/12 9:02 AM, Adam Fuchs wrote:
>>
>>     I guess I should have specified that the access time labels
>>     should be used in conjunction with the role labels, like
>>     "(adamsHealthTeam&(regularCheckup|illnessEvaluation))|(massStateResearcher&populationStudy)".
>>
>>     Adam
>>
>>     On Aug 10, 2012 8:56 AM, "Benson Margulies"
>>     <bimargulies@gmail.com <mailto:bimargulies@gmail.com>> wrote:
>>
>>         On Fri, Aug 10, 2012 at 8:52 AM, Adam Fuchs
>>         <afuchs@apache.org <mailto:afuchs@apache.org>> wrote:
>>         > Not sure I understand why this gets into n*m roles. Can you elaborate?
>>         >
>>         > The question of when your physician should have access seems like it could
>>         > be represented by just a few labels, like "regularCheckup",
>>         > "illnessEvaluation", and "populationStudy". Those labels could then be tied
>>         > to an auditing system that could verify appropriateness of access over time.
>>
>>         And if you change doctors? Maybe that's a job for some sort
>>         of role/group model.
>>
>>
>>         >
>>         > Adam
>>         >
>>         > On Aug 9, 2012 10:19 PM, "Josh Elser" <josh.elser@gmail.com
>>         > <mailto:josh.elser@gmail.com>> wrote:
>>         >>
>>         >> I've thought quite a bit about the approach you've outlined previously.
>>         >>
>>         >> The main caveat I've always struggled to overcome is how to encapsulate
>>         >> *when* a physician should have access to your records. This expands the
>>         >> problem into n*m roles which becomes difficult to manage inside Accumulo,
>>         >> especially as time elapses.
>>         >>
>>         >> On 8/8/2012 6:29 PM, Marc Parisi wrote:
>>         >>>
>>         >>> Just some ideas and thoughts....
>>         >>>
>>         >>> With a system I'm building I have code to take care of user roles. Roles
>>         >>> will define visibilities, how analysis is performed, information
>>         >>> sharing, etc. I have a particular role for sharing. I also have an area
>>         >>> of interest, usually assigned to a physician role, therefore only a
>>         >>> physician's office can see certain data from it. The data corresponding
>>         >>> to a given person can be accessed by that person (if they have app
>>         >>> access), the physician that created it, and other physicians (with a
>>         >>> different area of interest) with whom the user wants to share their
>>         >>> data. Each area of interest will be cryptographically secured. Our
>>         >>> approach will utilize multiple crypto technologies. I would suggest
>>         >>> making crypto your last stop. Focus on getting
>>         >>> the visibility hierarchy designed. HIPAA requirements can come later.
>>         >>>
>>         >>> In my approach, there is no elevation of fields per se. Instead, there
>>         >>> are visibilities for all assigned parties, so in my case it is a matter
>>         >>> of labeling. The data can have hierarchies, and each hierarchy has
>>         >>> different labels to control access.
>>         >>>
>>         >>> "Patient demographic fields are PHI (personal health information) and
>>         >>> these should not be visible to all who want to perform analysis, but
>>         >>> only to main administrators,
>>         >>> patient and maybe physician. I assume these would have to have
>>         >>> separate authorization label."
>>         >>>
>>         >>> Yes. I think this is where roles will help. Assign roles and
>>         >>> visibilities to those roles. As of right now, I'm putting ephemeral data
>>         >>> in my visibilities (user ID for a physician, among other things). I
>>         >>> will probably move this to the qualifier and take a simpler approach
>>         >>> to visibilities.
>>         >>>
>>         >>> Each role has different actions. Right now I have four actions: syncing,
>>         >>> querying, deleting, and sharing. You don't have to capture actions, but
>>         >>> you might want to limit how the roles of users vary, and I think
>>         >>> modeling the security actions within each role is an excellent way to do
>>         >>> so.
>>         >>>
>>         >>>
>>         >>> On Wed, Aug 8, 2012 at 4:08 PM, Edmon Begoli
>>         >>> <ebegoli@gmail.com <mailto:ebegoli@gmail.com>> wrote:
>>         >>>
>>         >>>     I am trying to model a healthcare claim on Accumulo and I want to
>>         >>>     lay it out so that it:
>>         >>>
>>         >>>     A. Accurately reflects the structure of the claim
>>         >>>
>>         >>>     B. Lets me apply controls finely to different sections of the
>>         >>>     document
>>         >>>
>>         >>>     I am simplifying matters, but a claim contains claim document identifiers,
>>         >>>     demographics of the patient, and line items for the procedures
>>         >>>     performed:
>>         >>>
>>         >>>     claim identifier, data submitted, data processed, state of origin, ...
>>         >>>     patient name, dob, location, other identifiers
>>         >>>     procedure 1 code, procedure 1 provider, procedure 1 cost, ...
>>         >>>     ...
>>         >>>     procedure n code, procedure n provider, procedure n cost, ...
>>         >>>
>>         >>>
>>         >>>     Patient demographic fields are PHI (personal health information) and
>>         >>>     these should not be visible to all who want to perform analysis, but
>>         >>>     only to main administrators,
>>         >>>     patient and maybe physician. I assume these would have to have
>>         >>>     separate authorization label.
>>         >>>
>>         >>>     Other fields may be visible to different groups of people - i.e.
>>         >>>     federal claim administrators can see all, but regional offices can
>>         >>>     only see their states.
>>         >>>     Separate, more permissive labels.
>>         >>>
>>         >>>     Finally, it might make sense to "elevate" some fields for easy access
>>         >>>     and analysis - i.e. diagnostic codes, zip code, cost.
>>         >>>     This would not be a matter of labels, but data design.
>>         >>>
>>         >>>
>>         >>>     With all this in mind, I would welcome any security and
>>         >>>     data design suggestions.
>>         >>>
>>         >>>
>>         >
>>
>

