From: Adam Fuchs
To: user@accumulo.apache.org
Date: Fri, 10 Aug 2012 12:05:13 -0400
Subject: Re: Security and data design advice on structuring data on accumulo
In-Reply-To: <50252EF1.3000409@gmail.com>
References: <50246F9D.2020300@gmail.com> <50252EF1.3000409@gmail.com>
Reply-To: user@accumulo.apache.org
Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm

But that's not really n*m, since it only specifies me by name. This should be roughly linear with users, no?

There is definitely a reliance on some external service managing the roles that docs are in, but this should be tractable.

Adam
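
As a rough illustration of the point above, here is a minimal Python sketch of an external service that tracks which care teams each doctor belongs to. Everything in it is an assumption made for illustration: the `CareTeamDirectory` class, the `<patient>HealthTeam` label format, and the purpose labels taken from earlier in the thread. It is not an Accumulo API. The idea it shows is that stored data carries one team label per patient (linear in users), while doctor membership lives outside the data.

```python
from collections import defaultdict


def team_label(patient):
    """Hypothetical per-patient team label, e.g. 'adamHealthTeam'."""
    return f"{patient}HealthTeam"


def record_visibility(patient):
    """Label expression written on a patient's records (one per patient)."""
    return f"{team_label(patient)}&(regularCheckup|illnessEvaluation)"


class CareTeamDirectory:
    """Hypothetical external service mapping doctors to patient care teams."""

    def __init__(self):
        self._patients_by_doctor = defaultdict(set)

    def assign(self, doctor, patient):
        self._patients_by_doctor[doctor].add(patient)

    def unassign(self, doctor, patient):
        self._patients_by_doctor[doctor].discard(patient)

    def authorizations(self, doctor, purpose):
        """Tokens to attach to a scan made by `doctor` for one access purpose."""
        teams = {team_label(p) for p in self._patients_by_doctor[doctor]}
        return teams | {purpose}
```

Under these assumptions, changing doctors means calling `unassign` and `assign` on the directory; the labels already written on the patient's records stay the same.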

On Aug 10, 2012 11:56 AM, "Josh Elser" <josh.elser@gmail.com> wrote:
That's what I meant, user*doctors.

It's not enough to say "healthteam", you have to qualify it by user too: "adamhealthteam".

On 8/10/12 9:02 AM, Adam Fuchs wrote:

I guess I should have specified that the access time labels should be used in conjunction with the role labels, like "(adamsHealthTeam&(regularCheckup|illnessEvaluation))|(massStateResearcher&populationStudy)".

Adam
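
To make the combined label above concrete, the following is a small standalone Python sketch that evaluates such an expression against a set of authorization tokens. It is an illustrative evaluator written for this thread, not Accumulo's own implementation (in Accumulo the expression would be carried by a `ColumnVisibility` and checked against a scanner's `Authorizations`). The function names `tokenize` and `visible` are assumptions, and the sketch only handles unquoted terms with `&`, `|`, and parentheses.

```python
import re

# Unquoted label terms plus the operators & | and parentheses.
TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")

LABEL = ("(adamsHealthTeam&(regularCheckup|illnessEvaluation))"
         "|(massStateResearcher&populationStudy)")


def tokenize(expr):
    tokens = TOKEN.findall(expr)
    if "".join(tokens) != expr:
        raise ValueError(f"unexpected character in {expr!r}")
    return tokens


def visible(expr, auths):
    """Return True if the token set `auths` satisfies the expression `expr`."""
    tokens = tokenize(expr)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            # Mixing & and | at one level is rejected; use parentheses.
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result
```

For example, `visible(LABEL, {"adamsHealthTeam", "regularCheckup"})` is true, while holding only `adamsHealthTeam` without an access-purpose token is not enough.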

On Aug 10, 2012 8:56 AM, "Benson Margulies" <bimargulies@gmail.com> wrote:
On Fri, Aug 10, 2012 at 8:52 AM, Adam Fuchs <afuchs@apache.org> wrote:
> Not sure I understand why this gets into n*m roles. Can you elaborate?
>
> The question of when your physician should have access seems like it could
> be represented by just a few labels, like "regularCheckup",
> "illnessEvaluation", and "populationStudy". Those labels could then be tied
> to an auditing system that could verify appropriateness of access over time.

And if you change doctors? Maybe that's a job for some sort of role/group model.


>
> Adam
>
> On Aug 9, 2012 10:19 PM, "Josh Elser" <josh.elser@gmail.com> wrote:
>>
>> I've thought quite a bit about the approach you've outlined previously..
>>
>> The main caveat I've always struggled to overcome is how to encapsulate
>> *when* a physician should have access to your records. This expands the
>> problem into n*m roles which becomes difficult to manage inside Accumulo,
>> especially as time elapses.
>>
>> On 8/8/2012 6:29 PM, Marc Parisi wrote:
>>>
>>> Just some ideas and thoughts....
>>>
>>> With a system I'm building I have code to take care of user roles. Roles
>>> will define visibilities, how analysis is performed, information
>>> sharing, etc. I have a particular role for sharing. I also have an area
>>> of interest, usually assigned to a physician role, therefore only a
>>> physician's office can see certain data from it. The data corresponding
>>> to a given person can be accessed by that person ( if they have app
>>> access ), the physician that created it, and other physicians ( with a
>>> different area of interest ) with whom the user wants to share their
>>> data. Each area of interest will be cryptographically secured. Our
>>> approach will utilize multiple crypto technologies. I would suggest
>>> making crypto your last stop. Focus on getting
>>> the visibility hierarchy designed. HIPAA requirements can come later.
>>>
>>> In my approach, there is no elevation of fields per se. Instead, there
>>> are visibilities for all assigned parties, so in my case it is a matter
>>> of labeling. The data can have hierarchies, and each hierarchy has
>>> different labels to control access.
>>>
>>> " Patient demographic fields are PHI (personal health information) and
>>> these should not be visible to all who want to perform analysis, but
>>> only to main administrators,
>>> patient and maybe physician. I assume these would have to have
>>> separate authorization label. "
>>>
>>> Yes. I think this is where roles will help. Assign roles and
>>> visibilities to those roles. As of right now, I'm putting ephemeral data
>>> in my visibilities ( user ID for a physician, among other things ). I
>>> will probably move this to the qualifier and take a more simple approach
>>> to visibilities.
>>>
>>> Each role has different actions. Right now I have four actions: syncing,
>>> querying, deleting, and sharing. You don't have to capture actions, but
>>> you might want to limit how the roles of users vary, and I think
>>> modeling the security actions within each role is an excellent way to do
>>> so.
>>>
>>>
>>> On Wed, Aug 8, 2012 at 4:08 PM, Edmon Begoli <ebegoli@gmail.com> wrote:
>>>
>>>     I am trying to model the healthcare claim on accumulo and I want to
>>>     lay it out so that it:
>>>
>>>     A. Accurately reflects the structure of the claim
>>>
>>>     B. Allows controls to be finely applied to different sections of the
>>>     document
>>>
>>>     I am simplifying matters, but a claim contains claim document
>>>     identifiers, demographics of the patient, and line items for the
>>>     procedures performed:
>>>
>>>     claim identifier, data submitted, data processed, state of origin, ...
>>>     patient name, dob, location, other identifiers
>>>     procedure 1 code, procedure 1 provider, procedure 1 cost, ...
>>>     ...
>>>     procedure n code, procedure n provider, procedure n cost, ...
>>>
>>>
>>>     Patient demographic fields are PHI (personal health information) and
>>>     these should not be visible to all who want to perform analysis, but
>>>     only to main administrators, the patient, and maybe the physician.
>>>     I assume these would have to have a separate authorization label.
>>>
>>>     Other fields may be visible to different groups of people - e.g.
>>>     federal claim administrators can see all, but regional offices can
>>>     only see their states.
>>>     Separate, more permissive labels.
>>>
>>>     Finally, it might make sense to "elevate" some fields for easy access
>>>     and analysis - e.g. diagnostic codes, zip code, cost.
>>>     This would not be a matter of labels, but data design.
>>>
>>>
>>>     With all this in mind, I would welcome any security and
>>>     data design suggestions.
>>>
>>>
>

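For the claim layout described in the original question, one possible shape is sketched below in Python: each field of a claim becomes its own entry (row, column family, column qualifier, visibility, value), so that demographics and claim fields can carry different labels. This is only a sketch under stated assumptions. The `Entry` tuple, the `claim_entries` helper, the input dictionary shape, and the label names (`admin`, `federalAdmin`, `state_<XX>`, `patient_<id>`) are all hypothetical and are not taken from the thread or from Accumulo.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str
    value: str


def claim_entries(claim):
    """Flatten one claim (a plain dict) into per-field labeled entries."""
    claim_id = claim["id"]
    # Hypothetical labels: federal admins see everything, regional offices
    # only see claims from their own state.
    regional = f"federalAdmin|state_{claim['state']}"
    # Hypothetical labels: demographics limited to admins and the patient.
    phi = f"admin|patient_{claim['patient']['id']}"

    entries = []
    for name, value in claim["header"].items():
        entries.append(Entry(claim_id, "claim", name, regional, str(value)))
    for name, value in claim["patient"].items():
        entries.append(Entry(claim_id, "patient", name, phi, str(value)))
    for i, procedure in enumerate(claim["procedures"], start=1):
        for name, value in procedure.items():
            qualifier = f"{i:04d}.{name}"  # zero-padded so line items sort in order
            entries.append(Entry(claim_id, "procedure", qualifier, regional, str(value)))
    return entries
```

Keeping one row per claim keeps the document together, while the per-entry visibility gives field-level control. "Elevating" fields such as diagnostic codes for analysis would be a separate design decision, for instance an additional index table, and is not covered by this sketch.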