Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@accumulo.apache.org
MIME-Version: 1.0
In-Reply-To: <50252EF1.3000409@gmail.com>
References: 
 <CAGj+YsfTyT3s4XCdtFBN0HqROk1xuy7G6xuo6=W2ri1hWvhqpQ@mail.gmail.com>
	<CALwRz8PhF8SAfw7w8FD_3foaq2BdsnMvN_RV6jWkP_3X3T6tdw@mail.gmail.com>
	<50246F9D.2020300@gmail.com>
	<CAPMpPc52hc5QGXxEsTbWe+3kPqTAYrHXcyudLuEE-mC3wS77xw@mail.gmail.com>
	<CALhtWkeSEMbajnppdH05G4aXJ6HT=K28CjBJOb3jOPg2WXt3gA@mail.gmail.com>
	<CAPMpPc5GTHOH-CeOdnTe49L2ovRSAMCWYeK1hjQ0TNNft5tg+g@mail.gmail.com>
	<50252EF1.3000409@gmail.com>
Date: Fri, 10 Aug 2012 12:05:13 -0400
Message-ID: 
 <CAPMpPc4UL6THC_g0Bz8zYZZwGH4W1+7RuU2crmeJLsZsyLgqZQ@mail.gmail.com>
Subject: Re: Security and data design advice on structuring data on accumulo
From: Adam Fuchs <afuchs@apache.org>
To: user@accumulo.apache.org
Content-Type: multipart/alternative; boundary=047d7b10ceeb5e9a1504c6eb8265

--047d7b10ceeb5e9a1504c6eb8265
Content-Type: text/plain; charset=ISO-8859-1

But that's not really n*m, since it only specifies me by name. This should
be roughly linear with users, no?
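
To put rough numbers on that, here is a minimal sketch comparing the two
labeling schemes. The patient/doctor counts, the label formats, and both
function names are made up for illustration; only the three purpose labels
come from earlier in this thread.

```python
def per_pair_labels(patients, doctors):
    """One label per (patient, doctor) pair: the n*m scheme."""
    return {f"{p}_{d}" for p in patients for d in doctors}


def per_patient_labels(patients, purposes):
    """One team label per patient plus a small, fixed set of purpose labels."""
    return {f"{p}sHealthTeam" for p in patients} | set(purposes)


# Hypothetical population sizes, chosen only to show the growth rates.
patients = [f"patient{i}" for i in range(1000)]
doctors = [f"doctor{j}" for j in range(50)]
purposes = ["regularCheckup", "illnessEvaluation", "populationStudy"]

print(len(per_pair_labels(patients, doctors)))      # 50000
print(len(per_patient_labels(patients, purposes)))  # 1003
```

The second count grows with the number of patients only; adding doctors
does not add labels.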

There is definitely a reliance on some external service managing the roles
that doctors are in, but this should be tractable.
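
As a rough illustration, the sketch below evaluates the visibility
expression from my earlier message against a set of authorizations handed
out by a stand-in role service. This is plain Python, not Accumulo's own
parser or API: ROLE_SERVICE, auths_for, visible, and the doctor names are
all hypothetical, and the sketch assumes "&" binds tighter than "|" when
parentheses are omitted.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a visibility expression into labels, operators, and parentheses."""
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad character at {pos}: {expr[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def visible(expr, auths):
    """Return True if the authorization set satisfies the expression."""
    tokens = tokenize(expr)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a bare label is satisfied if the user holds it

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result


# Hypothetical stand-in for the external service that maps doctors to roles.
ROLE_SERVICE = {
    "drSmith": {"adamsHealthTeam"},
    "drJones": {"massStateResearcher"},
}


def auths_for(doctor, purpose):
    """Authorizations for one access: the doctor's roles plus the stated purpose."""
    return ROLE_SERVICE.get(doctor, set()) | {purpose}


VIS = ("(adamsHealthTeam&(regularCheckup|illnessEvaluation))"
       "|(massStateResearcher&populationStudy)")

print(visible(VIS, auths_for("drSmith", "regularCheckup")))   # True
print(visible(VIS, auths_for("drJones", "regularCheckup")))   # False
print(visible(VIS, auths_for("drJones", "populationStudy")))  # True
```

Changing doctors then only means updating the role service entry; the
labels on the stored data stay the same.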

Adam
On Aug 10, 2012 11:56 AM, "Josh Elser" <josh.elser@gmail.com> wrote:

>  That's what I meant, user*doctors.
>
> It's not enough to say "healthteam", you have to qualify it by user too:
> "adamhealthteam".
>
> On 8/10/12 9:02 AM, Adam Fuchs wrote:
>
> I guess I should have specified that the access time labels should be used
> in conjunction with the role labels, like
> "(adamsHealthTeam&(regularCheckup|illnessEvaluation))|(massStateResearcher&populationStudy)".
>
> Adam
> On Aug 10, 2012 8:56 AM, "Benson Margulies" <bimargulies@gmail.com> wrote:
>
>> On Fri, Aug 10, 2012 at 8:52 AM, Adam Fuchs <afuchs@apache.org> wrote:
>> > Not sure I understand why this gets into n*m roles. Can you elaborate?
>> >
>> > The question of when your physician should have access seems like it
>> > could be represented by just a few labels, like "regularCheckup",
>> > "illnessEvaluation", and "populationStudy". Those labels could then be
>> > tied to an auditing system that could verify appropriateness of access
>> > over time.
>>
>> And if you change doctors? Maybe that's a job for some sort of role/group
>> model.
>>
>>
>> >
>> > Adam
>> >
>> > On Aug 9, 2012 10:19 PM, "Josh Elser" <josh.elser@gmail.com> wrote:
>> >>
>> >> I've thought quite a bit about the approach you've outlined
>> >> previously.
>> >>
>> >> The main caveat I've always struggled to overcome is how to encapsulate
>> >> *when* a physician should have access to your records. This expands the
>> >> problem into n*m roles, which becomes difficult to manage inside
>> >> Accumulo, especially as time elapses.
>> >>
>> >> On 8/8/2012 6:29 PM, Marc Parisi wrote:
>> >>>
>> >>> Just some ideas and thoughts....
>> >>>
>> >>> With a system I'm building I have code to take care of user roles.
>> >>> Roles will define visibilities, how analysis is performed, information
>> >>> sharing, etc. I have a particular role for sharing. I also have an
>> >>> area of interest, usually assigned to a physician role, so only a
>> >>> physician's office can see certain data from it. The data
>> >>> corresponding to a given person can be accessed by that person (if
>> >>> they have app access), the physician that created it, and other
>> >>> physicians (with a different area of interest) with whom the user
>> >>> wants to share their data. Each area of interest will be
>> >>> cryptographically secured. Our approach will utilize multiple crypto
>> >>> technologies. I would suggest making crypto your last stop. Focus on
>> >>> getting the visibility hierarchy designed. HIPAA requirements can
>> >>> come later.
>> >>>
>> >>> In my approach, there is no elevation of fields per se. Instead, there
>> >>> are visibilities for all assigned parties, so in my case it is a
>> >>> matter of labeling. The data can have hierarchies, and each hierarchy
>> >>> has different labels to control access.
>> >>>
>> >>> " Patient demographic fields are PHI (personal health information) and
>> >>> these should not be visible to all who want to perform analysis, but
>> >>> only to main administrators,
>> >>> patient and maybe physician. I assume these would have to have
>> >>> separate authorization label. "
>> >>>
>> >>> Yes. I think this is where roles will help. Assign roles and
>> >>> visibilities to those roles. As of right now, I'm putting ephemeral
>> >>> data in my visibilities (user ID for a physician, among other
>> >>> things). I will probably move this to the qualifier and take a
>> >>> simpler approach to visibilities.
>> >>>
>> >>> Each role has different actions. Right now I have four actions:
>> >>> syncing, querying, deleting, and sharing. You don't have to capture
>> >>> actions, but you might want to limit how the roles of users vary, and
>> >>> I think modeling the security actions within each role is an
>> >>> excellent way to do so.
>> >>>
>> >>>
>> >>> On Wed, Aug 8, 2012 at 4:08 PM, Edmon Begoli <ebegoli@gmail.com> wrote:
>> >>>
>> >>>     I am trying to model a healthcare claim on Accumulo, and I want to
>> >>>     lay it out so that it:
>> >>>
>> >>>     A. Accurately reflects the structure of the claim
>> >>>
>> >>>     B. Allows controls to be finely applied to different sections of
>> >>>     the document
>> >>>
>> >>>     I am simplifying matters, but a claim contains claim document
>> >>>     identifiers, demographics of the patient, and line items for the
>> >>>     procedures performed:
>> >>>
>> >>>     claim identifier, data submitted, data processed, state of origin, ...
>> >>>     patient name, dob, location, other identifiers
>> >>>     procedure 1 code, procedure 1 provider, procedure 1 cost, ...
>> >>>     ...
>> >>>     procedure n code, procedure n provider, procedure n cost, ...
>> >>>
>> >>>
>> >>>     Patient demographic fields are PHI (personal health information)
>> >>>     and these should not be visible to all who want to perform
>> >>>     analysis, but only to main administrators, the patient, and maybe
>> >>>     the physician. I assume these would have to have a separate
>> >>>     authorization label.
>> >>>
>> >>>     Other fields may be visible to different groups of people, e.g.
>> >>>     federal claim administrators can see all, but regional offices
>> >>>     can only see their own states. These would need separate, more
>> >>>     permissive labels.
>> >>>
>> >>>     Finally, it might make sense to "elevate" some fields for easy
>> >>>     access and analysis, e.g. diagnostic codes, zip code, cost.
>> >>>     This would not be a matter of labels, but of data design.
>> >>>
>> >>>
>> >>>     With all this in mind, I would welcome any security and data
>> >>>     design suggestions.
>> >>>
>> >>>
>> >
>>
>
>

--047d7b10ceeb5e9a1504c6eb8265--