accumulo-notifications mailing list archives

From "Russ Weeks (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3970) Generating multiple views of a value at scan time
Date Mon, 24 Aug 2015 19:14:47 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709888#comment-14709888 ]

Russ Weeks commented on ACCUMULO-3970:
--------------------------------------

Thanks for your comment, Billie. I see what you mean about not wanting to rely on the table
admin to manage data visibilities. I guess my use case could be summarized as, "As a user of
Accumulo, I want to be able to put sensitive data into the system, and have different users
see different views of that data in accordance with organizational policy". Flipping it
around, you're not relying on the data producer to get de-identification "right". In my mind
it's a valuable feature, but I get that I may be in a minority among Accumulo's users in
wanting it.


> Generating multiple views of a value at scan time
> -------------------------------------------------
>
>                 Key: ACCUMULO-3970
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3970
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Russ Weeks
>            Priority: Minor
>             Fix For: 1.8.0
>
>
> It would be useful to have the ability to generate different representations of a key-value pair at scan time, based on the scan authorizations.
> For example, consider [HIPAA safe harbour de-identification|http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#dates]. One of the rules for de-identifying a patient's date of birth is that if a patient is 89 years old or younger, you can disclose his exact year of birth. If a patient is 90 years old or over, you pretend that he's 90 years old.
> You can imagine implementing this as a key/value mapping in Accumulo like,
> {{(pt_id, demographic, pt_dob, PII_DOB) -> "1925-08-22"}}
> {{(pt_id, demographic, pt_dob, SHD_DOB) -> "1925"}}
> Where the value corresponding to visibility SHD_DOB is produced at scan-time, depending on the patient's current age.
> Another example would be the ability to produce a salted hash of a unique identifier like a social security number or medical record number, where the salt (or the hash algorithm, or the work factor...) could be specified dynamically without having to re-code all the values in the system.
> More broadly speaking, this feature would give organizations more flexibility to change how they de-identify, transform or anonymize data to suit different access levels.
> Of course, to do this you'd need to have a pluggable component that can process key/value pairs before visibilities are evaluated. I can see why this might give a lot of people the heebie-jeebies but I'd like to gather as much feedback as possible. Looking forward to hearing your thoughts!
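
The two examples in the description can be sketched outside Accumulo as plain functions. This is only an illustration of the idea, not Accumulo's API and not a proposed implementation: the names `safe_harbor_year`, `salted_hash` and `views_for_scan` are hypothetical, visibilities are treated as single labels rather than boolean expressions, and the choice to report a 90-or-older patient's year of birth as "current year minus 90" is an assumption about how "pretend that he's 90 years old" would be encoded.

```python
import hashlib
from datetime import date


def safe_harbor_year(dob_iso, today):
    """Derive the SHD_DOB view from a full date of birth (YYYY-MM-DD)."""
    dob = date.fromisoformat(dob_iso)
    # Age in whole years: subtract one if this year's birthday has not happened yet.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    if age >= 90:
        # Assumption: encode "pretend the patient is 90" as the year a
        # 90-year-old would have been born.
        return str(today.year - 90)
    return str(dob.year)


def salted_hash(identifier, salt, iterations=100_000):
    """Salted, work-factored hash of an identifier (PBKDF2-HMAC-SHA256).

    The salt and iteration count are parameters, so they can change
    without rewriting the stored value.
    """
    digest = hashlib.pbkdf2_hmac("sha256", identifier.encode("utf-8"), salt, iterations)
    return digest.hex()


def views_for_scan(entry, auths, today):
    """Return the representations of one stored entry that a scan may see.

    entry is (row, family, qualifier, visibility, value); auths is a set of
    visibility labels held by the scanning user.
    """
    row, family, qualifier, visibility, value = entry
    views = []
    if visibility in auths:
        views.append(entry)  # the stored value, unchanged
    if visibility == "PII_DOB" and "SHD_DOB" in auths:
        # Derived view, computed at scan time and never stored.
        derived = safe_harbor_year(value, today)
        views.append((row, family, qualifier, "SHD_DOB", derived))
    return views
```

A caller holding only `SHD_DOB` would invoke `views_for_scan(("pt_id", "demographic", "pt_dob", "PII_DOB", "1925-08-22"), {"SHD_DOB"}, date.today())` and receive just the derived entry. Changing the de-identification rule then means changing `safe_harbor_year`, not rewriting stored data.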



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
