accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ACCUMULO-3970) Generating multiple views of a value at scan time
Date Wed, 29 Nov 2017 22:37:00 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs updated ACCUMULO-3970:
----------------------------------------
    Fix Version/s:     (was: 2.0.0)

> Generating multiple views of a value at scan time
> -------------------------------------------------
>
>                 Key: ACCUMULO-3970
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3970
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Russ Weeks
>            Priority: Minor
>
> It would be useful to have the ability to generate different representations of a key-value
> pair at scan time, based on the scan authorizations.
> For example, consider [HIPAA Safe Harbor de-identification|http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#dates].
> One of the rules for de-identifying a patient's date of birth is that if a patient is 89 years
> old or younger, you can disclose their exact year of birth. If a patient is 90 years old or
> over, you treat them as if they were 90 years old.
> You can imagine implementing this as a key/value mapping in Accumulo like this:
> {{(pt_id, demographic, pt_dob, PII_DOB) -> "1925-08-22"}}
> {{(pt_id, demographic, pt_dob, SHD_DOB) -> "1925"}}
> Here the value with visibility SHD_DOB would be produced at scan time, based on the
> patient's current age.
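The date-of-birth rule above can be sketched in plain Java, independent of any Accumulo API. The class `SafeHarborDob` and its method `deidentify` are hypothetical names introduced only for illustration. One assumption goes beyond the issue text: for a patient aged 90 or over, the sketch emits the latest birth year consistent with being 90 on the scan date.

```java
import java.time.LocalDate;
import java.time.Period;

/** Hypothetical helper for illustration; not part of Accumulo. */
final class SafeHarborDob {
    private SafeHarborDob() {}

    /**
     * Derives the SHD_DOB view from a stored PII_DOB value.
     *
     * @param isoDob stored date of birth in ISO format, e.g. "1925-08-22"
     * @param today  the scan date, passed in so results are reproducible
     * @return the year of birth to disclose, as a string
     */
    static String deidentify(String isoDob, LocalDate today) {
        LocalDate dob = LocalDate.parse(isoDob);
        int age = Period.between(dob, today).getYears();
        if (age <= 89) {
            // 89 or younger: the exact year of birth may be disclosed.
            return Integer.toString(dob.getYear());
        }
        // 90 or over (assumption): report the latest birth year that is
        // consistent with the patient being 90 years old on the scan date.
        return Integer.toString(today.getYear() - 90);
    }
}
```

With a scan date of 2015-08-21 the stored value "1925-08-22" yields "1925", because the patient is still 89. With a scan date of 2017-11-29 the same stored value yields "1927", because the patient is then treated as 90.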
> Another example would be the ability to produce a salted hash of a unique identifier,
> such as a Social Security number or medical record number, where the salt (or the hash
> algorithm, or the work factor...) could be specified dynamically without having to re-code
> all the values in the system.
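A minimal sketch of such a salted hash, using only the JDK. It assumes PBKDF2 as the hash construction so that the salt, the algorithm name, and the iteration count (standing in for the work factor) are all plain parameters. The class `PseudonymView` and its method `pseudonym` are hypothetical names, not an Accumulo API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical helper for illustration; not part of Accumulo. */
final class PseudonymView {
    private PseudonymView() {}

    /**
     * Hashes an identifier with parameters chosen at scan time.
     *
     * @param identifier the raw stored value, e.g. a medical record number
     * @param salt       non-empty salt, supplied dynamically
     * @param algorithm  a JDK SecretKeyFactory name, e.g. "PBKDF2WithHmacSHA256"
     * @param iterations the work factor (must be positive)
     * @return the 256-bit derived value as lowercase hex
     */
    static String pseudonym(String identifier, String salt, String algorithm, int iterations) {
        try {
            PBEKeySpec spec = new PBEKeySpec(
                identifier.toCharArray(),
                salt.getBytes(StandardCharsets.UTF_8),
                iterations,
                256);
            byte[] hash = SecretKeyFactory.getInstance(algorithm)
                .generateSecret(spec)
                .getEncoded();
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Hashing failed for algorithm " + algorithm, e);
        }
    }
}
```

Because the stored value stays unchanged, switching the salt or raising the iteration count only changes what a scan returns. Nothing in the table needs to be rewritten.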
> More broadly speaking, this feature would give organizations more flexibility to change
> how they de-identify, transform or anonymize data to suit different access levels.
> Of course, to do this you'd need to have a pluggable component that can process key/value
> pairs before visibilities are evaluated. I can see why this might give a lot of people the
> heebie-jeebies, but I'd like to gather as much feedback as possible. Looking forward to hearing
> your thoughts!
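The ordering described above (generate views first, evaluate visibilities second) can be sketched as a small in-memory simulation. Everything here is hypothetical: `ScanViews`, `Entry`, and the generator function are illustrative names rather than Accumulo classes, and a visibility is simplified to a single label instead of a full visibility expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical in-memory model for illustration; not an Accumulo API. */
final class ScanViews {
    private ScanViews() {}

    /** A simplified key/value pair with a single-label visibility. */
    static final class Entry {
        final String row, family, qualifier, visibility, value;

        Entry(String row, String family, String qualifier, String visibility, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.value = value;
        }
    }

    /**
     * Runs the pluggable generator on every stored entry BEFORE filtering,
     * then keeps only entries whose visibility label is in the scan auths.
     */
    static List<Entry> scan(List<Entry> stored,
                            Function<Entry, List<Entry>> generator,
                            Set<String> auths) {
        List<Entry> visible = new ArrayList<>();
        for (Entry e : stored) {
            List<Entry> candidates = new ArrayList<>();
            candidates.add(e);                      // the stored entry itself
            candidates.addAll(generator.apply(e));  // derived views of it
            for (Entry c : candidates) {
                if (auths.contains(c.visibility)) {
                    visible.add(c);
                }
            }
        }
        return visible;
    }
}
```

Note that the generator reads the raw PII_DOB value even when the scanner holds only SHD_DOB. That is the trust question the issue raises: the pluggable component runs with more access than the user it serves.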



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
