accumulo-notifications mailing list archives

From "Russ Weeks (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-3970) Generating multiple views of a value at scan time
Date Sat, 22 Aug 2015 06:10:45 GMT
Russ Weeks created ACCUMULO-3970:
------------------------------------

             Summary: Generating multiple views of a value at scan time
                 Key: ACCUMULO-3970
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3970
             Project: Accumulo
          Issue Type: New Feature
            Reporter: Russ Weeks
            Priority: Minor
             Fix For: 1.8.0


It would be useful to have the ability to generate different representations of a key-value
pair at scan time, based on the scan authorizations.

For example, consider [HIPAA safe harbour de-identification|http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#dates].
One of the rules for de-identifying a patient's date of birth is that if a patient is 89 years
old or younger, you can disclose his exact year of birth. If a patient is 90 years old or
over, you pretend that he's 90 years old.

You can imagine implementing this as a key/value mapping in Accumulo like:
{{(pt_id, demographic, pt_dob, PII_DOB) -> "1925-08-22"}}
{{(pt_id, demographic, pt_dob, SHD_DOB) -> "1925"}}
where the value with visibility SHD_DOB is not stored but produced at scan time, depending on
the patient's current age.
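
To make the date-of-birth example concrete, here is a minimal Python sketch of how the SHD_DOB value could be derived from the stored PII_DOB value at scan time. The function names ({{age_on}}, {{safe_harbour_year}}) are hypothetical, and reporting a capped patient's year of birth as "current year minus 90" is an assumption about what "pretend that he's 90 years old" means in practice. It is not part of Accumulo or of the HIPAA guidance.

```python
from datetime import date

# Ages of 90 and over are reported as 90, per the rule described above.
AGE_CAP = 90


def age_on(dob: date, today: date) -> int:
    """Whole years elapsed between dob and today."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def safe_harbour_year(dob_value: str, today: date) -> str:
    """Derive the SHD_DOB view (a year string) from the PII_DOB value."""
    dob = date.fromisoformat(dob_value)
    if age_on(dob, today) >= AGE_CAP:
        # Assumption: "pretend the patient is 90" means reporting the year
        # in which a 90-year-old would have been born.
        return str(today.year - AGE_CAP)
    return str(dob.year)
```

Because the result depends on the scan date, the same stored value can yield a different SHD_DOB view in a later year without rewriting any data.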

Another example would be the ability to produce a salted hash of a unique identifier like
a social security number or medical record number, where the salt (or the hash algorithm,
or the work factor...) could be specified dynamically without having to re-code all the values
in the system.
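
As a rough illustration of the salted-hash idea, the sketch below derives a pseudonym from an identifier using Python's standard {{hashlib.pbkdf2_hmac}}, with the salt, hash algorithm and work factor all passed in as parameters. The function name {{pseudonymize}} and the default values are assumptions chosen for the example. In the proposed feature these parameters would presumably come from scan-time configuration.

```python
import hashlib


def pseudonymize(identifier: str, salt: bytes,
                 algorithm: str = "sha256",
                 iterations: int = 100_000) -> str:
    """Return a hex-encoded salted hash of an identifier.

    The salt, the algorithm and the work factor (iterations) are all
    parameters, so they can change without touching the stored value.
    """
    digest = hashlib.pbkdf2_hmac(
        algorithm, identifier.encode("utf-8"), salt, iterations
    )
    return digest.hex()
```

Changing any of the three parameters changes the output, so a new salt or work factor takes effect on the next scan while the stored identifier stays as it is.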

More broadly speaking, this feature would give organizations more flexibility to change how
they deidentify, transform or anonymize data to suit different access levels.

Of course, to do this you'd need to have a pluggable component that can process key/value
pairs before visibilities are evaluated. I can see why this might give a lot of people the
heebie-jeebies, but I'd like to gather as much feedback as possible. Looking forward to hearing
your thoughts!
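
To show where such a pluggable component would sit, here is a toy Python model of the ordering: generate derived views first, then filter by visibility. None of this is Accumulo API. {{Entry}}, {{ViewGenerator}}, {{year_only_view}} and {{scan}} are hypothetical names, and a visibility is simplified to a single label rather than a boolean expression.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Set


class Entry(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str  # simplified: one label, not a boolean expression
    value: str


# Hypothetical plugin type: given a stored entry, yield derived views of it.
ViewGenerator = Callable[[Entry], Iterable[Entry]]


def year_only_view(entry: Entry) -> Iterator[Entry]:
    """Example plugin: derive a year-only SHD_DOB view from a PII_DOB entry."""
    if entry.visibility == "PII_DOB":
        yield entry._replace(visibility="SHD_DOB", value=entry.value[:4])


def scan(stored: Iterable[Entry],
         generators: Iterable[ViewGenerator],
         authorizations: Set[str]) -> Iterator[Entry]:
    """Run the view generators first, then apply the visibility filter."""
    generators = list(generators)
    for entry in stored:
        candidates = [entry]
        for generate in generators:
            candidates.extend(generate(entry))
        # Visibility is evaluated only after the derived views exist.
        for candidate in candidates:
            if candidate.visibility in authorizations:
                yield candidate
```

In this model a scanner holding only SHD_DOB sees the derived year and never the stored date, which is also why the ordering is sensitive: the plugin necessarily reads values that the scanning user is not authorized to see.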



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
