accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-3970) Generating multiple views of a value at scan time
Date Sat, 22 Aug 2015 22:03:45 GMT


Josh Elser commented on ACCUMULO-3970:

bq. Because now the patient is 90 years old and their date of birth needs to be de-identified
differently. I guess I could achieve this by storing all 3 representations.... But I think
that approach becomes unwieldy very quickly.

:) Yes, you very quickly hit on what I was trying to push towards: storing all of the
facets and using the visibility labels as a projection (they're just another attribute on
a 5-tuple, if you think of it that way), which "jibes" well with the original Bigtable

You're also entirely right that this is difficult to manage without a bunch of extra work,
especially if those values aren't static. Writing in the updated value would still require
additional client interaction (ideally, updating the SHD_DOB value from "1925" to "1925 or
earlier", since we should never show the original value again).
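
A minimal sketch of the "store every facet" approach, for illustration only. It models a table as a plain Python list rather than using the Accumulo client API; `Entry`, `put` and `scan` are hypothetical names, and a visibility is simplified to a single label instead of a boolean expression.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str  # simplified: one label, not a boolean expression
    timestamp: int
    value: str


def put(table, entry):
    """Write an entry, replacing any older version with the same key columns."""
    kept = [e for e in table if e[:4] != entry[:4]]
    return kept + [entry]


def scan(table, authorizations):
    """Return only the entries whose label is among the scan authorizations."""
    return [e for e in table if e.visibility in authorizations]


table = []
table = put(table, Entry("pt_id", "demographic", "pt_dob", "PII_DOB", 1, "1925-08-22"))
table = put(table, Entry("pt_id", "demographic", "pt_dob", "SHD_DOB", 1, "1925"))

# The patient turns 90: the client has to rewrite the de-identified facet itself.
table = put(table, Entry("pt_id", "demographic", "pt_dob", "SHD_DOB", 2, "1925 or earlier"))

shd_view = [e.value for e in scan(table, {"SHD_DOB"})]
pii_view = [e.value for e in scan(table, {"PII_DOB"})]
```

The authorizations act purely as a filter here, which is the projection idea; the cost is the extra `put` that some client must remember to issue when the patient's age crosses the threshold.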

> Generating multiple views of a value at scan time
> -------------------------------------------------
>                 Key: ACCUMULO-3970
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Russ Weeks
>            Priority: Minor
>             Fix For: 1.8.0
> It would be useful to have the ability to generate different representations of a key-value
pair at scan time, based on the scan authorizations.
> For example, consider HIPAA safe harbour de-identification.
One of the rules for de-identifying a patient's date of birth is that if a patient is 89 years
old or younger, you can disclose his exact year of birth. If a patient is 90 years old or
over, you pretend that he's 90 years old.
> You can imagine implementing this as a key/value mapping in Accumulo like,
> {{(pt_id, demographic, pt_dob, PII_DOB) -> "1925-08-22"}}
> {{(pt_id, demographic, pt_dob, SHD_DOB) -> "1925"}}
> Where the value corresponding to visibility SHD_DOB is produced at scan-time, depending
on the patient's current age.
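
As a rough sketch of what such a scan-time derivation could compute, the function below turns the stored PII_DOB value into the SHD_DOB view. It is plain Python, not an Accumulo iterator; `age_on`, `shd_dob` and `AGE_CAP` are hypothetical names, and the "or earlier" wording is borrowed from the comment above rather than from any specification.

```python
from datetime import date

AGE_CAP = 90  # assumption: ages of 90 and over collapse into one bucket


def age_on(dob: date, today: date) -> int:
    """Whole years elapsed between dob and today."""
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - (1 if before_birthday else 0)


def shd_dob(dob_iso: str, today: date) -> str:
    """Derive the de-identified year-of-birth view from an ISO date of birth."""
    dob = date.fromisoformat(dob_iso)
    if age_on(dob, today) < AGE_CAP:
        return str(dob.year)
    # Report the patient as if they were exactly AGE_CAP years old.
    return f"{today.year - AGE_CAP} or earlier"
```

Because the result depends on `today`, the same stored value yields "1925" the day before the patient's 90th birthday and "1925 or earlier" from that birthday on, with no rewrite of the underlying data.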
> Another example would be the ability to produce a salted hash of a unique identifier
like a social security number or medical record number, where the salt (or the hash algorithm,
or the work factor...) could be specified dynamically without having to re-code all the values
in the system.
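
A hedged sketch of the salted-hash idea follows. The issue does not name an algorithm, so PBKDF2 from Python's standard `hashlib` is used here purely as a stand-in; `pseudonymize` and its parameter names are hypothetical, and the salt, hash algorithm and work factor are passed in at call time to mirror the "specified dynamically" requirement.

```python
import hashlib


def pseudonymize(identifier: str, salt: bytes,
                 algorithm: str = "sha256", iterations: int = 100_000) -> str:
    """Return a hex digest of the identifier under the given salt and work factor."""
    digest = hashlib.pbkdf2_hmac(
        algorithm,                   # hash algorithm, chosen per scan
        identifier.encode("utf-8"),  # the stored identifier, e.g. an MRN
        salt,                        # salt, chosen per scan
        iterations,                  # work factor, chosen per scan
    )
    return digest.hex()
```

Changing the salt or the iteration count changes every derived value without touching the stored identifiers, which is the flexibility the paragraph above is asking for.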
> More broadly speaking, this feature would give organizations more flexibility to change
how they de-identify, transform or anonymize data to suit different access levels.
> Of course, to do this you'd need to have a pluggable component that can process key/value
pairs before visibilities are evaluated. I can see why this might give a lot of people the
heebie-jeebies but I'd like to gather as much feedback as possible. Looking forward to hearing
your thoughts!
