cassandra-user mailing list archives

From Jon Haddad <>
Subject Re: GDPR, Right to Be Forgotten, and Cassandra
Date Fri, 09 Feb 2018 18:54:42 GMT
A layer violation?  Seriously?  Technical solutions exist to solve business problems and I’m
100% fine with introducing the former to solve the latter.

Look, if the goal is to purge information from the DB as quickly as possible for a lot
of accounts, the fastest way to do it is to hijack the fact that you’re constantly rewriting
data through compaction and (ab)use it.  It avoids the overhead of tombstones, and can be
implemented in a way that allows you to perform a single write / edit a text file / use some
other trivial mechanism and immediately start removing customer data.  It’s an incredibly efficient
way of bulk removing customer data.
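
A minimal sketch of the idea, not of any real Cassandra API: the rewrite that compaction already performs is modelled as a plain copy loop, and the "text file" is a list of partition keys to forget. The names `load_purge_list` and `rewrite_sstable`, and the row layout, are hypothetical and only illustrate the filtering step.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical row layout: (partition_key, clustering_key, value)
Row = Tuple[str, str, str]


def load_purge_list(lines: Iterable[str]) -> Set[str]:
    """Parse the purge file: one partition key per line, blanks and # comments skipped."""
    stripped = (line.strip() for line in lines)
    return {key for key in stripped if key and not key.startswith("#")}


def rewrite_sstable(rows: Iterable[Row], purge: Set[str]) -> Iterator[Row]:
    """Stand-in for one compaction pass: copy every row except purged partitions."""
    for row in rows:
        if row[0] not in purge:
            yield row


old_sstable = [
    ("alice", "2018-01-01", "order-1"),
    ("bob", "2018-01-02", "order-2"),
    ("alice", "2018-01-03", "order-3"),
]
purge = load_purge_list(["# forget requests", "alice", ""])
new_sstable = list(rewrite_sstable(old_sstable, purge))
print(new_sstable)  # [('bob', '2018-01-02', 'order-2')]
```

No tombstone is written in this model: the purged partition simply does not make it into the rewritten file.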

The wording around “The Right To Be Forgotten” is a little vague [1], and I don’t know
if “the right to be forgotten entitles the data subject to have the data controller erase
his/her personal data” means that tombstones are OK.  If you tombstone some row using TWCS,
it will literally *never* be deleted off disk, as opposed to using DeletingCompactionStrategy
where it could easily be removed without leaving data lying around in SSTables.  I’ve done
this already for this *exact* use case and know it works and works very well.
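
A toy model of the tombstone concern, under loud assumptions: cells are bucketed by write time, each bucket is compacted only with itself, and a tombstone can drop data only when both sit in the same bucket. This is a simplification for illustration and not how Cassandra's TWCS is implemented; `compact_by_window` and the cell layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical cell layout: (write_time, partition_key, kind),
# where kind is "data" or "tombstone".
Cell = Tuple[int, str, str]


def compact_by_window(cells: List[Cell], window: int) -> Dict[int, List[Cell]]:
    """Compact each time bucket in isolation; buckets never merge with each other."""
    buckets: Dict[int, List[Cell]] = defaultdict(list)
    for cell in cells:
        buckets[cell[0] // window].append(cell)

    result: Dict[int, List[Cell]] = {}
    for bucket, members in buckets.items():
        # Latest deletion time per partition, seen only within this bucket.
        deleted_at: Dict[str, int] = {}
        for t, key, kind in members:
            if kind == "tombstone":
                deleted_at[key] = max(t, deleted_at.get(key, t))
        # Keep tombstones, and keep data not covered by a tombstone in this bucket.
        result[bucket] = [
            (t, key, kind)
            for t, key, kind in members
            if kind == "tombstone" or t > deleted_at.get(key, -1)
        ]
    return result


# Data written in window 0, deleted later in window 1: the data stays on disk.
late_delete = compact_by_window([(5, "alice", "data"), (105, "alice", "tombstone")], 100)
# Data and tombstone in the same window: the data is dropped.
same_window = compact_by_window([(5, "alice", "data"), (50, "alice", "tombstone")], 100)
print(late_delete)
print(same_window)
```

In the first case the old data survives in its own bucket even though a tombstone exists, which is the situation described above.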

The debate around what is the “correct” way to solve the problem is a dogmatic one and
I don’t have any interest in pursuing it any further.  I’ve simply offered a solution
that I know works because I’ve done it, which is what the OP asked for.

[1] <>

> On Feb 9, 2018, at 10:33 AM, Dor Laor <> wrote:
> I think you're introducing a layer violation. GDPR is a business requirement and
> compaction is an implementation detail. 
> IMHO it's enough to delete the partition using regular CQL.
> It's true that it won't be deleted immediately but it will be eventually deleted (welcome
> to eventual consistency ;).
> Even with user-defined compaction, compaction may not run instantly, repair will be
> required, there are other nodes in the cluster, maybe partitioned nodes with the data.
> There is data in snapshots and backups.
> The business idea is to delete the data in a fast, reasonable time for humans and make
> it first unreachable and later delete it completely.
> On Fri, Feb 9, 2018 at 8:51 AM, Jonathan Haddad <> wrote:
> That might be fine for a one off but is totally impractical at scale or when using TWCS.

> On Fri, Feb 9, 2018 at 8:39 AM DuyHai Doan <> wrote:
> Or use the new user-defined compaction option recently introduced, provided you can determine
> over which SSTables a partition is spread
> On Fri, Feb 9, 2018 at 5:23 PM, Jon Haddad <> wrote:
> Give this a read through:
> Basically you write your own logic for how stuff gets forgotten, then you can recompact
> every sstable with upgradesstables -a.
> Jon
>> On Feb 9, 2018, at 8:10 AM, Nicolas Guyomar <> wrote:
>> Hi everyone,
>> Because of GDPR we really face the need to support “Right to Be Forgotten” requests
>> => <> stating that "the controller shall have the obligation to erase personal data
>> without undue delay"
>> Because I usually meet customers that do not have that many clients, modeling one
>> partition per client is almost always possible, easing deletion by partition key.
>> Then, apart from triggering a manual compaction on impacted tables using STCS, I
>> do not see how I can be GDPR compliant.
>> I'm kind of surprised not to find any thread on that matter on the ML, do you guys
>> have any modeling strategy that would make it easier to get rid of data?
>> Thank you for any given advice
>> Nicolas
