kudu-user mailing list archives

From Jordan Birdsell <jordan.birdsell.k...@statefarm.com>
Subject RE: Weekly update 4/25
Date Tue, 26 Apr 2016 17:14:29 GMT
If we had to go less frequently than a day I’m sure it’d be acceptable.  The volume of
deletes is very low in this case.  In some tables we can just “erase” a column’s data
but in others, based on the data design, we must delete the entire row or group of rows.

From: Todd Lipcon [mailto:todd@cloudera.com]
Sent: Tuesday, April 26, 2016 12:59 PM
To: user@kudu.incubator.apache.org
Subject: Re: Weekly update 4/25

On Tue, Apr 26, 2016 at 8:28 AM, Jordan Birdsell <jordan.birdsell.kdvm@statefarm.com>
Yes, this is exactly what we need to do.  Not immediately is ok for our current requirements,
I’d say within a day would be ideal.

Even within a day can be tricky for this kind of system if you have a fairly uniform random
delete workload. That would imply that you're rewriting _all_ of your data every day, which
uses a fair amount of IO.
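
To put a rough number on the point above, here is a back-of-envelope sketch. It assumes each delete lands in a row set chosen uniformly at random and independently of the others; the function and parameter names are hypothetical and are not part of Kudu.

```python
def expected_fraction_rewritten(num_rowsets: int, deletes_per_day: int) -> float:
    """Expected fraction of row sets hit by at least one delete in a day.

    Illustrative model only: every delete picks a row set uniformly at
    random, independently of the other deletes. This is not Kudu code.
    """
    if num_rowsets <= 0:
        raise ValueError("num_rowsets must be positive")
    if deletes_per_day < 0:
        raise ValueError("deletes_per_day must be non-negative")
    # Probability that a given row set is missed by every delete of the day.
    miss_probability = (1.0 - 1.0 / num_rowsets) ** deletes_per_day
    return 1.0 - miss_probability


if __name__ == "__main__":
    # Hypothetical numbers: 1000 row sets, a handful vs. many deletes per day.
    for deletes in (1, 100, 10_000):
        print(deletes, expected_fraction_rewritten(1000, deletes))
```

Under this model, once the daily delete count is several times the number of row sets, nearly every row set is touched, which is the "rewriting all of your data every day" case. With only a few deletes per day, the touched fraction stays small.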

Are deletes extremely rare for your use case?

Is it the entire row of data that has to be deleted or would it be sufficient to "X out" some
particularly sensitive column?


From: Jean-Daniel Cryans [mailto:jdcryans@apache.org]
Sent: Tuesday, April 26, 2016 11:15 AM
To: user@kudu.incubator.apache.org
Subject: Re: Weekly update 4/25

Oh I see, so this is in order to comply with asks such as "make sure that data for some user/customer
is 100% deleted"? We'll still have the problem where we don't want to rewrite all the base
data files (GBs/TBs) to clean up KBs of data, although since a single row is always only part
of one row set, it means it's at most 64MB that you'd be rewriting.
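
As a rough illustration of that bound, the sketch below computes the worst-case rewrite volume when every deleted row sits in a different row set. The 64MB figure is the one quoted above; the function name and the one-row-per-row-set worst case are assumptions made for illustration, not Kudu behavior.

```python
ROWSET_BYTES = 64 * 1024 * 1024  # the "at most 64MB" figure quoted above


def max_rewrite_bytes(num_deleted_rows: int, rowset_bytes: int = ROWSET_BYTES) -> int:
    """Upper bound on bytes rewritten to physically purge the deleted rows.

    Worst case assumed here: each deleted row lives in a different row set,
    so each one forces a rewrite of one full row set.
    """
    if num_deleted_rows < 0:
        raise ValueError("num_deleted_rows must be non-negative")
    return num_deleted_rows * rowset_bytes


if __name__ == "__main__":
    # Hypothetical example: purging 10 rows scattered across 10 row sets.
    print(max_rewrite_bytes(10) / (1024 * 1024), "MB")
```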

BTW is it ok if the data isn't immediately deleted? How long is it acceptable to wait
before it happens?


On Tue, Apr 26, 2016 at 8:04 AM, Jordan Birdsell <jordan.birdsell.kdvm@statefarm.com>
Correct.  As for the “latest version”, if a row is deleted in the latest version then
removing the old versions where it existed is exactly what we’re looking to do.  Basically,
we need a way to physically get rid of select rows (or data within a column for that matter)
and all versions of that row or column data.

From: Jean-Daniel Cryans [mailto:jdcryans@apache.org]
Sent: Tuesday, April 26, 2016 10:56 AM
To: user@kudu.incubator.apache.org
Subject: Re: Weekly update 4/25

Hi Jordan,

In other words, you'd like to tag specific rows to be excluded from the default data history

Also, keep in mind that this improvement is about removing old versions of the data; it will
not delete the latest version. If you are used to HBase, it's like specifying some TTL plus
MIN_VERSIONS=1 so it doesn't completely age out a row.
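
The retention behavior described above can be sketched as a toy model: versions older than a maximum age are dropped, but the latest version always survives. The data representation and function name are hypothetical; this is neither Kudu nor HBase code.

```python
from typing import List, Tuple

Version = Tuple[int, str]  # (timestamp, value); hypothetical representation


def gc_old_versions(versions: List[Version], now: int, max_age: int) -> List[Version]:
    """Drop versions older than max_age, but always keep the latest one.

    Mirrors the "TTL plus MIN_VERSIONS=1" analogy: history ages out,
    while the most recent version of the row is never removed.
    """
    if not versions:
        return []
    ordered = sorted(versions)  # oldest first, by timestamp
    latest = ordered[-1]
    kept = [v for v in ordered[:-1] if now - v[0] <= max_age]
    kept.append(latest)
    return kept


if __name__ == "__main__":
    history = [(1, "a"), (5, "b"), (9, "c")]
    # All versions are older than max_age here, yet the latest one is retained.
    print(gc_old_versions(history, now=100, max_age=10))
```

In this model a row never ages out entirely, which is why history GC on its own does not cover the physical-deletion use case raised in this thread.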

Hope this helps,


On Tue, Apr 26, 2016 at 4:29 AM, Jordan Birdsell <jordan.birdsell.kdvm@statefarm.com>

Regarding row GC, I see in the design document that the tablet history max age will be set
at the table level. Would it be possible to make this something that can be overridden for
specific transactions?  We have some use cases that would require accelerated removal of data
from disk and other use cases that would not have the same requirement. Unfortunately, these
different use cases often apply to the same tables.

Jordan Birdsell

From: Todd Lipcon [mailto:todd@apache.org]
Sent: Monday, April 25, 2016 1:54 PM
To: dev@kudu.incubator.apache.org; user@kudu.incubator.apache.org
Subject: Weekly update 4/25

Hey Kudu-ers,

For the last month and a half, I've been posting weekly summaries of community development
activity on the Kudu blog. In case you aren't on Twitter or Slack you might not have seen
the posts, so I'm going to start emailing them to the list as well.

Here's this week's update:

Feel free to reply to this mail if you have any questions or would like to get involved in


Todd Lipcon
Software Engineer, Cloudera