kudu-user mailing list archives

From "Chris George" <Christopher.Geo...@rms.com>
Subject Re: best practices to remove/retire data
Date Thu, 12 May 2016 15:32:38 GMT
How hard would a predicate-based delete be?
I.e., ScanDelete or something.
-Chris George

On 5/12/16, 9:24 AM, "Jean-Daniel Cryans" <jdcryans@apache.org> wrote:


Right now this use case is more difficult than it needs to be. In your previous thread, "Partition
and Split rows", we talked about non-covering range partitions, and this is something that would
help your use case a lot. Basically, you could create partitions that each cover a full day, and
every day you could delete the oldest partition while creating the next day's. Deleting a partition
is really quick and efficient compared to manually deleting individual rows.
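
A minimal sketch of the daily rotation described above, in plain Python. It only computes which day-sized ranges to drop and which to add; it makes no Kudu calls, since non-covering range partitions are not available yet. The names `day_bounds` and `plan_rotation` are hypothetical, and it assumes the range partition column holds Unix seconds in UTC.

```python
from datetime import date, datetime, timedelta, timezone

SECONDS_PER_DAY = 86400


def day_bounds(day):
    """Return the [lower, upper) bounds of a UTC day as Unix seconds."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    lower = int(start.timestamp())
    return lower, lower + SECONDS_PER_DAY


def plan_rotation(existing_days, today, retention_days):
    """Decide which daily partitions to drop and which to add.

    Keeps the last `retention_days` days (today included) and makes sure
    tomorrow's partition exists before data for it starts arriving.
    """
    oldest_kept = today - timedelta(days=retention_days - 1)
    tomorrow = today + timedelta(days=1)
    to_drop = sorted(d for d in existing_days if d < oldest_kept)
    to_add = [] if tomorrow in existing_days else [tomorrow]
    return to_drop, to_add


# Example: four daily partitions exist, and three days are retained.
existing = {date(2016, 5, 9), date(2016, 5, 10),
            date(2016, 5, 11), date(2016, 5, 12)}
to_drop, to_add = plan_rotation(existing, date(2016, 5, 12), retention_days=3)
for d in to_drop:
    print("drop", d, day_bounds(d))
for d in to_add:
    print("add ", d, day_bounds(d))
```

The bounds returned by `day_bounds` would be the lower and upper range keys handed to whatever partition add/drop operation the client eventually exposes.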

Until this is available I'd do this with multiple tables, but it's a mess to handle as you

Hope this helps,


On Thu, May 12, 2016 at 8:16 AM, Sand Stone <sand.m.stone@gmail.com> wrote:
Hi. Presumably I need to write a program to delete the unwanted rows, say, remove all data
older than 3 days, while the table is still ingesting new data.

How well will this perform for large tables, in terms of both deletion and ingestion?

Or, for this specific case where I retire data by day, should I create a new table per day?
However, the users then have to be aware of the table naming scheme somehow. If the retention policy
is changed, all the client-side code might have to change (sure, we can have one level of indirection
to minimize the pain).
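
One way the level of indirection mentioned above could look, as a rough sketch: clients ask a small helper for the tables covering a date range instead of building table names themselves. The `events_YYYYMMDD` naming scheme and the function names are assumptions made up for illustration.

```python
from datetime import date, timedelta

TABLE_PREFIX = "events_"  # hypothetical naming scheme: events_YYYYMMDD


def table_for(day):
    """Name of the daily table that holds data for `day`."""
    return TABLE_PREFIX + day.strftime("%Y%m%d")


def tables_for_range(first_day, last_day, today, retention_days):
    """Daily tables a query over [first_day, last_day] should read.

    The range is clipped to the days still retained, so a change in the
    retention policy only touches this helper, not every client.
    """
    oldest_kept = today - timedelta(days=retention_days - 1)
    day = max(first_day, oldest_kept)
    end = min(last_day, today)
    names = []
    while day <= end:
        names.append(table_for(day))
        day += timedelta(days=1)
    return names


# Example: a query for May 1-12 with three days retained.
names = tables_for_range(date(2016, 5, 1), date(2016, 5, 12),
                         today=date(2016, 5, 12), retention_days=3)
print(names)
```

With this in place, the naming scheme and the retention window live in one spot, which is the "pain" being minimized.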

