kudu-user mailing list archives

From Dan Burkert <...@cloudera.com>
Subject Re: best practices to remove/retire data
Date Thu, 12 May 2016 17:52:34 GMT
On Thu, May 12, 2016 at 8:32 AM, Chris George <Christopher.George@rms.com> wrote:

> How hard would a predicate-based delete be?
> I.e., ScanDelete or something.
> -Chris George

That might be pretty difficult, since it implicitly assumes cross-row
transactional consistency. If consistency isn't required, you can simulate
it today by starting the scan and issuing a delete for each result row.
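A minimal sketch of that scan-then-delete pattern follows. It does not use the Kudu client: `ToyTable`, `scan_delete`, and the `batch_size` parameter are hypothetical stand-ins, assuming each row has an integer primary-key column. The point it illustrates is that the deletes are independent per-row operations, so a concurrent reader can see some matching rows already gone and others still present.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

Row = Dict[str, int]


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table keyed by one column."""
    key: str
    rows: Dict[int, Row] = field(default_factory=dict)

    def insert(self, row: Row) -> None:
        self.rows[row[self.key]] = row

    def scan(self, predicate: Callable[[Row], bool]) -> Iterator[Row]:
        # Copy the rows when the scan starts, so deletes issued while
        # iterating do not disturb the scan itself.
        for row in list(self.rows.values()):
            if predicate(row):
                yield row

    def delete(self, key_value: int) -> bool:
        # Returns False if the key was already gone.
        return self.rows.pop(key_value, None) is not None


def scan_delete(table: ToyTable, predicate: Callable[[Row], bool],
                batch_size: int = 2) -> int:
    """Emulate a predicate-based delete: scan, then delete each match by key.

    Deletes are applied in batches with no cross-row transaction, so other
    readers may observe a partially applied delete.
    """
    deleted = 0
    pending: List[int] = []
    for row in table.scan(predicate):
        pending.append(row[table.key])
        if len(pending) >= batch_size:
            deleted += sum(table.delete(k) for k in pending)
            pending.clear()
    # Apply whatever is left in the final, partial batch.
    deleted += sum(table.delete(k) for k in pending)
    return deleted


if __name__ == "__main__":
    t = ToyTable(key="id")
    for i in range(5):
        t.insert({"id": i, "day": i})
    # Keep the last 3 days (day >= 2 when today is day 4).
    print(scan_delete(t, lambda r: r["day"] < 2))  # 2
    print(sorted(t.rows))  # [2, 3, 4]
```

Against a real cluster, the scan would come from a Kudu scanner with the predicate pushed down, and each delete would be a keyed delete operation applied through a session; the batching mirrors flushing those operations in groups rather than paying one round trip per row.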

- Dan

> On 5/12/16, 9:24 AM, "Jean-Daniel Cryans" <jdcryans@apache.org> wrote:
> Hi,
> Right now this use case is more difficult than it needs to be. In your
> previous thread, "Partition and Split rows", we talked about non-covering
> range partitions, and this is something that would help your use case a lot.
> Basically, you could create partitions that cover full days, and every day
> you could delete the old partitions while creating the next day's. Deleting
> a partition is really quick and efficient compared to manually deleting
> individual rows.
> Until this is available, I'd do this with multiple tables, but it's a mess
> to handle, as you described.
> Hope this helps,
> J-D
> On Thu, May 12, 2016 at 8:16 AM, Sand Stone <sand.m.stone@gmail.com>
> wrote:
>> Hi. Presumably I need to write a program to delete the unwanted rows,
>> say, remove all data older than 3 days, while the table is still ingesting
>> new data.
>> How well will this perform for large tables, both for deletion and for
>> ingestion?
>> Or, for this specific case where I retire data by day, I should create a
>> new table per day. However, then the users have to be aware of the table
>> naming scheme somehow. If a retention policy is changed, all the client-side
>> code might have to change (sure, we can have one level of indirection to
>> minimize the pain).
>> Thanks.
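Here is a toy model of the per-day partition scheme J-D describes, assuming the 3-day retention window from the original question. `DailyPartitionedTable` and `roll_partitions` are hypothetical names, not Kudu API; the sketch only shows the bookkeeping: add tomorrow's range ahead of ingestion, drop whole expired ranges, and reject rows that fall outside every range (the non-covering behavior).

```python
from datetime import date, timedelta
from typing import Dict, List


class DailyPartitionedTable:
    """Hypothetical toy model of a table with one non-covering range
    partition per day. Not the Kudu client API."""

    def __init__(self) -> None:
        self.partitions: Dict[date, List[dict]] = {}

    def add_partition(self, day: date) -> None:
        self.partitions.setdefault(day, [])

    def drop_partition(self, day: date) -> int:
        # Removes the whole day at once instead of deleting row by row;
        # returns how many rows went with it.
        return len(self.partitions.pop(day, []))

    def insert(self, day: date, row: dict) -> None:
        if day not in self.partitions:
            # With non-covering ranges, a row outside every range is rejected.
            raise KeyError(f"no range partition covers {day}")
        self.partitions[day].append(row)


def roll_partitions(table: DailyPartitionedTable, today: date,
                    keep_days: int = 3) -> List[date]:
    """Create tomorrow's partition and drop days older than the window."""
    table.add_partition(today + timedelta(days=1))
    cutoff = today - timedelta(days=keep_days - 1)  # oldest day to keep
    expired = sorted(d for d in table.partitions if d < cutoff)
    for d in expired:
        table.drop_partition(d)
    return expired


if __name__ == "__main__":
    t = DailyPartitionedTable()
    for n in range(5):
        t.add_partition(date(2016, 5, 8) + timedelta(days=n))
    # Drops May 8 and 9, keeps May 10-12, and pre-creates May 13.
    print(roll_partitions(t, today=date(2016, 5, 12)))
```

Run once a day, this keeps a fixed window without touching individual rows. The same loop would also fit the interim table-per-day approach if `add_partition` and `drop_partition` created and dropped tables instead, with a name-resolving layer as the "one level of indirection" mentioned above.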
