kudu-user mailing list archives

From Adar Lieber-Dembo <a...@cloudera.com>
Subject Re: Dimension table delete and recreate
Date Wed, 14 Aug 2019 23:02:50 GMT
(+user, -dev, as this is more appropriate for the users list)

The Kudu master currently keeps a record of all tables and partitions,
including those that have been deleted. With a high enough rate of
table deletion it's theoretically possible for that to consume a lot
of disk space or memory. In practice (and since you mentioned you'd do
it once an hour) I wouldn't expect it to be a problem.

There shouldn't be any long-lasting impact on the tablet servers
though; tablets belonging to deleted tables are completely expunged
from disk.

Alternatively, you may find it more intuitive to model the "create
new, wait, then drop old" data motion via range partitions in a single
table.
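
To make the range-partition idea concrete, here is a minimal in-memory sketch in Python. It does not call the Kudu client; it only models the data motion. The `generation` column, the `DimensionTable` class, and the way readers learn the live generation are all assumptions made for illustration. In a real table the range column would have to be part of the primary key, and the add/drop steps would be table alterations.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionTable:
    """In-memory stand-in for one table range-partitioned on a hypothetical
    integer `generation` column; generation g owns the range [g, g + 1)."""
    partitions: dict = field(default_factory=dict)  # generation -> rows
    live: int = -1  # generation that readers should filter on (assumption)

    def load_generation(self, rows):
        """'Create new': add the next range partition and fill it."""
        gen = max(self.partitions, default=-1) + 1
        self.partitions[gen] = list(rows)
        self.live = gen  # readers switch to the new generation
        return gen

    def drop_older_than(self, gen):
        """'Drop old': remove every partition below the given generation."""
        dropped = sorted(g for g in self.partitions if g < gen)
        for g in dropped:
            del self.partitions[g]
        return dropped

    def scan(self):
        """Rows visible to a reader that filters on the live generation."""
        return self.partitions[self.live]


table = DimensionTable()
table.load_generation([("a", 1), ("b", 2)])
new_gen = table.load_generation([("a", 1)])  # "b" was not found by this run
# ... wait (for example an hour) so in-flight readers of the old rows finish ...
dropped = table.drop_older_than(new_gen)
```

Rows missing from the latest run disappear because the whole old partition is dropped, so no row-by-row delete is needed and the table keeps its identity for existing queries.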

On Wed, Aug 14, 2019 at 9:42 AM Scott Reynolds <sdrreynolds@gmail.com> wrote:
>
> Hi developers,
>
> I have a dimension table that is generated by a spark job and written to
> kudu. I would like to remove the rows in the table that were not found by
> the spark job.
>
> To do this, I was thinking of renaming the existing table so it keeps
> the UUID for existing queries, then creating the table again and loading the
> rows into it. An hour later, I would come back through and delete the old table.
>
> If I were to do that, what would your three highest concerns be? How would
> this affect the Kudu master process?
