kudu-user mailing list archives

From Franco Venturi <fvent...@comcast.net>
Subject Change Data Capture (CDC) with Kudu
Date Fri, 22 Sep 2017 02:12:01 GMT

We are planning for a 50-100TB Kudu installation (about 200 tables or so). 

One of the requirements that we are working on is to have a secondary copy of our data in
a Disaster Recovery data center in a different location. 

Since we are going to have inserts, updates, and deletes (for instance, when a row's primary
key has to change), we are trying to devise a process that keeps the secondary instance in
sync with the primary one. The two instances do not have to be identical in real time (i.e.
we are not looking for synchronous writes to Kudu), but we would like reasonable confidence
that the secondary instance contains all the changes the primary had as of, say, an hour
earlier. 

So far we considered a couple of options: 
- refreshing the secondary instance with a full copy of the primary one every so often, but
that would mean transferring, say, 50TB of data between the two locations every time,
and our network bandwidth constraints would prevent us from doing that even on a daily basis 
- having a column that contains the most recent time a row was updated. However, this column
couldn't be part of the primary key (because the primary key in Kudu is immutable), and therefore
finding which rows have changed would require a full scan of each table to be sync'd every
time. It would also rely on the application always updating the "last update timestamp" column
(an assumption that we would like to avoid), and would need some other process
to take into account the rows that are deleted. 
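
To illustrate the second option, here is a minimal in-memory sketch. It is not Kudu client
code: tables are plain Python dicts, and the names Row, last_updated, and sync_by_timestamp
are hypothetical, made up for this example. It shows the two costs mentioned above: every
row has to be inspected to find the changed ones, and deletes are only caught by a separate
key comparison.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Row:
    value: str
    last_updated: int  # hypothetical application-maintained timestamp (epoch seconds)


# A "table" here is just primary key -> row; an assumption for illustration only.
Table = Dict[str, Row]


def sync_by_timestamp(primary: Table, secondary: Table, since: int) -> Tuple[int, int]:
    """Copy rows changed after `since`, then drop rows that no longer exist in primary.

    Returns (number of rows upserted, number of rows deleted).
    """
    upserted = 0
    # Full scan: last_updated is not part of the key, so every row is inspected.
    for key, row in primary.items():
        if row.last_updated > since:
            secondary[key] = Row(row.value, row.last_updated)
            upserted += 1

    # Deleted rows leave no timestamp behind, so a separate key comparison is needed.
    stale = [key for key in secondary if key not in primary]
    for key in stale:
        del secondary[key]
    return upserted, len(stale)


primary = {"a": Row("v1", 100), "b": Row("v2", 250), "d": Row("v4", 300)}
secondary = {"a": Row("v1", 100), "b": Row("old", 150), "c": Row("v3", 120)}
result = sync_by_timestamp(primary, secondary, since=200)
print(result)  # (2, 1)
```

In this toy run, rows "b" and "d" are copied because their timestamps are newer than the
last sync, and row "c" is removed only because of the extra key comparison. If the
application ever forgot to bump last_updated, the change would be missed silently.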

Since many of today's RDBMSs (Oracle, MySQL, etc.) offer some sort of 'Change Data Capture'
mechanism where only the 'deltas' are captured and applied to the secondary instance, we were
wondering if there's any way in Kudu to achieve something like that (possibly by mining the WALs,
since my understanding is that each change gets written to the WALs first). 

Franco Venturi 
