kudu-user mailing list archives

From Mike Percy <mpe...@apache.org>
Subject Re: Change Data Capture (CDC) with Kudu
Date Fri, 22 Sep 2017 21:32:41 GMT
Franco,
I just realized that I suggested something you mentioned in your initial
email. My mistake for not reading through to the end. It is probably the
least-worst approach right now and it's probably what I would do if I were
you.

Mike

On Fri, Sep 22, 2017 at 2:29 PM, Mike Percy <mpercy@apache.org> wrote:

> CDC is something that I would like to see in Kudu but we aren't there yet
> with the underlying support in the Raft Consensus implementation. Once we
> have higher availability re-replication support (KUDU-1097), we will be a
> bit closer to a solution involving traditional WAL streaming to an external
> consumer, because we will have support for non-voting replicas. But there
> would still be plenty of work to do to support CDC after that, at least
> from an API perspective as well as from a WAL management perspective (how
> long to keep old log files).
>
> That said, what you really are asking for is a streaming backup solution,
> which may or may not use the same mechanism (unfortunately it's not
> designed or implemented yet).
>
> As an alternative to Adar's suggestions, a reasonable option for you at
> this time may be an incremental backup. It takes a little schema design to
> do it, though. You could consider doing something like the following:
>
>    1. Add a last_updated column to all your tables and update the column
>    when you change the value. Ideally this would be monotonic across the
>    cluster, but you could also go with local time and build in a "fudge
>    factor" when reading in step 2.
>    2. Periodically scan the table for any rows whose last_updated value is
>    newer than the previous scan. This type of scan is more efficient to do
>    in Kudu than in many other systems. With Impala you could run a query like:
>    select * from table1 where last_updated > $prev_updated;
>    3. Dump the results of this query to Parquet
>    4. Use distcp to copy the Parquet files over to the other cluster
>    periodically (maybe you can throttle this if needed to avoid saturating
>    the pipe)
>    5. Upsert the Parquet data into Kudu on the remote end (see the rough
>    Impala sketch below)
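>
>    Something like the following Impala SQL could cover steps 1 through 3 and
>    step 5. Take it as an untested sketch: the table names, the export table,
>    and the ${var:prev_updated} variable (passed in via impala-shell --var by
>    whatever driver script you use) are placeholders you would adapt to your
>    own setup.
>
>        -- Step 1 (one time): add the bookkeeping column to the Kudu table.
>        ALTER TABLE table1 ADD COLUMNS (last_updated BIGINT);
>
>        -- Steps 2-3 (each run): capture rows changed since the previous run.
>        -- Assumes table1_export is a Parquet table with the same columns,
>        -- backed by an HDFS location you can distcp from (step 4).
>        INSERT OVERWRITE table1_export
>        SELECT * FROM table1
>        WHERE last_updated > ${var:prev_updated};
>
>        -- Step 5 (on the remote cluster, after the distcp): apply the delta
>        -- to the Kudu copy; here table1_export is an external table over the
>        -- files that were copied across.
>        UPSERT INTO table1
>        SELECT * FROM table1_export;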
>
> Hopefully some workaround like this would work for you until Kudu has a
> reliable streaming backup solution.
>
> Like Adar said, as an Apache project we are always open to contributions
> and it would be great to get some in this area. Please reach out if you're
> interested in collaborating on a design.
>
> Mike
>
> On Fri, Sep 22, 2017 at 10:43 AM, Adar Lieber-Dembo <adar@cloudera.com>
> wrote:
>
>> Franco,
>>
>> Thanks for the detailed description of your problem.
>>
>> I'm afraid there's no such mechanism in Kudu today. Mining the WALs seems
>> like a path fraught with land mines. Kudu GCs WAL segments aggressively, so
>> I'd be worried about a listening mechanism missing out on some row
>> operations. Plus the WAL is Raft-specific as it includes both REPLICATE
>> messages (reflecting a Write RPC from a client) and COMMIT messages
>> (written out when a majority of replicas have written a REPLICATE); parsing
>> and making sense of this would be challenging. Perhaps you could build
>> something using Linux's inotify system for receiving file change
>> notifications, but again I'd be worried about missing certain updates.
>>
>> Another option is to replicate the data at the OS level. For example, you
>> could periodically rsync the entire cluster onto a standby cluster. There's
>> bound to be data loss in the event of a failover, but I don't think you'll
>> run into any corruption (though Kudu does take advantage of sparse files
>> and hole punching, so you should verify that any tool you use supports
>> that).
>>
>> Disaster Recovery is an oft-requested feature, but one that Kudu
>> developers have been unable to prioritize yet. Would you or someone on
>> your team be interested in working on this?
>>
>> On Thu, Sep 21, 2017 at 7:12 PM Franco Venturi <fventuri@comcast.net>
>> wrote:
>>
>>> We are planning for a 50-100TB Kudu installation (about 200 tables or
>>> so).
>>>
>>> One of the requirements that we are working on is to have a secondary
>>> copy of our data in a Disaster Recovery data center in a different location.
>>>
>>>
>>> Since we are going to have inserts, updates, and deletes (for instance
>>> in the case where a primary key is changed), we are trying to devise a process
>>> that will keep the secondary instance in sync with the primary one. The two
>>> instances do not have to be identical in real-time (i.e. we are not looking
>>> for synchronous writes to Kudu), but we would like to have some pretty good
>>> confidence that the secondary instance contains all the changes that the
>>> primary has up to, say, an hour earlier (or something like that).
>>>
>>>
>>> So far we have considered a couple of options:
>>> - refreshing the secondary instance with a full copy of the primary one
>>> every so often, but that would mean having to transfer, say, 50TB of data
>>> between the two locations every time, and our network bandwidth constraints
>>> would prevent us from doing that even on a daily basis
>>> - having a column that contains the most recent time a row was updated;
>>> however, this column couldn't be part of the primary key (because the
>>> primary key in Kudu is immutable), and therefore finding which rows have
>>> been changed each time would require a full scan of the table to be
>>> sync'd. It would also rely on the "last update timestamp" column always
>>> being updated by the application (an assumption that we would like to
>>> avoid), and would need some other process to take into account the rows
>>> that are deleted.
>>>
>>>
>>> Since many of today's RDBMSes (Oracle, MySQL, etc.) allow for some sort of
>>> 'Change Data Capture' mechanism where only the 'deltas' are captured and
>>> applied to the secondary instance, we were wondering if there's any way in
>>> Kudu to achieve something like that (possibly mining the WALs, since my
>>> understanding is that each change gets applied to the WALs first).
>>>
>>>
>>> Thanks,
>>> Franco Venturi
>>>
>>
>
