From: Mike Percy
Date: Fri, 22 Sep 2017 14:32:41 -0700
Subject: Re: Change Data Capture (CDC) with Kudu
To: user@kudu.apache.org
Reply-To: user@kudu.apache.org

Franco,

I just realized that I suggested something you had already mentioned in your initial email. My mistake for not reading through to the end. It is probably the least-worst approach right now, and it's probably what I would do if I were you.

Mike

On Fri, Sep 22, 2017 at 2:29 PM, Mike Percy <mpercy@apache.org> wrote:

> CDC is something that I would like to see in Kudu, but we aren't there yet with the underlying support in the Raft consensus implementation. Once we have higher-availability re-replication support (KUDU-1097), we will be a bit closer to a solution involving traditional WAL streaming to an external consumer, because we will then have support for non-voting replicas. But there would still be plenty of work to do to support CDC after that, both from an API perspective and from a WAL management perspective (how long to keep old log files).
>
> That said, what you are really asking for is a streaming backup solution, which may or may not use the same mechanism (unfortunately it's not designed or implemented yet).
>
> As an alternative to Adar's suggestions, a reasonable option for you at this time may be an incremental backup. It takes a little schema design to do it, though. You could consider doing something like the following:
>
> 1. Add a last_updated column to all your tables and update the column whenever you change a value. Ideally the timestamps would be monotonic across the cluster, but you could also go with local time and build in a "fudge factor" when reading in step 2.
> 2. Periodically scan the table for any changes in the last_updated column that are newer than the previous scan. This type of scan is more efficient to do in Kudu than in many other systems. With Impala you could run a query like: select * from table1 where last_updated > $prev_updated;
> 3. Dump the results of this query to Parquet.
> 4. Use distcp to periodically copy the Parquet files over to the other cluster (you could throttle this if needed to avoid saturating the pipe).
> 5. Upsert the Parquet data into Kudu on the remote end.
>
> Hopefully some workaround like this would work for you until Kudu has a reliable streaming backup solution.
>
> Like Adar said, as an Apache project we are always open to contributions, and it would be great to get some in this area. Please reach out if you're interested in collaborating on a design.
>
> Mike
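A minimal sketch of what steps 1-3 and 5 above could look like in Impala SQL. The table name table1, the delta table table1_delta, and the prev_updated checkpoint are illustrative assumptions, with prev_updated passed as a quoted timestamp string via impala-shell --var=prev_updated=... ; step 4, the distcp copy of the Parquet files, happens outside SQL, and table1_delta would have to be re-created on the DR cluster over the copied files before the upsert:

    -- Step 1 (one time): add the change-tracking column to the Kudu table.
    ALTER TABLE table1 ADD COLUMNS (last_updated TIMESTAMP);

    -- Steps 2-3 (every cycle, on the primary cluster): materialize the rows
    -- changed since the previous run as a Parquet table that distcp can ship.
    CREATE TABLE table1_delta STORED AS PARQUET AS
    SELECT * FROM table1
    WHERE last_updated > ${var:prev_updated};

    -- Step 5 (every cycle, on the DR cluster, after the distcp of step 4):
    -- fold the copied deltas back into the remote Kudu table.
    UPSERT INTO table1
    SELECT * FROM table1_delta;

Each run would also record the max(last_updated) seen in the exported delta as the next run's prev_updated, and drop or rename table1_delta before the next cycle.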
> On Fri, Sep 22, 2017 at 10:43 AM, Adar Lieber-Dembo <adar@cloudera.com> wrote:
>
>> Franco,
>>
>> Thanks for the detailed description of your problem.
>>
>> I'm afraid there's no such mechanism in Kudu today. Mining the WALs seems like a path fraught with land mines. Kudu GCs WAL segments aggressively, so I'd be worried about a listening mechanism missing some row operations. Plus, the WAL is Raft-specific: it includes both REPLICATE messages (reflecting a Write RPC from a client) and COMMIT messages (written out when a majority of replicas have written a REPLICATE), so parsing and making sense of it would be challenging. Perhaps you could build something using Linux's inotify system for receiving file change notifications, but again I'd be worried about missing certain updates.
>>
>> Another option is to replicate the data at the OS level. For example, you could periodically rsync the entire cluster onto a standby cluster. There's bound to be data loss in the event of a failover, but I don't think you'll run into any corruption (though Kudu does take advantage of sparse files and hole punching, so you should verify that any tool you use supports those).
>>
>> Disaster recovery is an oft-requested feature, but one that the Kudu developers have been unable to prioritize yet. Would you or someone on your team be interested in working on this?
>>
>> On Thu, Sep 21, 2017 at 7:12 PM, Franco Venturi <fventuri@comcast.net> wrote:
>>
>>> We are planning for a 50-100 TB Kudu installation (about 200 tables or so).
>>>
>>> One of the requirements that we are working on is to have a secondary copy of our data in a disaster recovery data center in a different location.
>>>
>>> Since we are going to have inserts, updates, and deletes (for instance when the primary key of a row is changed), we are trying to devise a process that will keep the secondary instance in sync with the primary one. The two instances do not have to be identical in real time (i.e. we are not looking for synchronous writes to Kudu), but we would like to have pretty good confidence that the secondary instance contains all the changes that the primary had up to, say, an hour before (or something like that).
>>>
>>> So far we have considered a couple of options:
>>> - Refreshing the secondary instance with a full copy of the primary one every so often, but that would mean transferring, say, 50 TB of data between the two locations every time, and our network bandwidth constraints would prevent us from doing that even on a daily basis.
>>> - Having a column that contains the most recent time a row was updated. However, this column couldn't be part of the primary key (because the primary key in Kudu is immutable), so finding which rows have changed would require a full scan of the table being sync'd every time. It would also rely on the "last update timestamp" column always being updated by the application (an assumption that we would like to avoid), and it would need some other process to take into account the rows that are deleted.
>>>
>>> Since many of today's RDBMSs (Oracle, MySQL, etc.) allow for some sort of 'Change Data Capture' mechanism, where only the 'deltas' are captured and applied to the secondary instance, we were wondering if there is any way in Kudu to achieve something like that (possibly by mining the WALs, since my understanding is that each change gets applied to the WALs first).
>>>
>>> Thanks,
>>> Franco Venturi
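For the deleted rows that Franco's second option leaves unaccounted for, one possibility that stays within the same last_updated scheme is a soft-delete flag, sketched below under the assumption that the application can be changed to flag rows instead of deleting them, so that deletions show up in the incremental scan like any other update:

    -- Assumed soft-delete variant: the application sets is_deleted = true and
    -- bumps last_updated instead of physically deleting the row.
    ALTER TABLE table1 ADD COLUMNS (is_deleted BOOLEAN);

    -- On the DR cluster, after upserting a batch of deltas:
    DELETE FROM table1 WHERE is_deleted = true;

The flagged rows would eventually need a separate purge on the primary cluster as well.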