kudu-user mailing list archives

From Ananth G <ananthg.a...@gmail.com>
Subject Re: Apache Apex supports kudu as a high throughput sink
Date Tue, 30 May 2017 21:55:40 GMT
Thanks for the review comments, Todd. I shall fix the wording accordingly.


> On 31 May 2017, at 5:50 am, Todd Lipcon <todd@cloudera.com> wrote:
> Hey Ananth,
> Thanks for posting this, and for working on the Kudu sink for Apex.
> One thing I wanted to note in the article:
> "Kudu output operator allows the client side timestamps to be propagated to the Kudu
> server where the mutation is executed. This allows for out of sequence data tuples to be ordered
> on the server side. The following snippet of code in the upstream operator shows how this
> can be done."
> I think your understanding of the setPropagatedTimestamp() call is not quite right. This
> timestamp propagation serves as a lower-bound for the assigned timestamp at the server side,
> not as an exact setting of the server side timestamp. Thus, if you perform two inserts, and
> the second insert has a lower propagated timestamp, it does _not_ ensure that the first one
> takes precedence. Since the Propagated Timestamp is a lower-bound, the second insert will
> still be assigned a higher timestamp than the first.
> The purpose of this advanced API is to allow causal ordering to be maintained between
> two writes. For example, imagine that client A writes data from machine A, and then communicates
> with client B on machine B. Then, client B performs a write. If we want to ensure that B's
> write is assigned a higher timestamp than A's write, the setPropagatedTimestamp() API can ensure that
> (by setting the timestamp of A's write as the lower bound for B's write). But it can't be used
> to back-date a write, as the article seems to be implying.
> Otherwise, the post is great! Thanks again for sharing your experience and application.
> -Todd
>> On Tue, May 30, 2017 at 11:33 AM, Ananth G <ananthg.apex@gmail.com> wrote:
>> Hello All,
>> Apache Apex now enables low-latency, high-throughput writes to Kudu as a sink. More
>> details are on the atrato blog here: http://www.atrato.io/blog/2017/05/28/apex-kudu-output/
>> Please use the comments section to provide any feedback.
>> Regards,
>> Ananth
> -- 
> Todd Lipcon
> Software Engineer, Cloudera
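
The lower-bound behaviour Todd describes above can be illustrated with a toy model. The sketch below is not Kudu code and does not call the Kudu client. ToyTimestampServer, its write() method, and the fixed clock values are hypothetical names invented for illustration. It assumes only that a server assigns each write a timestamp strictly greater than both the propagated timestamp and any timestamp it has already assigned.

```java
import java.util.function.LongSupplier;

/**
 * Toy model only (NOT Kudu's implementation): a server that treats a
 * propagated timestamp as a lower bound rather than an exact value.
 */
final class ToyTimestampServer {
    private final LongSupplier clock; // hypothetical stand-in for the server's clock
    private long lastAssigned = 0L;   // highest timestamp handed out so far

    ToyTimestampServer(LongSupplier clock) {
        this.clock = clock;
    }

    /** Assigns a timestamp greater than the propagated one and all earlier ones. */
    synchronized long write(long propagatedTimestamp) {
        long candidate = Math.max(clock.getAsLong(), propagatedTimestamp + 1);
        lastAssigned = Math.max(candidate, lastAssigned + 1);
        return lastAssigned;
    }
}

final class PropagatedTimestampDemo {
    public static void main(String[] args) {
        // Fixed clocks keep the example deterministic.
        ToyTimestampServer server = new ToyTimestampServer(() -> 100L);

        // Two writes to one server; the second carries a LOWER propagated value.
        long first = server.write(50L);
        long second = server.write(5L);
        // The second write is not back-dated: it still gets the higher timestamp.
        System.out.println("first=" + first + " second=" + second); // first=100 second=101

        // Causal ordering: client A writes to server A, then hands its
        // timestamp to client B, which writes to a server with a lagging clock.
        ToyTimestampServer serverA = new ToyTimestampServer(() -> 500L);
        ToyTimestampServer serverB = new ToyTimestampServer(() -> 200L);
        long tsA = serverA.write(0L);
        long tsB = serverB.write(tsA); // A's timestamp is B's lower bound
        System.out.println("tsA=" + tsA + " tsB=" + tsB); // tsA=500 tsB=501
    }
}
```

In this model a smaller propagated value never reorders writes that have already been applied. It only helps when the receiving server's clock would otherwise assign something at or below the timestamp of the earlier, causally related write.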
