Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 29BE22007D0 for ; Tue, 10 May 2016 17:36:49 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 27DA016098A; Tue, 10 May 2016 15:36:49 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 24145160877 for ; Tue, 10 May 2016 17:36:47 +0200 (CEST) Received: (qmail 11188 invoked by uid 500); 10 May 2016 15:36:47 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 11178 invoked by uid 99); 10 May 2016 15:36:47 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 May 2016 15:36:47 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id B639F1801DD for ; Tue, 10 May 2016 15:36:46 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.179 X-Spam-Level: * X-Spam-Status: No, score=1.179 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id eitkU_NY6h4w for ; Tue, 10 May 2016 15:36:45 +0000 (UTC) Received: from mail-vk0-f53.google.com (mail-vk0-f53.google.com [209.85.213.53]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id D6BCE5FBB5 for ; Tue, 10 May 2016 15:36:44 +0000 (UTC) Received: by mail-vk0-f53.google.com with SMTP id o133so21030159vka.0 for ; Tue, 10 May 2016 08:36:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to; bh=R+qN1WRXiwgHbcGMyP/kGhkwFrrEfd0irYbhp75FZ8E=; b=Ngv4VmnaZF9h8EKRwhjaD4rkqWFmdTKpN2yjbsiX5qMjdrlax8COU/XBvzIPjiZUxY e+R1zXR5rEyzTlkTBqX5Vd7ZqTOT9FLzgMQ8vt6KbLHE7V6lHtXYyk0m/w3MOjscHQY8 +moZrSatNW+SDxzx5FlsVFUdW5qtEEQSamBlxeH5Zsl8jJy0mFx6gfxh7JaZB7zlEt8z 6cx6JWjFfCAmhVKN108IhzT9SJihLwshi0W9oOW8o64XKdkihVTKnaKMxwecXqBmpSQ2 mlz4YX9KWtkkmdE+7K1UX3DLuAcobeGkZSD3yw8kiRaj+EPyoRcGX1uVVNdKBJMAOdTE 0Hgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to; bh=R+qN1WRXiwgHbcGMyP/kGhkwFrrEfd0irYbhp75FZ8E=; b=ivi3IE/taqqlZN6mevSPznDGuf4Xw/5yaVu0PLc9J3B3cJ2IjktzcifHpcTMLNC7hQ 6/hoWUOqB95IKOuOixPW9Dajxi51x8uhSv66SKIDXR8hcJSyT8vB6ofisFl6BJO0o/4l 9a8EGSWfZs0rPoYMCk8kxefjpgcJu32aI4UXjeElh7tb6mJSgoJQ16ftl0I7/G3fDjmL zKJz7PI4AMVdQQNzhmK+a9mK9DTClXi1g0DQBxbsHA1neGQLxFExQx2X5bliUyxwrnLp NCA3X7PVdB7NRvTE9noUYmEvn8rET1jEAsM1Cnrj6s7IJuT+i3k342yKolrlHWoWWbfL tazw== X-Gm-Message-State: AOPr4FUK8IclSLnnhKoKF52oS1rhujZfSJWnU9QzAfI4wr/GYNl5R86oifNUJ+Z69q0GPraVA5xpAjLHWUWrXg== MIME-Version: 1.0 X-Received: by 10.31.59.205 with SMTP id i196mr10293638vka.76.1462894604225; Tue, 10 May 2016 08:36:44 -0700 (PDT) Received: by 10.159.39.74 with HTTP; Tue, 10 May 2016 08:36:43 -0700 (PDT) Received: by 10.159.39.74 with HTTP; Tue, 10 May 2016 08:36:43 -0700 (PDT) In-Reply-To: <5731A3B6.2070401@apache.org> References: <5731A3B6.2070401@apache.org> Date: Tue, 10 May 2016 08:36:43 -0700 Message-ID: Subject: Re: Cassandra sink wrt Counters From: milind parikh To: user@flink.apache.org Content-Type: multipart/alternative; boundary=001a114304563a931405327eb197 archived-at: Tue, 10 May 2016 15:36:49 -0000 --001a114304563a931405327eb197 Content-Type: text/plain; charset=UTF-8 Hi Chesnay Sorry for asking the question in a confusing manner. Being new to flink, there are many questions swirling around in my head. Thanks for the details in your answers. Here's the facts , as I see them: (a) Cassandra Counters are not idempotent (b) The failures, in context of Cassandra, are not the typical failures of an ACID transaction. The failure indicate that the operation was not able to continue at the specified transaction level; meaning that at least one of the nodes didn't ack back in the requisite amount of time the reads or the writes. This failure is NOT indicative of the fact that some node (or many ) might have seen and processed the reads or writes; just that at least one of the nodes did not. There is no rollback either. The antientropy features of Cassandra will kick in and attempt to correct the situation internal to Cassandra. From an external system, though, the situation is different....if such failure occurs, one could try to retry the operation (specifically writes) again outside of Cassandra; provided one has the ability to do so through an intermediate layer (think flink)and the write is specifically modeled to be idempotent in the data model (specifically Rowkey design). One could model the data model so as to make Flink work exceptionally well with Cassandra; except counter tables. There is no way in Cassandra currently to model an idempotent counter table that I know of. Therefore an event replay that affects a counter might end up double counting. When will the Cassandra sink be released? I am ready to test it out even now. Hello Milind, I'm not entirely sure i fully understood your question, but I'll try anyway :) There is now way to provide exactly-once semantics for Cassandra's counters. As such we (will) only provide exactly-once semantics for a subset of Cassandra operations; idempotent inserts/updates. There are several things that would allow exactly-once semantics: - transactions - rather obvious i think - replaying/rollback to a given state - replay for sources/rollback for sinks upon failure - an atomic idempotent update across 2 tables. - allows tracking every read/write made; selectively re-read/write upon failure One of the key requisites is proper failure reporting though; if an update fails we *need to know*. As far as i know Cassandra doesn't make this guarantee. Regards, Chesnay Schepler On 10.05.2016 07:48, milind parikh wrote: Given FLINK 3311 & 3332, I am wondering it would be possible, without idempotent counters in Cassandra, to deliver on an exactly once sink into Cassandra. I do note that the verbiage/disc2 in 3332 does warn the user that this is not exactly "exactly once" sink. However my question has to do with whether having idempotent counters and a Data model that enables all other idempotent operations are a necessary prerequisite to exactly once semantics in flink. Asked a different way, what source and sink would enable a end-to-end exactly - once semantics, in the current state-of-the-art, with Flink in the middle. Thanks Milind --001a114304563a931405327eb197 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

Hi Chesnay

Sorry for asking the question in a confusing manner. Being n= ew to flink, there are many questions swirling around in my head.

Thanks for the details in your answers. Here's the facts= , as I see them:

(a) Cassandra Counters are not idempotent
(b) The failures, in context of Cassandra, are not the typical failures of = an ACID transaction. The failure indicate that the operation was not able t= o continue at the specified transaction level; meaning that at least one of= the nodes didn't ack back in the requisite amount of time the reads or= the writes. This failure is NOT indicative of the fact that some node (or = many ) might have seen and processed the reads or writes; just that at leas= t one of the nodes did not. There is no rollback either. The antientropy fe= atures of Cassandra will kick in and attempt to correct the situation inter= nal to Cassandra. From an external system, though, the situation is differe= nt....if such failure occurs, one could try to retry the operation (specifi= cally writes) again outside of Cassandra; provided one has the ability to d= o so through an intermediate layer (think flink)and the write is specifical= ly modeled to be idempotent in the data model (specifically Rowkey design).=

One could model the data model so as to make Flink work exce= ptionally well with Cassandra; except counter tables. There is no way in Ca= ssandra currently to model an idempotent counter table that I know of. Ther= efore an event replay that affects a counter might end up double counting.<= /p>

When will the Cassandra sink be released?=C2=A0 I am ready t= o test it out even now.

=20 =20 =20
Hello Milind,

I'm not entirely sure i fully understood your question, but I'= ;ll try anyway :)

There is now way to provide exactly-once semantics for Cassandra'= s counters. As such we (will) only provide exactly-once semantics for a subset of Cassandra operations; idempotent inserts/updates.

There are several things that would allow exactly-once semantics:
  • transactions
    • rather obvious i think
  • replaying/rollback to a given state
    • replay for sources/rollback for sinks upon failure
  • an atomic idempotent update across 2 tables.
    • allows tracking every read/write made; selectively re-read/write upon failure
One of the key requisites is proper failure reporting though; if an update fails we need to know. As far as i know Cassandra doesn't make this guarantee.

Regards,
Chesnay Schepler

On 10.05.2016 07:48, milind parikh wrote:

Given FLINK 3311 & 3332, I am wondering it would be possible,=C2=A0 without idempotent counters in Cassandra, to deliver on an exactly once sink into Cassandra. I do note that the verbiage/disc2 in 3332 does warn the user that this is not exactly "exactly once" sink.

However my question=C2=A0 has to do with whether havi= ng idempotent counters and a Data model that enables all other idempotent operations are a necessary prerequisite to exactly once semantics in flink.

Asked a different way, what source and sink would enable a end-to-end exactly - once semantics,=C2=A0 in the current state-of-the-art, with Flink in the middle.

Thanks
Milind


--001a114304563a931405327eb197--