flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From milind parikh <milindspar...@gmail.com>
Subject Re: Cassandra sink wrt Counters
Date Tue, 10 May 2016 15:36:43 GMT
Hi Chesnay

Sorry for asking the question in a confusing manner. Being new to flink,
there are many questions swirling around in my head.

Thanks for the details in your answers. Here's the facts , as I see them:

(a) Cassandra Counters are not idempotent
(b) The failures, in context of Cassandra, are not the typical failures of
an ACID transaction. The failure indicate that the operation was not able
to continue at the specified transaction level; meaning that at least one
of the nodes didn't ack back in the requisite amount of time the reads or
the writes. This failure is NOT indicative of the fact that some node (or
many ) might have seen and processed the reads or writes; just that at
least one of the nodes did not. There is no rollback either. The
antientropy features of Cassandra will kick in and attempt to correct the
situation internal to Cassandra. From an external system, though, the
situation is different....if such failure occurs, one could try to retry
the operation (specifically writes) again outside of Cassandra; provided
one has the ability to do so through an intermediate layer (think flink)and
the write is specifically modeled to be idempotent in the data model
(specifically Rowkey design).

One could model the data model so as to make Flink work exceptionally well
with Cassandra; except counter tables. There is no way in Cassandra
currently to model an idempotent counter table that I know of. Therefore an
event replay that affects a counter might end up double counting.

When will the Cassandra sink be released?  I am ready to test it out even
now.
Hello Milind,

I'm not entirely sure i fully understood your question, but I'll try anyway
:)

There is now way to provide exactly-once semantics for Cassandra's
counters. As such we (will) only provide exactly-once semantics for a
subset of Cassandra operations; idempotent inserts/updates.

There are several things that would allow exactly-once semantics:

   - transactions
      - rather obvious i think
      - replaying/rollback to a given state
      - replay for sources/rollback for sinks upon failure
      - an atomic idempotent update across 2 tables.
      - allows tracking every read/write made; selectively re-read/write
      upon failure

One of the key requisites is proper failure reporting though; if an update
fails we *need to know*. As far as i know Cassandra doesn't make this
guarantee.

Regards,
Chesnay Schepler

On 10.05.2016 07:48, milind parikh wrote:

Given FLINK 3311 & 3332, I am wondering it would be possible,  without
idempotent counters in Cassandra, to deliver on an exactly once sink into
Cassandra. I do note that the verbiage/disc2 in 3332 does warn the user
that this is not exactly "exactly once" sink.

However my question  has to do with whether having idempotent counters and
a Data model that enables all other idempotent operations are a necessary
prerequisite to exactly once semantics in flink.

Asked a different way, what source and sink would enable a end-to-end
exactly - once semantics,  in the current state-of-the-art, with Flink in
the middle.

Thanks
Milind

Mime
View raw message