cassandra-commits mailing list archives

From "Joshua McKenzie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
Date Fri, 03 Jun 2016 20:15:02 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314718#comment-15314718
] 

Joshua McKenzie commented on CASSANDRA-8844:
--------------------------------------------

bq. add an acceptsCDCMutations flag and set it when the segment is created or space is freed,
reserving the capacity...
This suggestion piqued my interest. I went ahead and implemented it, which further reinforced
my general frustration with trying to get deterministic behavior out of the order of
CommitLogSegment allocation (for purposes of tracking size, in our case). I moved the size
tracking logic into the CDCSizeTracker (was SizeCalculator), which I think is much cleaner
and easier to reason about.
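
To make the shape concrete for anyone following along, here is a minimal sketch; aside from the CDCSizeTracker name itself, the field and method names are my own guesses, not the actual patch:

{code:java}
// Illustrative sketch only -- names other than CDCSizeTracker are hypothetical.
class CDCSizeTracker
{
    private long unflushedSize; // active segments not yet flushed to cdc_raw
    private long flushedSize;   // segments already discarded into cdc_raw

    // Called from the segment allocation/management thread.
    synchronized void addUnflushedSize(long bytes)    { unflushedSize += bytes; }
    synchronized void removeUnflushedSize(long bytes) { unflushedSize -= bytes; }
    synchronized void addFlushedSize(long bytes)      { flushedSize += bytes; }

    // Called by the directory-walking executor after Files.walkFileTree completes.
    synchronized void setFlushedSize(long bytesOnDisk) { flushedSize = bytesOnDisk; }

    synchronized long totalSize() { return unflushedSize + flushedSize; }
}
{code}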

For now, there are synchronized blocks around the adjustment of flushed and unflushed size,
primarily to work around potential races between the segment allocation thread and
the executor for directory-walking size calculation. There's the potential for two races that
I can think of:
# there's a window of time between rebuildFileList and Files.walkFileTree where there could
be uncaught changes in the underlying filesystem.
# the changes in CDCSizeTracker.size() are incremental per-file while walking the file tree,
so changes to that value could race with changes from the segment management thread.

Those two being acknowledged, I think we're close to as good as we're going to get regarding
the correctness of the tracked size of the CDC folder w/out making the calc synchronous on
allocation or adding a hook for external consumers to signal C*. Another added benefit of
this approach is that it's a very simple check on mutation application with CDC enabled.
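
As a hedged illustration of that check: only the acceptsCDCMutations flag comes from the suggestion quoted at the top; the surrounding types and names are hypothetical:

{code:java}
// Hypothetical shapes -- only acceptsCDCMutations is from the suggestion above.
interface CDCSegment
{
    boolean acceptsCDCMutations(); // set at segment creation or when space is freed
}

final class CDCWriteException extends RuntimeException {}

final class WritePathSketch
{
    // The "very simple check" on mutation application with CDC enabled.
    static void checkCDCCapacity(boolean mutationIsCDC, CDCSegment segment)
    {
        if (mutationIsCDC && !segment.acceptsCDCMutations())
            throw new CDCWriteException(); // fail the write rather than exceed CDC space
    }
}
{code}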

My gut feeling is that this is a premature optimization if approached strictly as a way to make
the CDC mutations faster; however, I find the relationships more clearly defined now and easier
to reason about. Let me know what you think.

bq. CommitLogSegmentManagerCDC.discard should swap the order of removeUnflushedSize and addFlushedSize,
otherwise atCapacity may flip in-between when space isn't actually available.
Should be addressed w/new code.
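
Reusing the tracker sketched above, the reordering looks roughly like this (illustrative only, not the committed code):

{code:java}
// Illustrative: account the bytes as flushed before removing them from the
// unflushed total, so a concurrent capacity check never sees them missing
// from both counters and spuriously flips atCapacity.
final class DiscardSketch
{
    static void discard(CDCSizeTracker tracker, long segmentSize)
    {
        tracker.addFlushedSize(segmentSize);      // first: bytes now accounted in cdc_raw
        tracker.removeUnflushedSize(segmentSize); // then: stop counting them as unflushed
    }
}
{code}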

bq. Stopping replay on error actually stops segment replay on error, and allows the process
to continue with the next segment. The old code didn't do anything different, but we now claim
returning true stops replay which isn't entirely correct – at the very least we should state
so in the comment, but I'd rather just remove that option. The comment should also say that
to fully stop replay one must throw an exception (as CommitLogReplayer does).
I'd rather we more deliberately change the current "skip this segment" behavior to "forcibly
terminate reading" in a separate ticket rather than tack it onto this one. I'm fine with renaming
the method to "shouldSkipSegmentOnError" (and have done so), and I've updated the comments accordingly.
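
For clarity, the renamed hook now reads along these lines; the interface shape and javadoc wording here are approximations rather than the committed text:

{code:java}
// Sketch; the enclosing interface and parameter type are approximations.
interface CommitLogReadHandler
{
    /**
     * Called when reading a segment hits an error. Returning true skips the
     * remainder of the current segment only; replay continues with the next
     * segment. To fully stop replay, throw an exception instead (as
     * CommitLogReplayer does).
     */
    boolean shouldSkipSegmentOnError(Exception error);
}
{code}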

bq. Pre-2.1 replay should also set mutationLimit on the tracker.
Good catch. Fixed.

bq. SimpleCachedBufferPool should provide getThreadLocalReusableBuffer(int size) which should
automatically reallocate if the available size is less, and not expose a setter at all.
This is a simple refactor of the existing code in use in FileDirectSegment on trunk. I'd prefer
we tackle changing its usage patterns and functionality in a separate ticket so as not to
add any further dependencies or changes into this ticket with CDC.
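
For reference, my reading of the suggested surface is roughly the sketch below; the reallocation policy shown is a guess at the intent, not code from either branch:

{code:java}
import java.nio.ByteBuffer;

// Sketch of the suggested API only; details are illustrative.
final class SimpleCachedBufferPoolSketch
{
    private final ThreadLocal<ByteBuffer> reusableBuffer = new ThreadLocal<>();

    // Replaces the getter/setter pair: transparently reallocate when the
    // cached buffer is too small, and never expose a setter.
    ByteBuffer getThreadLocalReusableBuffer(int size)
    {
        ByteBuffer buf = reusableBuffer.get();
        if (buf == null || buf.capacity() < size)
        {
            buf = ByteBuffer.allocateDirect(size);
            reusableBuffer.set(buf);
        }
        buf.clear();
        return buf;
    }
}
{code}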

bq. AbstractCommitLogSegmentManager.start: You can use AbstractCommitLogSegmentManager.this
instead of explicitly saving parent.
No longer applies w/changed code, as createSegment is now virtual, w/size tracking on the
CDC side, etc.

bq. for each loop vs forEach: I'm not a big fan of these transformations. The way lambdas
are currently implemented the latter incurs an extra allocation – it is not a showstopping
inefficiency, but since that isn't unequivocally better, I wouldn't change the loops when
it doesn't significantly improve readability.
Reverted the ones in CommitLog. Largely an artifact of the multiple CLSM approach.

bq. commitLogUpperBound is replaced incorrectly in comments.
Bad rebase. Fixed.

Last but not least, in commit {{a7afe74d5c6c0c444ebc4c38c9b55f6a44a96c3a}} I modified the
CommitLogReader to suppress handleMutation calls for mutations that originate below the
CommitLogSegmentPosition now passed in as the minimum position for the reading process. Prior
to this (and on trunk), the logic only skips SyncSegments whose end position precedes the
min start, so we replay mutations in the overlapping SyncSegment regardless of whether
they are before or after our requested min position. This isn't a huge issue during traditional
CommitLogReplay; however, now that we're exposing an interface with a count of mutations to
replay, we need to respect that contract and thus not read mutations nor count them against
the limit unless they're past the min.
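
In sketch form, with hypothetical stand-ins for the real reader types (only the filtering rule itself comes from the commit above):

{code:java}
// Hypothetical types; the rule shown is: mutations below the minimum position
// are neither handed to the handler nor counted against the mutation limit.
interface SyncSection
{
    int nextPosition(int current);   // -1 once the section is exhausted
    Object mutationAt(int position); // deserialized mutation at this position
}

interface MutationHandler
{
    void handleMutation(Object mutation);
}

final class SectionReaderSketch
{
    static void read(SyncSection section, int start, int minPosition, MutationHandler handler)
    {
        for (int pos = start; pos != -1; pos = section.nextPosition(pos))
        {
            if (pos < minPosition)
                continue; // suppressed: not delivered, not counted toward the limit
            handler.handleMutation(section.mutationAt(pos));
        }
    }
}
{code}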

So long as EncryptedFileSegmentInputStream doesn't implement {{FileDataInput.seek}}, we're
going to have to go through the considerably more expensive path of deserializing the mutation,
checking CRC, etc.; however, this only applies to mutations within the overlapping SyncSegment,
so it should be a relatively constrained performance penalty. This can be resolved in a
follow-up ticket.
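
Sketching the two paths with hypothetical interfaces ({{FileDataInput.seek}} is the only real name here):

{code:java}
import java.io.IOException;

// Illustrative: with seek we can jump straight past a mutation below the min
// position; without it (the encrypted stream today) the reader must fully
// deserialize the mutation and verify its CRC just to advance past it.
final class MutationSkipSketch
{
    interface SeekableInput  { void seek(long position) throws IOException; }
    interface MutationInput  { void deserializeAndVerifyCrc() throws IOException; }

    static void advancePast(Object in, long nextMutationPosition) throws IOException
    {
        if (in instanceof SeekableInput)
            ((SeekableInput) in).seek(nextMutationPosition); // cheap path
        else if (in instanceof MutationInput)
            ((MutationInput) in).deserializeAndVerifyCrc();  // expensive fallback
    }
}
{code}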

A local run of ant test-cdc has only one failure, an anticompaction issue unrelated to this
ticket. I'm debating whether or not to run this in CI until I get your feedback on these changes
and Carl's next round of feedback.

Given our time horizon (a few weeks out from freeze for 3.8), I'd prefer this be the last
sizeable change to the implementation that we make assuming no major flaws or obstructions
come up from this change.

> Change Data Capture (CDC)
> -------------------------
>
>                 Key: CASSANDRA-8844
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8844
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Coordination, Local Write-Read Paths
>            Reporter: Tupshin Harper
>            Assignee: Joshua McKenzie
>            Priority: Critical
>             Fix For: 3.x
>
>
> "In databases, change data capture (CDC) is a set of software design patterns used to
determine (and track) the data that has changed so that action can be taken using the changed
data. Also, Change data capture (CDC) is an approach to data integration that is based on
the identification, capture and delivery of the changes made to enterprise data sources."
> -Wikipedia
> As Cassandra is increasingly being used as the Source of Record (SoR) for mission critical
data in large enterprises, it is increasingly being called upon to act as the central hub
of traffic and data flow to other systems. In order to try to address the general need, we
(cc [~brianmhess]), propose implementing a simple data logging mechanism to enable per-table
CDC patterns.
> h2. The goals:
> # Use CQL as the primary ingestion mechanism, in order to leverage its Consistency Level
semantics, and in order to treat it as the single reliable/durable SoR for the data.
> # To provide a mechanism for implementing good and reliable (deliver-at-least-once, with
possible mechanisms for deliver-exactly-once) continuous semi-realtime feeds of mutations
going into a Cassandra cluster.
> # To eliminate the developmental and operational burden on users so that they don't have
to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system, give them
the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similarly to a commitlog, with the
following nuances:
> - Takes place on every node, not just the coordinator, so RF number of copies are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an easy enhancement,
but most likely use cases would prefer to only implement CDC logging in one (or a subset)
of the DCs that are being replicated to.
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the commitlog,
failure to write to the CDC log should fail that node's write. If that means the requested
consistency level was not met, then clients *should* experience UnavailableExceptions.
> - Be written in a Row-centric manner such that it is easy for consumers to reconstitute
rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons written in
non-JVM languages.
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also believe
that they can be deferred to a subsequent release, in part to gauge actual interest.
> - Multiple logs per table. This would make it easy to have multiple "subscribers" to
a single table's changes. A workaround would be to create a forking daemon listener, but that's
not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters, would make
Cassandra a much more versatile feeder into other systems, and again, reduce complexity that
would otherwise need to be built into the daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it. 
> - Cleaning up consumed logfiles would be the client daemon's responsibility.
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming schema, making it trivial to process
them in order.
> - Daemons should be able to checkpoint their work, and resume from where they left off.
This means they would have to leave some file artifact in the CDC log's directory.
> - It should be possible to write a sophisticated daemon that could:
> -- Catch up, in written-order, even when it is multiple logfiles behind in processing
> -- Be able to continuously "tail" the most recent logfile and get low-latency (ms?) access
to the data as it is written.
> h2. Alternate approach
> In order to make consuming a change log easy and efficient to do with low latency, the
following could supplement the approach outlined above:
> - Instead of writing to a logfile, by default, Cassandra could expose a socket for a
daemon to connect to, and from which it could pull each row.
> - Cassandra would have a limited buffer for storing rows, should the listener become
backlogged, but it would immediately spill to disk in that case, never incurring large in-memory
costs.
> h2. Additional consumption possibility
> With all of the above, still relevant:
> - instead (or in addition to) using the other logging mechanisms, use CQL transport itself
as a logger.
> - Extend the CQL protocol slightly so that rows of data can be returned to a listener
that didn't explicitly make a query, but instead registered itself with Cassandra as a listener
for a particular event type; in this case, the event type would be anything that would
otherwise go to a CDC log.
> - If there is no listener for the event type associated with that log, or if that listener
gets backlogged, the rows will again spill to the persistent storage.
> h2. Possible Syntax
> {code:sql}
> CREATE TABLE ... WITH CDC LOG
> {code}
> Pros: No syntax extensions
> Cons: Doesn't make it easy to capture the various permutations (I'm happy to be proven
wrong) of per-DC logging. Also, the hypothetical multiple logs per table would break this.
> {code:sql}
> CREATE CDC_LOG mylog ON mytable WHERE MyUdf(mycol1, mycol2) = 5 with DCs={'dc1','dc3'}
> {code}
> Pros: Expressive and allows for easy DDL management of all aspects of CDC
> Cons: Syntax additions. Added complexity, partly for features that might not be implemented



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
