cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joshua McKenzie (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8984) Introduce Transactional API for behaviours that can corrupt system state
Date Mon, 23 Mar 2015 21:43:52 GMT


Joshua McKenzie commented on CASSANDRA-8984:

All unit tests are timing out on Windows - attaching a sample of one to this ticket. I've
reviewed most of the patch and will leave feedback for what I have thus far - I still need
to digest the changes to SSTableWriter as they're fairly extensive.

* I'm not too keen on prepareToCommit being part of the usage pattern without actually being
present in the interface but I can see why you went that route. I don't have any alternative
suggestions unfortunately.
* The passing around of Throwables to merge deep in the stack (Transactional, SafeMemory*,
Ref* etc) is a little cludgy and has some pretty deeply nested tag-along variables. Not sure
why we can't just return a Throwable up that stack and merge at the top level when we know
we might need to merge rather than passing the Throwable all the way down from the top to
merge at the bottom...?
* The StateManager abstraction and the Transactional Interface seem to be a poor fit to several
of the implementers. Having 2/3 of the methods resolve to the noOpTransition seems like we're
conflating the idea of "classes that have resources that need to be cleaned up" with "classes
that have a set of state transitions they go through and a logical abort process".
* With regard to the StateManager requiring a beginTransition / rejectedTransition combo and
specific completeTransition - we're trading one set of manually managed states for another.
Still error-prone and has quite a bit of duplication where it's implemented (rather than noOpTransition)
* autoclose seems to have some redundant assignment - we switch on state and if it's COMMITTED,
we set state = COMMITTED, ABORTED we set it to ABORTED.
* Consider renaming StateManager.autoclose(). Something like 'finalize()' might be more accurate,
as 'state.autoclose()' describes the context in which it's called rather than what it's doing.

* Inconsistent prepareForCommit vs. prepareToCommit in comment in Transactional
* Unused Logger added to IndexSummary -> was this intentional?
* You left a comment in SSTRW.prepareToCommitAndMaybeThrow that should be removed:
// No early open to finalize and replace

In general this patch and the recent trend in our code-base on the 2.1+ branches makes me
uneasy. Moving the state tracking logic from within the SSTRW and SSTW into their own abstraction
helps separate our concerns and increase modularity at the cost of increased complexity w/regards
to the depth of the type system and object interaction, similarly to the introduction of the
formalized ref-counting infrastructure. Each additional step we've take to shore up our stability
w/regards to SSTable lifecycles is increasing our net complexity and the contrast between
where we started and where we are now is pretty striking. Now, that's not to say that I prefer
the alternative of being back where we started with regards to having an error-prone brittle
interface for ref-counting for instance, but in general I'm left feeling wary when I see more
wide-spread changes in the same vein particularly as we're approaching a .4 release on 2.1.

Note: When I refer to wide-spread changes, I have no hard and fast rule as to what qualifies
however this change touches [many files|].

As we're not attempting to address any current pain-point with this ticket, there's the outstanding
potential missed close() w/regards to commit(), and this commit rewrites some of the hotter
areas in SSTRW w/regards to recent errors, I'm of the opinion this change is better targeted
towards 3.0 similarly to CASSANDRA-8568. This would give us more time to beef up our testing
infrastructure to better test changes in these portions of the code-base that are historically
vulnerable to races w/regards to state and would also give the rest of the developers working
on the code-base more time to get familiar with these changes rather than having them out
in the wild immediately.

So all that being said, on the whole I believe this approach is a net improvement and once
we get the details hammered out I believe this will be harder to get wrong compared to our
previous implementation. The operations it's modifying are subtle enough that I don't feel
like 2.1 is the right place for it at this time though (this could be the whole "Once bitten,
twice shy" problem though...)

I'll dig into SSTW and update the ticket when I've gone through that.

> Introduce Transactional API for behaviours that can corrupt system state
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-8984
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1.4
>         Attachments: 8984_windows_timeout.txt
> As a penultimate (and probably final for 2.1, if we agree to introduce it there) round
of changes to the internals managing sstable writing, I've introduced a new API called "Transactional"
that I hope will make it much easier to write correct behaviour. As things stand we conflate
a lot of behaviours into methods like "close" - the recent changes unpicked some of these,
but didn't go far enough. My proposal here introduces an interface designed to support four
actions (on top of their normal function):
> * prepareToCommit
> * commit
> * abort
> * cleanup
> In normal operation, once we have finished constructing a state change we call prepareToCommit;
once all such state changes are prepared, we call commit. If at any point everything fails,
abort is called. In _either_ case, cleanup is called at the very last.
> These transactional objects are all AutoCloseable, with the behaviour being to rollback
any changes unless commit has completed successfully.
> The changes are actually less invasive than it might sound, since we did recently introduce
abort in some places, as well as have commit like methods. This simply formalises the behaviour,
and makes it consistent between all objects that interact in this way. Much of the code change
is boilerplate, such as moving an object into a try-declaration, although the change is still
non-trivial. What it _does_ do is eliminate a _lot_ of special casing that we have had since
2.1 was released. The data tracker API changes and compaction leftover cleanups should finish
the job with making this much easier to reason about, but this change I think is worthwhile
considering for 2.1, since we've just overhauled this entire area (and not released these
changes), and this change is essentially just the finishing touches, so the risk is minimal
and the potential gains reasonably significant.

This message was sent by Atlassian JIRA

View raw message