cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9761) Delay auth setup until peers are upgraded
Date Mon, 07 Sep 2015 15:43:45 GMT


Sylvain Lebresne commented on CASSANDRA-9761:

For info, this is the reason for the failure of at least some upgrade dtests (typically [this|]).
 Basically, the tests issue a truncate as their first order of business, and because the 2.1
node closes the connection to the other node due to this, some of the truncation acknowledgments
get lost since they are still in the queue of that connection, hence ending up in a truncate timeout.

And of course there is the fact that due to this the 2.1 node logs a warning with a stack
trace, which might worry operators a bit even though nothing is wrong.

> Delay auth setup until peers are upgraded
> -----------------------------------------
>                 Key: CASSANDRA-9761
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sam Tunnicliffe
>             Fix For: 3.0.0 rc1, 2.2.2
> The built-in auth classes {{CassandraRoleManager}} and {{CassandraAuthorizer}} both attempt
to do some setup and data conversion when a node is upgraded to version 2.2 or higher. At
the moment, each node attempts the operations with the expectation that this will fail until
enough of the cluster has been upgraded for it to succeed (i.e. enough nodes have the latest
schema with the requisite new tables). These expected failures are largely harmless, but they
are annoying because they cause the receiving node (the non-upgraded node) to close the connection
with the upgraded node, which then has to be re-established. Although this is the normal behaviour
on schema disagreement (see CASSANDRA-9136 for further discussion), it may be possible to
avoid in this specific circumstance. Given that we expect the operations to fail until enough
nodes are upgraded, we could defer them until we're sure they can succeed by checking the
messaging service version of peers. 
> Right now these are a one-shot thing: each node makes only one attempt at the conversion
(until it is restarted). Without investigating further, I don't know if we'd need to add
retries in case it takes a little time for each peer's MS version to be updated as they're
upgraded. The setup & conversion operations are idempotent, so there shouldn't be a great
issue if several nodes attempt them at the same time anyway.
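
For illustration only, the deferral idea in the description could be sketched roughly as follows. This is not the actual patch: the class {{DeferredAuthSetup}}, its methods, and the {{requiredVersion}} parameter are hypothetical names, and waiting for every known peer (rather than just "enough" of them) is a simplifying assumption. The bounded retry loop reflects the open question about whether retries are needed.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from the Cassandra code base.
final class DeferredAuthSetup
{
    // True when every known peer reports at least the required messaging service version.
    // Assumption: "enough nodes" is simplified here to "all known peers".
    static boolean allPeersUpgraded(Map<String, Integer> peerVersions, int requiredVersion)
    {
        return peerVersions.values().stream().allMatch(v -> v >= requiredVersion);
    }

    // Polls the peer versions up to maxAttempts times, sleeping delayMillis between checks.
    // Runs setup once, as soon as the check passes, and returns true; returns false if the
    // peers never qualified or the waiting thread was interrupted.
    static boolean runWhenReady(Supplier<Map<String, Integer>> peerVersions,
                                Runnable setup,
                                int requiredVersion,
                                int maxAttempts,
                                long delayMillis)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (allPeersUpgraded(peerVersions.get(), requiredVersion))
            {
                setup.run();
                return true;
            }
            if (attempt < maxAttempts)
            {
                try
                {
                    Thread.sleep(delayMillis);
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
        }
        return false;
    }
}
```

Because the setup and conversion operations are described as idempotent, the sketch makes no attempt to coordinate between nodes: several nodes passing the check at about the same time would simply repeat the same work.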

This message was sent by Atlassian JIRA
