cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-11142) Confusing error message on schema updates when nodes are down
Date Thu, 17 Mar 2016 16:08:33 GMT


Sylvain Lebresne commented on CASSANDRA-11142:

I do think that in this case we should show the warning, but not the {{OperationTimedOut}} line,
because it's not the schema query that timed out; it's (I assume) the schema agreement check
done by the underlying Python driver (plus, the warning alone is really enough).
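
To make the distinction concrete: a schema agreement check compares the schema_version each node reports (the coordinator's own in system.local, the other nodes' in system.peers) and succeeds only when they all match. The sketch below assumes those versions have already been fetched into a plain dict; the names `group_by_schema_version` and `schema_agreed`, and the optional `live_hosts` filter, are hypothetical illustrations, not the Python driver's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


def group_by_schema_version(versions: Dict[str, Optional[str]]) -> Dict[str, Set[str]]:
    """Group hosts by the schema_version they report (hypothetical helper).

    Hosts with no reported version (None) are skipped.
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for host, version in versions.items():
        if version is not None:
            groups[version].add(host)
    return dict(groups)


def schema_agreed(
    versions: Dict[str, Optional[str]],
    live_hosts: Optional[Iterable[str]] = None,
) -> bool:
    """Return True if every considered host reports the same schema version.

    If `live_hosts` is given, hosts outside it (e.g. DOWN nodes) are ignored.
    This liveness filter is an assumption for illustration only.
    """
    if live_hosts is not None:
        live = set(live_hosts)
        versions = {h: v for h, v in versions.items() if h in live}
    # Zero or one distinct version means agreement.
    return len(group_by_schema_version(versions)) <= 1
```

In the repro, Node1 moves to a new version after the CREATE TABLE, while the DOWN Node2 can presumably only be reported with its last known version, so `schema_agreed({"node1": "v2", "node2": "v1"})` is False even though the table was created. Whether the real check ignores DOWN nodes (the `live_hosts` case) is an assumption here, not something this comment establishes.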

Now, I had a quick look at the code, and currently cqlsh has no good way to know that the
timeout for the operation it just executed was not due to the query itself (also, cqlsh ends
up doing its own schema agreement check on every timeout, which is kind of inefficient/ugly
when the operation is not schema related). Ideally, since cqlsh knows whether or not it is executing
a schema statement, it could ask the driver not to do its internal agreement check and do the
check itself, thus being able to know what to print and what not to print.
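
The proposed split could look roughly like the following sketch. It assumes `execute` runs a statement with the driver's own agreement wait turned off (the DataStax Python driver documents `Cluster(max_schema_agreement_wait=0)` as bypassing those waits, though whether cqlsh should use that knob is an assumption), and that `fetch_schema_versions` returns a dict mapping each host to the schema_version read from system.local and system.peers. `run_statement`, `SCHEMA_STATEMENT`, and the local `OperationTimedOut` stand-in are hypothetical names for illustration.

```python
import re
from typing import Callable, Dict, List, Optional

# Crude approximation of "is this a schema-altering statement?".
# Hypothetical: real cqlsh would use its own parser; this regex also
# matches non-schema statements such as CREATE ROLE.
SCHEMA_STATEMENT = re.compile(r"^\s*(CREATE|ALTER|DROP)\b", re.IGNORECASE)

MISMATCH_WARNING = (
    "Warning: schema version mismatch detected, which might be caused by "
    "DOWN nodes; if this is not the case, check the schema versions of your "
    "nodes in system.local and system.peers."
)


class OperationTimedOut(Exception):
    """Local stand-in for the driver's timeout exception (hypothetical)."""


def run_statement(
    statement: str,
    execute: Callable[[str], None],
    fetch_schema_versions: Callable[[], Dict[str, Optional[str]]],
) -> List[str]:
    """Run one statement and return the lines cqlsh would print.

    `execute` is assumed to skip the driver's schema agreement wait, so a
    timeout raised here really is the query itself timing out.
    """
    try:
        execute(statement)
    except OperationTimedOut as exc:
        # Only a genuine query timeout reaches this branch.
        return [f"OperationTimedOut: {exc}"]

    if not SCHEMA_STATEMENT.match(statement):
        # Non-schema statements skip the agreement check entirely.
        return []

    # cqlsh performs the agreement check itself, so a mismatch is reported
    # as a warning instead of surfacing as a timeout.
    versions = {v for v in fetch_schema_versions().values() if v is not None}
    return [MISMATCH_WARNING] if len(versions) > 1 else []
```

With this shape, a CREATE TABLE issued while Node2 is DOWN yields only the mismatch warning, while a SELECT that genuinely times out still reports OperationTimedOut. The keyword regex is a deliberate simplification standing in for cqlsh's real statement parsing.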

I'm not familiar enough with cqlsh and the Python driver to do that easily, though, so if someone
more familiar wants to have a shot at it, be my guest.

> Confusing error message on schema updates when nodes are down
> -------------------------------------------------------------
>                 Key: CASSANDRA-11142
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: PROD
>            Reporter: Anubhav Kale
>            Priority: Minor
> Repro steps are as follows (this was tested on Windows and reproduces consistently):
> . Start a two node cluster.
> . Ensure that "nodetool status" shows both nodes as UN on both nodes
> . Stop Node2
> . Ensure that "nodetool status" shows that Node2 in DN.
> . Start cqlsh on Node1
> . Create a table
> . cqlsh times out with the message below (coming from .py):
> Warning: schema version mismatch detected, which might be caused by DOWN nodes; if this is not the case, check the schema versions of your nodes in system.local and system.peers.
> OperationTimedOut: errors={}, last_host=
> . Do a select * on the table that just timed out. It works fine.
> It just seems odd that there are no errors, but the table gets created fine. We should either replace the timeout exception with a real error or not throw a timeout at all. Not sure what the best approach is.

This message was sent by Atlassian JIRA
