cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9136) Improve error handling when table is queried before the schema has fully propagated
Date Mon, 20 Apr 2015 14:19:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502902#comment-14502902 ]

Sylvain Lebresne commented on CASSANDRA-9136:
---------------------------------------------

It's not unreasonable per se, but the fact that you have to manually pass how many bytes you've
deserialized when throwing the exception makes this a bit error-prone in general imo, even
though it's arguably easy enough to proof-check in this particular case (it would also make
it slightly more annoying to add support for {{EncodedDataInputStream}} if we wanted to, for
instance, though that's a minor point).

The initial idea I had was to use something like {{BytesReadTracker}} to make the counting
automatic, but I'm not married to that idea either, since it adds a small overhead in general,
which I don't like.
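
To make the trade-off concrete, here is a rough sketch of the automatic-counting idea. It is
written in Python rather than Cassandra's Java, it is not the actual {{BytesReadTracker}} code,
and it assumes the byte count is used to skip the remainder of a length-prefixed message so
the connection survives. Every name in it ({{CountingReader}}, {{UnknownTableError}},
{{KNOWN_TABLES}}, {{read_message}}, the message layout) is hypothetical.

```python
import io
import struct


class CountingReader:
    """Wraps a binary stream and counts every byte read through it."""

    def __init__(self, stream):
        self._stream = stream
        self.bytes_read = 0

    def read(self, n):
        data = self._stream.read(n)
        if len(data) != n:
            raise EOFError("stream ended mid-message")
        self.bytes_read += n
        return data

    def read_int(self):
        # Big-endian signed 32-bit integer.
        return struct.unpack(">i", self.read(4))[0]


class UnknownTableError(Exception):
    """Raised when a message refers to a table this node does not know yet."""


# Hypothetical local schema: table id -> table name.
KNOWN_TABLES = {1: "norm", 2: "compact"}


def read_message(stream):
    """Read one length-prefixed message; on an unknown table, skip the rest of it."""
    size = struct.unpack(">i", stream.read(4))[0]
    body = CountingReader(stream)
    try:
        table_id = body.read_int()
        if table_id not in KNOWN_TABLES:
            raise UnknownTableError(f"unknown table id {table_id}")
        value = body.read_int()
        return KNOWN_TABLES[table_id], value
    except UnknownTableError:
        # The wrapper knows how far deserialization got, so the remainder
        # can be skipped without the deserializer passing a count by hand.
        stream.read(size - body.bytes_read)
        raise


def frame(table_id, value):
    """Build a message in the hypothetical layout: length, table id, value."""
    body = struct.pack(">ii", table_id, value)
    return struct.pack(">i", len(body)) + body


# Usage: the first message names an unknown table, the second is still readable.
stream = io.BytesIO(frame(99, 7) + frame(1, 42))
try:
    read_message(stream)
except UnknownTableError as exc:
    print("skipped:", exc)
print(read_message(stream))
```

The cost shows up in {{CountingReader.read}}: every read pays for the extra bookkeeping,
whether or not an error ever occurs.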

Overall, I respect wanting to improve this, but I'm of the opinion that simply making
the error message a lot clearer should be good enough and that it's not worth trying to
be too smart in recovering. Not a strong opinion though, just a data point.


> Improve error handling when table is queried before the schema has fully propagated
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9136
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9136
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 3 Nodes GCE, N1-Standard-2, Ubuntu 12, 1 Node on 2.1.4, 2 on 2.0.14
>            Reporter: Russell Alexander Spitzer
>            Assignee: Tyler Hobbs
>             Fix For: 2.1.5
>
>
> This error occurs during a rolling upgrade between 2.0.14 and 2.1.4.
> h3. Repro
> With all the nodes on 2.0.14, create the following tables:
> {code}
> CREATE KEYSPACE test WITH replication = {
>   'class': 'SimpleStrategy',
>   'replication_factor': '2'
> };
> USE test;
> CREATE TABLE compact (
>   k int,
>   c int,
>   d int,
>   PRIMARY KEY ((k), c)
> ) WITH COMPACT STORAGE;
> CREATE TABLE norm (
>   k int,
>   c int,
>   d int,
>   PRIMARY KEY ((k), c)
> ) ;
> {code}
> Then load some data into these tables. I used the Python driver:
> {code}
> from cassandra.cluster import Cluster
> s = Cluster().connect()
> for x in range(1000):
>     for y in range(1000):
>         s.execute_async("INSERT INTO test.compact (k,c,d) VALUES (%d,%d,%d)" % (x, y, y))
>         s.execute_async("INSERT INTO test.norm (k,c,d) VALUES (%d,%d,%d)" % (x, y, y))
> {code}
> Upgrade one node from 2.0.14 -> 2.1.4.
> From the 2.1.4 node, create a new table.
> Query that table.
> On the 2.0.14 nodes you get these exceptions because the schema didn't propagate there. This exception kills the TCP connection between the nodes.
> {code}
> ERROR [Thread-19] 2015-04-08 18:48:45,337 CassandraDaemon.java (line 258) Exception in thread Thread[Thread-19,5,main]
> java.lang.NullPointerException
> 	at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:247)
> 	at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:156)
> 	at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
> 	at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:149)
> 	at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:131)
> 	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
> {code}
> Run cqlsh on the upgraded node and queries will fail until the TCP connection is re-established; this is easiest to repro with CL = ALL.
> {code}
> cqlsh> SELECT count(*) FROM test.norm where k = 22 ;
> ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'ALL'}
> cqlsh> SELECT count(*) FROM test.norm where k = 21 ;
> ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'ALL'}
> {code}
> So connection made:
> {code}
> DEBUG [Thread-227] 2015-04-09 05:09:02,718 IncomingTcpConnection.java (line 107) Set version for /10.240.14.115 to 8 (will use 7)
> {code}
> Connection broken by query of table before schema propagated:
> {code}
> ERROR [Thread-227] 2015-04-09 05:10:24,015 CassandraDaemon.java (line 258) Exception in thread Thread[Thread-227,5,main]
> java.lang.NullPointerException
> 	at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:247)
> 	at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:156)
> 	at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
> 	at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:149)
> 	at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:131)
> 	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
> {code}
> All queries to that node will now fail with timeouts until...
> Connection re-established:
> {code}
> DEBUG [Thread-228] 2015-04-09 05:11:00,323 IncomingTcpConnection.java (line 107) Set version for /10.240.14.115 to 8 (will use 7)
> {code}
> Now queries work again.
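
For anyone reproducing this, a minimal sketch of checking schema versions before querying a
freshly created table. It reads {{schema_version}} from {{system.local}} and {{system.peers}}
through whatever node the session is connected to. The helper names {{schema_versions}} and
{{wait_until_schema_agrees}} are hypothetical, not part of the Python driver, and in a
mixed-version cluster agreement may never be reached, hence the timeout.

```python
import time


def schema_versions(session):
    """Return the set of schema versions reported by the connected node and its peers."""
    versions = set()
    for row in session.execute("SELECT schema_version FROM system.local"):
        versions.add(row.schema_version)
    for row in session.execute("SELECT schema_version FROM system.peers"):
        # A peer that has not reported a version yet shows up as None.
        if row.schema_version is not None:
            versions.add(row.schema_version)
    return versions


def wait_until_schema_agrees(session, timeout=10.0, interval=0.5):
    """Poll until only one schema version is visible, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if len(schema_versions(session)) == 1:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the driver session from the load script above, {{wait_until_schema_agrees(s)}} returning
False means at least one node still reports a different schema version, which is the state in
which the query in the repro hits the NullPointerException.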



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
