cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13323) IncomingTcpConnection closed due to one bad message
Date Mon, 13 Mar 2017 08:43:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907017#comment-15907017 ]

Sylvain Lebresne commented on CASSANDRA-13323:
----------------------------------------------

Pretty sure this patch is not going to work. When you get the {{UnknownColumnFamilyException}},
only a sub-part of the message has been deserialized, so trying to deserialize further messages
on that connection is going to read (what looks like) garbage. This is, in fact, why we currently
just throw out the connection: it's the simplest, safest thing to do.

This doesn't mean, btw, that we couldn't have a way to resume after a failed message (at least
when we know the failure is not due to a corrupted stream, as in this particular case), but it's
a bit more involved. The simplest somewhat-generic solution I see, fwiw, would be to wrap the
DataInput into one that counts how many bytes are deserialized. We'd reset the counter at
the beginning of each payload and, on an exception, we'd know how many bytes we have to skip
to resume reading at the next message properly.
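
The counting idea above can be sketched roughly as follows. This is only an illustration, not
Cassandra code: {{CountingInputStream}}, {{ResumeSketch}}, {{UnknownTableException}} and the toy
frame layout (a 4-byte payload size followed by two ints) are all hypothetical names and
assumptions made for the example. It also assumes the payload size is known before the payload
is deserialized, so the number of bytes left to skip is the payload size minus the count.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper that counts the bytes consumed from the underlying stream. */
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b >= 0) count++;
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    // mark/reset would make the count unreliable, so the sketch disables it.
    @Override public boolean markSupported() { return false; }

    void resetCount() { count = 0; }
    long count() { return count; }
}

class ResumeSketch {
    /** Stand-in for a failure that leaves the stream intact but half-read. */
    static class UnknownTableException extends IOException {
        UnknownTableException(String msg) { super(msg); }
    }

    /** Toy frame (assumed layout): int payloadSize, int tableId, int value. */
    static byte[] frame(int tableId, int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(8);        // payload size: two ints
        out.writeInt(tableId);
        out.writeInt(value);
        return bytes.toByteArray();
    }

    /** Fails after reading only part of the payload when the table is unknown. */
    static int deserialize(DataInputStream in) throws IOException {
        int tableId = in.readInt();
        if (tableId != 1) throw new UnknownTableException("unknown table " + tableId);
        return in.readInt();
    }

    /** Reads every frame, skipping the unread tail of any payload that fails. */
    static List<Integer> readAll(byte[] wire) throws IOException {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(wire));
        DataInputStream in = new DataInputStream(counting);
        List<Integer> values = new ArrayList<>();
        while (true) {
            int payloadSize;
            try {
                payloadSize = in.readInt();
            } catch (EOFException e) {
                break;          // no more frames
            }
            counting.resetCount();  // count only the bytes of this payload
            try {
                values.add(deserialize(in));
            } catch (UnknownTableException e) {
                int remaining = (int) (payloadSize - counting.count());
                if (in.skipBytes(remaining) != remaining)
                    throw new EOFException("stream ended inside a payload");
            }
        }
        return values;
    }
}
```

With three frames on the wire where the middle one names an unknown table, {{readAll}} returns
the values of the first and third frames; the reader skips the unread remainder of the bad
payload and lands on the next frame boundary. Skipping like this is only safe when the failure
is known not to come from a corrupted stream.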

> IncomingTcpConnection closed due to one bad message
> ---------------------------------------------------
>
>                 Key: CASSANDRA-13323
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13323
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Simon Zhou
>            Assignee: Simon Zhou
>             Fix For: 3.0.13
>
>         Attachments: CASSANDRA-13323-v1.patch
>
>
> We got this exception:
> {code}
> WARN  [MessagingService-Incoming-/****] 2017-02-14 17:33:33,177 IncomingTcpConnection.java:101 - UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId 2a3ab630-df74-11e6-9f81-b56251e1559e. If a table was just created, this is likely due to the schema not being fully propagated.  Please wait for schema agreement on table creation.
>     at org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:131) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:113) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.0.10.jar:3.0.10]
> {code}
> Also we saw this log in another host indicating it needs to re-connect:
> {code}
> INFO  [HANDSHAKE-/****] 2017-02-21 13:37:50,216 OutboundTcpConnection.java:515 - Handshaking version with /****
> {code}
> The reason is that the node was receiving hinted data for a dropped table. This may happen
> with other messages as well. On the Cassandra side, IncomingTcpConnection shouldn't close on
> just one bad message, even though it will be restarted shortly afterwards by SocketThread in
> MessagingService.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
