cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Petrov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13004) Corruption while adding a column to a table
Date Thu, 01 Jun 2017 10:19:04 GMT


Alex Petrov commented on CASSANDRA-13004:

I've prepared a preliminary patch. Unfortunately, because the column filter was changing so
much (also, semantically), we all three patches are slightly different. 

To recap, what was the problem and how it is proposed to mitigate it:

If the coordinator has a different view of schema than the replica, there was a single case
when the replica was still relying on it's own local {{metadata.allColumns}}. Problem with
that is that is that deserialisation header then relies on that output and, because the bitset
encoding is used, can end up grabbing a column with a different index (or even fewer columns).
This leads to the problem that replica sends a result for one columnset and the coordinator
is trying to parse it for the other columnset. Sometimes, if data is right, this leads to
the exceptions: {{java.lang.ArrayIndexOutOfBoundsException: 82}}, {{
Corrupt (negative) value length encountered}}, {{ Corrupt
flags value for unfiltered partition (isStatic flag set): 252}}, {{
EOF after 16254 bytes out of 262147}}, {{java.lang.AssertionError: deletedAt=-9223372036854775808,
localDeletion=2147483647}} (listing them here to simplify the search) to name a few, _all
depending on the data, timestamps and many other factors_. 

The other problem was on the coordinator itself, since there were cases when {{ColumnFilter}}
was serialised and sent with one set of columns and then, by the time response came back from
replica, 3.x cfmetadata would get updated, and subsequent call to {{fetched}} and {{queried}}
that would be used for deserialisation on coordinator would have a different view of columns
from the request that's been sent. This is the case when the coordinator learns about the
schema change quicker than replica. This was fixed by making sure we do not use a reference
to metadata in column filter after it's creation.


> Corruption while adding a column to a table
> -------------------------------------------
>                 Key: CASSANDRA-13004
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Stanislav Vishnevskiy
>            Assignee: Alex Petrov
>            Priority: Blocker
>             Fix For: 3.0.x, 3.11.x, 4.x
> We had the following schema in production. 
> {code:none}
> CREATE TYPE IF NOT EXISTS discord_channels.channel_recipient (
>     nick text
> );
> CREATE TYPE IF NOT EXISTS discord_channels.channel_permission_overwrite (
>     id bigint,
>     type int,
>     allow_ int,
>     deny int
> );
> CREATE TABLE IF NOT EXISTS discord_channels.channels (
>     id bigint,
>     guild_id bigint,
>     type tinyint,
>     name text,
>     topic text,
>     position int,
>     owner_id bigint,
>     icon_hash text,
>     recipients map<bigint, frozen<channel_recipient>>,
>     permission_overwrites map<bigint, frozen<channel_permission_overwrite>>,
>     bitrate int,
>     user_limit int,
>     last_pin_timestamp timestamp,
>     last_message_id bigint,
>     PRIMARY KEY (id)
> );
> {code}
> And then we executed the following alter.
> {code:none}
> ALTER TABLE discord_channels.channels ADD application_id bigint;
> {code}
> And one row (that we can tell) got corrupted at the same time and could no longer be
read from the Python driver. 
> {code:none}
> [E 161206 01:56:58 geventreactor:141] Error decoding response from Cassandra. ver(4);
flags(0000); stream(27); op(8); offset(9); len(887); buffer: '\x84\x00\x00\x1b\x08\x00\x00\x03w\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x0f\x00\x10discord_channels\x00\x08channels\x00\x02id\x00\x02\x00\x0eapplication_id\x00\x02\x00\x07bitrate\x00\t\x00\x08guild_id\x00\x02\x00\ticon_hash\x00\r\x00\x0flast_message_id\x00\x02\x00\x12last_pin_timestamp\x00\x0b\x00\x04name\x00\r\x00\x08owner_id\x00\x02\x00\x15permission_overwrites\x00!\x00\x02\x000\x00\x10discord_channels\x00\x1cchannel_permission_overwrite\x00\x04\x00\x02id\x00\x02\x00\x04type\x00\t\x00\x06allow_\x00\t\x00\x04deny\x00\t\x00\x08position\x00\t\x00\nrecipients\x00!\x00\x02\x000\x00\x10discord_channels\x00\x11channel_recipient\x00\x01\x00\x04nick\x00\r\x00\x05topic\x00\r\x00\x04type\x00\x14\x00\nuser_limit\x00\t\x00\x00\x00\x01\x00\x00\x00\x08\x03\x8a\x19\x8e\xf8\x82\x00\x01\xff\xff\xff\xff\x00\x00\x00\x04\x00\x00\xfa\x00\x00\x00\x00\x08\x00\x00\xfa\x00\x00\xf8G\xc5\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8b\xc0\xb5nB\x00\x02\x00\x00\x00\x08G\xc5\xffI\x98\xc4\xb4(\x00\x00\x00\x03\x8b\xc0\xa8\xff\xff\xff\xff\x00\x00\x01<\x00\x00\x00\x06\x00\x00\x00\x08\x03\x81L\xea\xfc\x82\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x81L\xea\xfc\x82\x00\n\x00\x00\x00\x04\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x08\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1e\xe6\x8b\x80\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1e\xe6\x8b\x80\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x040\x07\xf8Q\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1f\x1b{\x82\x00\x00\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1f\x1b{\x82\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x07\xf8Q\x00\x00\x00\x04\x10\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1fH6\x82\x00\x01\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1fH6\x82\x00\x01\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x05\xe8A\x00\x00\x00\x04\x10\x02\x00\x00\x00\x00\x00\x08\x03\x8a+=\xca\xc0\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a+=\xca\xc0\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x08\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x8f\x979\x80\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x8f\x979\x80\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00
> {code}
> And then in cqlsh when trying to read the row we got this. 
> {code:none}
> /usr/bin/ DateOverFlowWarning: Some timestamps are larger than Python datetime
can represent. Timestamps are displayed in milliseconds from epoch.
> Traceback (most recent call last):
>   File "/usr/bin/", line 1301, in perform_simple_statement
>     result = future.result()
>   File "/usr/share/cassandra/lib/",
line 3650, in result
>     raise self._final_exception
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 2: invalid start
> {code}
> We tried to read the data and it would refuse to read the name column (the UTF8 error)
and the last_pin_timestamp column had an absurdly large value.
> We ended up rewriting the whole row as we had the data in another place and it fixed
the problem. However there is clearly a race condition in the schema change sub-system.
> Any ideas?

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message