cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13004) Corruption while adding/removing a column to/from the table
Date Thu, 29 Jun 2017 19:48:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068868#comment-16068868
] 

Jeff Jirsa edited comment on CASSANDRA-13004 at 6/29/17 7:47 PM:
-----------------------------------------------------------------

Yes, the bug impacts tables without UDTS

The bug is basically that read repairs in the ~100ms or so surrounding an ALTER that ADDs
or DROPs columns could cause cells to bleed into each other. Depending on the type of the
cells, sometimes marshaling / validation would throw an EOF or other exception to protect
you from the corruption. In cases where the adjacent fields were of the same type, that's
less likely to happen (text/text, int/int, etc).

The data on disk may be so corrupt that running a {{SELECT *}} throws an error, or it could
give back entirely real-looking CQL rows with invalid data in it.

Again, it's only for data read-repaired while the schema propagation happened (if the schema
columns from the read-repairing hosts don't match while the read repair is ongoing), so only
data read during that change can be impacted.



was (Author: jjirsa):
Yes, the bug impacts colums without UDTS

The bug is basically that read repairs in the ~100ms or so surrounding an ALTER that ADDs
or DROPs columns could cause cells to bleed into each other. Depending on the type of the
cells, sometimes marshaling / validation would throw an EOF or other exception to protect
you from the corruption. In cases where the adjacent fields were of the same type, that's
less likely to happen (text/text, int/int, etc).

The data on disk may be so corrupt that running a {{SELECT *}} throws an error, or it could
give back entirely real-looking CQL rows with invalid data in it.

Again, it's only for data read-repaired while the schema propagation happened (if the schema
columns from the read-repairing hosts don't match while the read repair is ongoing), so only
data read during that change can be impacted.


> Corruption while adding/removing a column to/from the table
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-13004
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13004
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Stanislav Vishnevskiy
>            Assignee: Alex Petrov
>            Priority: Blocker
>             Fix For: 3.0.14, 3.11.0, 4.0
>
>
> We had the following schema in production. 
> {code:none}
> CREATE TYPE IF NOT EXISTS discord_channels.channel_recipient (
>     nick text
> );
> CREATE TYPE IF NOT EXISTS discord_channels.channel_permission_overwrite (
>     id bigint,
>     type int,
>     allow_ int,
>     deny int
> );
> CREATE TABLE IF NOT EXISTS discord_channels.channels (
>     id bigint,
>     guild_id bigint,
>     type tinyint,
>     name text,
>     topic text,
>     position int,
>     owner_id bigint,
>     icon_hash text,
>     recipients map<bigint, frozen<channel_recipient>>,
>     permission_overwrites map<bigint, frozen<channel_permission_overwrite>>,
>     bitrate int,
>     user_limit int,
>     last_pin_timestamp timestamp,
>     last_message_id bigint,
>     PRIMARY KEY (id)
> );
> {code}
> And then we executed the following alter.
> {code:none}
> ALTER TABLE discord_channels.channels ADD application_id bigint;
> {code}
> And one row (that we can tell) got corrupted at the same time and could no longer be
read from the Python driver. 
> {code:none}
> [E 161206 01:56:58 geventreactor:141] Error decoding response from Cassandra. ver(4);
flags(0000); stream(27); op(8); offset(9); len(887); buffer: '\x84\x00\x00\x1b\x08\x00\x00\x03w\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x0f\x00\x10discord_channels\x00\x08channels\x00\x02id\x00\x02\x00\x0eapplication_id\x00\x02\x00\x07bitrate\x00\t\x00\x08guild_id\x00\x02\x00\ticon_hash\x00\r\x00\x0flast_message_id\x00\x02\x00\x12last_pin_timestamp\x00\x0b\x00\x04name\x00\r\x00\x08owner_id\x00\x02\x00\x15permission_overwrites\x00!\x00\x02\x000\x00\x10discord_channels\x00\x1cchannel_permission_overwrite\x00\x04\x00\x02id\x00\x02\x00\x04type\x00\t\x00\x06allow_\x00\t\x00\x04deny\x00\t\x00\x08position\x00\t\x00\nrecipients\x00!\x00\x02\x000\x00\x10discord_channels\x00\x11channel_recipient\x00\x01\x00\x04nick\x00\r\x00\x05topic\x00\r\x00\x04type\x00\x14\x00\nuser_limit\x00\t\x00\x00\x00\x01\x00\x00\x00\x08\x03\x8a\x19\x8e\xf8\x82\x00\x01\xff\xff\xff\xff\x00\x00\x00\x04\x00\x00\xfa\x00\x00\x00\x00\x08\x00\x00\xfa\x00\x00\xf8G\xc5\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8b\xc0\xb5nB\x00\x02\x00\x00\x00\x08G\xc5\xffI\x98\xc4\xb4(\x00\x00\x00\x03\x8b\xc0\xa8\xff\xff\xff\xff\x00\x00\x01<\x00\x00\x00\x06\x00\x00\x00\x08\x03\x81L\xea\xfc\x82\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x81L\xea\xfc\x82\x00\n\x00\x00\x00\x04\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x08\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1e\xe6\x8b\x80\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1e\xe6\x8b\x80\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x040\x07\xf8Q\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1f\x1b{\x82\x00\x00\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1f\x1b{\x82\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x07\xf8Q\x00\x00\x00\x04\x10\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1fH6\x82\x00\x01\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1fH6\x82\x00\x01\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x05\xe8A\x00\x00\x00\x04\x10\x02\x00\x00\x00\x00\x00\x08\x03\x8a+=\xca\xc0\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a+=\xca\xc0\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x08\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x8f\x979\x80\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x8f\x979\x80\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00
\x08\x01\x00\x00\x00\x04\xc4\xb4(\x00\xff\xff\xff\xff\x00\x00\x00O[f\x80Q\x07general\x05\xf8G\xc5\xffI\x98\xc4\xb4(\x00\xf8O[f\x80Q\x00\x00\x00\x02\x04\xf8O[f\x80Q\x00\xf8G\xc5\xffI\x98\x01\x00\x00\xf8O[f\x80Q\x00\x00\x00\x00\xf8G\xc5\xffI\x97\xc4\xb4(\x06\x00\xf8O\x7fe\x1fm\x08\x03\x00\x00\x00\x01\x00\x00\x00\x00\x04\x00\x00\x00\x00'
> {code}
> And then in cqlsh when trying to read the row we got this. 
> {code:none}
> /usr/bin/cqlsh.py:632: DateOverFlowWarning: Some timestamps are larger than Python datetime
can represent. Timestamps are displayed in milliseconds from epoch.
> Traceback (most recent call last):
>   File "/usr/bin/cqlsh.py", line 1301, in perform_simple_statement
>     result = future.result()
>   File "/usr/share/cassandra/lib/cassandra-driver-internal-only-3.5.0.post0-d8d0456.zip/cassandra-driver-3.5.0.post0-d8d0456/cassandra/cluster.py",
line 3650, in result
>     raise self._final_exception
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 2: invalid start
byte
> {code}
> We tried to read the data and it would refuse to read the name column (the UTF8 error)
and the last_pin_timestamp column had an absurdly large value.
> We ended up rewriting the whole row as we had the data in another place and it fixed
the problem. However there is clearly a race condition in the schema change sub-system.
> Any ideas?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message