cassandra-commits mailing list archives

From "Andy Tolbert (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-10822) SSTable data loss when upgrading with row tombstone present
Date Mon, 07 Dec 2015 03:33:10 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Tolbert updated CASSANDRA-10822:
-------------------------------------
    Description: 
I ran into an issue when upgrading from 2.1.11 to 3.0.0 (and also to the cassandra-3.0 branch)
where subsequent rows were lost within a partition that contains a row tombstone.

Here's a scenario that reproduces the issue.

Using ccm, create a single-node cluster at 2.1.11:

{{ccm create -n 1 -v 2.1.11 -s financial}}

Run the following queries to create the schema, populate some data, and then delete the data for
November:

{noformat}
drop keyspace if exists financial;

create keyspace if not exists financial with replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };

create table if not exists financial.symbol_history (
  symbol text,
  name text static,
  year int,
  month int,
  day int,
  volume bigint,
  close double,
  open double,
  low double,
  high double,
  primary key((symbol, year), month, day)
) with CLUSTERING ORDER BY (month desc, day desc);

insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 1, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 2, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 3, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 4, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 5, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 6, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 7, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 8, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 9, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 10, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 11, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 12, 1, 100);

delete from financial.symbol_history where symbol='CORP' and year = 2004 and month=11;
{noformat}
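
The twelve inserts above differ only in the month value, so they can be generated rather than typed. The following is a minimal sketch, not part of the original report; the names INSERT_TEMPLATE and build_inserts are hypothetical and exist only for illustration.

```python
# Hypothetical helper (not from the ticket): rebuilds the twelve INSERT
# statements used in the reproduction, one per month of 2004.
INSERT_TEMPLATE = (
    "insert into financial.symbol_history "
    "(symbol, name, year, month, day, volume) "
    "values ('CORP', 'MegaCorp', 2004, {month}, 1, 100);"
)


def build_inserts(months=range(1, 13)):
    """Return one INSERT statement per month, in the order given."""
    return [INSERT_TEMPLATE.format(month=m) for m in months]


if __name__ == "__main__":
    # Print the statements so they can be pasted into cqlsh.
    print("\n".join(build_inserts()))
```

The printed statements match the ones listed in the reproduction and can be pasted into cqlsh ahead of the delete.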

Flush and run sstable2json on the sole Data.db file:

{noformat}
ccm node1 flush
sstable2json /path/to/file.db
{noformat}

The output should look like the following:

{code}
[
{"key": "CORP:2004",
 "cells": [["::name","MegaCorp",1449457517033030],
           ["12:1:","",1449457517033030],
           ["12:1:volume","100",1449457517033030],
           ["11:_","11:!",1449457564983269,"t",1449457564],
           ["10:1:","",1449457516313738],
           ["10:1:volume","100",1449457516313738],
           ["9:1:","",1449457516310205],
           ["9:1:volume","100",1449457516310205],
           ["8:1:","",1449457516235664],
           ["8:1:volume","100",1449457516235664],
           ["7:1:","",1449457516233535],
           ["7:1:volume","100",1449457516233535],
           ["6:1:","",1449457516231458],
           ["6:1:volume","100",1449457516231458],
           ["5:1:","",1449457516228307],
           ["5:1:volume","100",1449457516228307],
           ["4:1:","",1449457516225415],
           ["4:1:volume","100",1449457516225415],
           ["3:1:","",1449457516222811],
           ["3:1:volume","100",1449457516222811],
           ["2:1:","",1449457516220301],
           ["2:1:volume","100",1449457516220301],
           ["1:1:","",1449457516210758],
           ["1:1:volume","100",1449457516210758]]}
]
{code}
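
To make the pre-upgrade state easy to compare later, the dump above can be summarized programmatically. This is a rough sketch that assumes the output has exactly the shape shown: cell names of the form month:day:column, static cells with an empty clustering prefix, and a range tombstone written as a five-element entry with "t" in the fourth position. The function name summarize_legacy_dump is hypothetical.

```python
import json


def summarize_legacy_dump(dump_text):
    """Summarize a 2.1-style sstable2json dump per partition key.

    Assumes the layout shown in the report: each cell is a list whose first
    element is "month:day:column", and a range tombstone is a five-element
    list [start, end, timestamp, "t", local_deletion_time].
    """
    summary = {}
    for partition in json.loads(dump_text):
        rows = set()
        range_tombstones = []
        for cell in partition["cells"]:
            if len(cell) == 5 and cell[3] == "t":
                # Range tombstone: record its start and end bounds.
                range_tombstones.append((cell[0], cell[1]))
                continue
            month, day, _column = cell[0].split(":", 2)
            if month == "" and day == "":
                # Static cell such as "::name": it belongs to no clustering row.
                continue
            rows.add((int(month), int(day)))
        summary[partition["key"]] = {
            "rows": sorted(rows, reverse=True),  # mirrors the DESC clustering order
            "range_tombstones": range_tombstones,
        }
    return summary
```

Applied to the dump above, a summary of this kind would be expected to list eleven clustering rows (every month except 11) and a single range tombstone covering month 11.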

Prepare for the upgrade:

{noformat}
ccm node1 nodetool snapshot financial
ccm node1 nodetool drain
ccm node1 stop
{noformat}

Upgrade to cassandra-3.0 and start the node:

{noformat}
ccm node1 setdir -v git:cassandra-3.0
ccm node1 start
{noformat}

Run the following query in cqlsh and observe that only 1 row is returned!  It appears that all data
following November is gone.

{noformat}
cqlsh> select * from financial.symbol_history;

 symbol | year | month | day | name     | close | high | low  | open | volume
--------+------+-------+-----+----------+-------+------+------+------+--------
   CORP | 2004 |    12 |   1 | MegaCorp |  null | null | null | null |    100
{noformat}
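
The gap between what was written and what comes back can be checked mechanically. The sketch below is an illustration only: it assumes the pipe-separated cqlsh layout shown above and values that contain no "|" characters, and the helper names parse_cqlsh_table and missing_months are hypothetical.

```python
def parse_cqlsh_table(output):
    """Parse pipe-separated cqlsh output into a list of dicts keyed by column name.

    Assumes the layout shown in the report: a header line, a separator line
    made of '-' and '+', then one line per row.
    """
    lines = [ln for ln in output.splitlines() if ln.strip()]
    # The separator line consists only of '-' and '+' characters.
    sep_idx = next(i for i, ln in enumerate(lines) if set(ln.strip()) <= set("-+"))
    header = [col.strip() for col in lines[sep_idx - 1].split("|")]
    rows = []
    for ln in lines[sep_idx + 1:]:
        if "|" not in ln:
            continue  # skip trailers such as a row-count line
        values = [val.strip() for val in ln.split("|")]
        rows.append(dict(zip(header, values)))
    return rows


def missing_months(rows, expected_months):
    """Return the expected months that do not appear in the parsed rows."""
    seen = {int(row["month"]) for row in rows}
    return sorted(set(expected_months) - seen)
```

With twelve months inserted and month 11 deleted, the expected set is months 1 through 12 minus 11; against the single row returned above, missing_months would report months 1 through 10 as absent.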

Upgrade the sstables and query again, and you'll observe the same problem.

{noformat}
ccm node1 nodetool upgradesstables financial
{noformat}

I modified the 2.2 version of sstable2json so that it works with 3.0 (couldn't help myself
:)), and observed 2 RangeTombstoneBoundMarker occurrences for 1 delete and the rest of the
data missing.

{code}
[
{
 "key": "CORP:2004",
 "static": {
  "cells": {
    ["name","MegaCorp",1449457517033030]
  }
 },
 "rows": [
  {
   "clustering": {"month": "12", "day": "1"},
   "cells": {
     ["volume","100",1449457517033030]
   }
  },
  {
   "tombstone": ["11:*",1449457564983269,"t",1449457564]
  },
  {
   "tombstone": ["11:*",1449457564983269,"t",1449457564]
  }
 ]
}
]
{code}

I'm not sure why this is happening, but I should point out that I'm using static columns here
and that I'm using reverse order for my clustering, so maybe that makes a difference.  I'll
try without static columns and with regular ordering to see whether either changes the outcome,
and update the ticket.


> SSTable data loss when upgrading with row tombstone present
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-10822
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10822
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Andy Tolbert
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
