cassandra-commits mailing list archives

From "Ivo Ladage-van Doorn (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1992) Bootstrap breaks data stored (missing rows, extra rows, column values modified)
Date Mon, 17 Jan 2011 09:34:45 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982561#action_12982561 ]

Ivo Ladage-van Doorn commented on CASSANDRA-1992:
-------------------------------------------------

I have exactly the same problem with an existing installation and was about to create an
issue for it when I found this one. I'll describe my case below; maybe it provides some
relevant information.

I ran into this issue with Cassandra 0.7 while adding just one node to an existing one-node
cluster. The existing node already contains some data when the second node is added to the
cluster. This is what I did:

Setup
I have two nodes, both running Linux: a server called 'veers' on 172.16.2.203 and one called
'r2d2' on 172.16.2.206. I use Cassandra 0.7 and change only the following settings in
cassandra.yaml and log4j-server.properties (all other entries keep their default values):

In cassandra.yaml:

initial_token: 0
data_file_directories:
    - /vol/users/ivol/cassandra_work/data
commitlog_directory: /vol/users/ivol/cassandra_work/commitlog
saved_caches_directory: /vol/users/ivol/cassandra_work/saved_caches
seeds: 172.16.2.203
listen_address: 172.16.2.203
rpc_address: 172.16.2.203

In log4j-server.properties:

log4j.appender.R.File=/vol/users/ivol/cassandra_work/system.log


Now I start the first node and connect to it using cassandra-cli. I add the following keyspace,
column families and rows:

create keyspace Default;
use Default;

create column family Role;
set Role['user_1']['name'] = 'User 1';
set Role['user_2']['name'] = 'User 2';
set Role['user_3']['name'] = 'User 3';

create column family Gadget;
set Gadget['gadget_1']['name'] = 'Gadget 1';
set Gadget['gadget_2']['name'] = 'Gadget 2';
set Gadget['gadget_3']['name'] = 'Gadget 3';

After this 'list Role' and 'list Gadget' return the proper rows.

Now I add a second node to the cluster, with this configuration:

In cassandra.yaml:

initial_token:
auto_bootstrap: true
data_file_directories:
    - /vol/users/ivol/cassandra_work/data
commitlog_directory: /vol/users/ivol/cassandra_work/commitlog
saved_caches_directory: /vol/users/ivol/cassandra_work/saved_caches
seeds: 172.16.2.203
listen_address: 172.16.2.206
rpc_address: 172.16.2.206

In log4j-server.properties:

log4j.appender.R.File=/vol/users/ivol/cassandra_work/system.log


Now I start the second node. Bootstrapping takes some time, about 2 minutes in total, but finishes
without any warnings or errors:

...
INFO [main] 2011-01-17 09:58:09,170 StorageService.java (line 399) Joining: getting load information
INFO [main] 2011-01-17 09:58:09,171 StorageLoadBalancer.java (line 366) Sleeping 90000 ms to wait for load information...
INFO [GossipStage:1] 2011-01-17 09:58:10,447 Gossiper.java (line 577) Node /172.16.2.203 is now part of the cluster
INFO [HintedHandoff:1] 2011-01-17 09:58:11,451 HintedHandOffManager.java (line 192) Started hinted handoff for endpoint /172.16.2.203
INFO [GossipStage:1] 2011-01-17 09:58:11,451 Gossiper.java (line 569) InetAddress /172.16.2.203 is now UP
INFO [HintedHandoff:1] 2011-01-17 09:58:11,453 HintedHandOffManager.java (line 248) Finished hinted handoff of 0 rows to endpoint /172.16.2.203
INFO [main] 2011-01-17 09:59:39,189 StorageService.java (line 399) Joining: getting bootstrap token
INFO [main] 2011-01-17 09:59:39,203 BootStrapper.java (line 148) New token will be 110533280274756817580689726417060138498 to assume load from /172.16.2.203
INFO [main] 2011-01-17 09:59:39,265 StorageService.java (line 399) Joining: sleeping 30000 ms for pending range setup
INFO [main] 2011-01-17 10:00:09,272 StorageService.java (line 399) Bootstrapping
INFO [main] 2011-01-17 10:00:09,663 CassandraDaemon.java (line 77) Binding thrift service to /172.16.2.206:9160
INFO [main] 2011-01-17 10:00:09,666 CassandraDaemon.java (line 91) Using TFramedTransport with a max frame size of 15728640 bytes.
INFO [main] 2011-01-17 10:00:09,671 CassandraDaemon.java (line 119) Listening for thrift clients...

Although everything seemed to work just fine, once node 2 has finished bootstrapping the rows
in the 'Role' and 'Gadget' column families are messed up:

list Role;

-------------------
RowKey: user_3
=> (column=6e616d65, value=557365722033, timestamp=1295254678545000)

1 Row Returned.


list Gadget;

-------------------
RowKey: user_2
=> (column=6e616d65, value=557365722032, timestamp=1295254678514000)
-------------------
RowKey: gadget_2
=> (column=6e616d65, value=4761646765742032, timestamp=1295254678805000)
-------------------
RowKey: gadget_3
=> (column=6e616d65, value=4761646765742033, timestamp=1295254679429000)
-------------------
RowKey: gadget_1
=> (column=6e616d65, value=4761646765742031, timestamp=1295254678771000)
-------------------
RowKey: user_1
=> (column=6e616d65, value=557365722031, timestamp=1295254678449000)

5 Rows Returned.
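cassandra-cli prints column names and values as hex. A small Python sketch that decodes the
output above makes the misplacement easier to read; it assumes the bytes are UTF-8 text (true
here, since the values were entered as plain ASCII strings), and the LISTING table and decode
helper are hypothetical, copied by hand from the listing.

```python
# (column family, row key, column name hex, value hex) as printed by `list` above.
LISTING = {
    "Role": [("user_3", "6e616d65", "557365722033")],
    "Gadget": [
        ("user_2", "6e616d65", "557365722032"),
        ("gadget_2", "6e616d65", "4761646765742032"),
        ("gadget_3", "6e616d65", "4761646765742033"),
        ("gadget_1", "6e616d65", "4761646765742031"),
        ("user_1", "6e616d65", "557365722031"),
    ],
}


def decode(hex_str: str) -> str:
    """Hypothetical helper: turn a hex string from cassandra-cli back into text."""
    return bytes.fromhex(hex_str).decode("utf-8")


for cf, rows in LISTING.items():
    for key, col, val in rows:
        print(f"{cf}[{key!r}][{decode(col)!r}] = {decode(val)!r}")
```

The first line printed is `Role['user_3']['name'] = 'User 3'`. In this run the rows found in
'Gadget' under user_1 and user_2 still carry 'User 1' and 'User 2', so whole rows appear to have
landed in the wrong column family rather than having their values corrupted.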

So 2 rows have been moved from CF 'Role' to CF 'Gadget', just by adding a node to the cluster.
The exact result differs each time I try, but some rows always end up in a different CF. The
problem looks the same as the one described by Mateusz.
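The check can be automated by comparing what was written with what `list` returns, in the spirit
of the insert-then-verify scenario in the issue description. The Python sketch below is not
Mateusz's test.py (its contents are not shown here); EXPECTED, OBSERVED and diff are hypothetical
names, and OBSERVED is transcribed by hand from the decoded listing above.

```python
# Rows written with cassandra-cli before the second node joined.
EXPECTED = {
    "Role": {"user_1": "User 1", "user_2": "User 2", "user_3": "User 3"},
    "Gadget": {"gadget_1": "Gadget 1", "gadget_2": "Gadget 2", "gadget_3": "Gadget 3"},
}

# Rows returned by `list Role` and `list Gadget` after bootstrap.
OBSERVED = {
    "Role": {"user_3": "User 3"},
    "Gadget": {
        "user_2": "User 2",
        "gadget_2": "Gadget 2",
        "gadget_3": "Gadget 3",
        "gadget_1": "Gadget 1",
        "user_1": "User 1",
    },
}


def diff(expected: dict, observed: dict) -> dict:
    """Hypothetical helper: report missing, extra and changed row keys per column family."""
    report = {}
    for cf in expected.keys() | observed.keys():
        exp = expected.get(cf, {})
        obs = observed.get(cf, {})
        report[cf] = {
            "missing": sorted(exp.keys() - obs.keys()),
            "extra": sorted(obs.keys() - exp.keys()),
            "changed": sorted(k for k in exp.keys() & obs.keys() if exp[k] != obs[k]),
        }
    return report


for cf, result in sorted(diff(EXPECTED, OBSERVED).items()):
    print(cf, result)
```

For this run the report lists user_1 and user_2 as missing from 'Role' and as extra in 'Gadget',
matching the "missing rows, extra rows" in the issue title.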

I also found that restarting the nodes seems to 'fix' the issue. Changing the replication
factor from 1 to 2 also 'resolves' the issue most of the time.

> Bootstrap breaks data stored (missing rows, extra rows, column values modified)
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1992
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Linux 2.6.36-1 #1 SMP Tue Nov 9 09:56:02 CET 2010 x86_64 Intel(R)_Core(TM)2_Quad_CPU____Q8300__@_2.50GHz PLD Linux
> glibc-2.12-4.i686
> java-sun-1.6.0.22-1.i686
>            Reporter: Mateusz Korniak
>            Assignee: Brandon Williams
>             Fix For: 0.7.1
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Scenario:
> Two fresh (empty /data /commitlog /saved_caches dirs) cassandra installs.
> Start first one.
> Run data inserting program [1],  run again in verify mode - all data intact.
> Bootstrap 2nd node.
> Run verification again, now it fails.
> The issue is very strange to me, as cassandra has worked perfectly for days now while the
> cluster nodes stay the same, but any bootstrap (1 -> 2 nodes, 2 -> 3 nodes, 2 -> 3 nodes with
> RF=2) breaks data.
> I am running cassandra with 1GB heap size, 32bit userland on 64bit kernels, not sure what else could matter there.
> Any hints?
> Thanks in advance, regards.
> [1] simple program generating data and later verifying data.
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/test.py
> [2] Logs from 1st node:
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.4.log
> [3] Logs from 2nd (bootstrapping) node:
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.8.log

-- 
This message is automatically generated by JIRA.

