incubator-cassandra-user mailing list archives

From <>
Subject bug when node down-up??
Date Sun, 27 Dec 2009 15:38:57 GMT


I may have found a bug: it seems an online cluster cannot tolerate the reboot of a single
node, although it is supposed to.


Suppose a cluster of 8 nodes holding about 10000 rows (keys ranging from 1 to 10000), with the following ring:

Address       Status     Load          Range                                      Ring
                                       170141183460469231731687303715884105728
              Up         757.13 MB     21267647932558653966460912964485513216     |<--|
              Up         761.54 MB     42535295865117307932921825928971026432     |   ^
              Up         748.02 MB     63802943797675961899382738893456539648     v   |
              Up         732.36 MB     85070591730234615865843651857942052864     |   ^
              Up         725.6 MB      106338239662793269832304564822427566080    v   |
              Up         726.59 MB     127605887595351923798765477786913079296    |   ^
              Up         728.16 MB     148873535527910577765226390751398592512    v   |
              Up         745.69 MB     170141183460469231731687303715884105728    |-->|
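The tokens in the ring above are evenly spaced: each one is a multiple of 2^127 / 8, and the last equals 2^127. A minimal sketch of how such balanced tokens can be computed, assuming the cluster uses RandomPartitioner (whose token space runs up to 2^127; the report does not name the partitioner). The function name `balanced_tokens` is hypothetical:

```python
from typing import List

# Assumption: RandomPartitioner, whose tokens lie in 0 .. 2**127.
TOKEN_SPACE = 2**127


def balanced_tokens(num_nodes: int) -> List[int]:
    # Node i (1-based) gets token i * 2**127 / N, so each node owns an
    # equal 1/N slice of the ring. Integer division avoids float rounding
    # on these 39-digit numbers.
    return [i * TOKEN_SPACE // num_nodes for i in range(1, num_nodes + 1)]


for token in balanced_tokens(8):
    print(token)
```

For 8 nodes this reproduces the Range column shown above, from 21267647932558653966460912964485513216 up to 170141183460469231731687303715884105728, which suggests the load imbalance between nodes is not a factor here.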


(1)     Read keys in the range [1-10000]: all keys are read back OK (the client sends read requests directly
to … ).

(2)     Turn off one node while the read load keeps running: some read requests time out.

After all nodes detect that it is down (about 10 s later), all read requests succeed
again. That’s fine.

(3)     After turning the Cassandra service back on (on the same node, of course), some read requests
time out again, and they keep timing out FOREVER, even after all nodes know it is up.

That’s a PROBLEM!

(4)     Reboot the node again: the problem remains.

(5)     If I stop the load, reboot the whole cluster, and then perform step 1, everything is fine.
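The roughly 10 s delay in step (2) is the time the other nodes need to mark the stopped node as down. A simplified sketch of that idea, assuming a fixed heartbeat timeout; this is a hypothetical `TimeoutFailureDetector`, not Cassandra's actual failure detector, and the 10-second value only mirrors the delay observed above:

```python
from typing import Dict


class TimeoutFailureDetector:
    """Hypothetical fixed-timeout detector (a simplification for illustration)."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        self.timeout_s = timeout_s
        # last time (in seconds) a heartbeat was received from each node
        self.last_heartbeat: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        # record a heartbeat (e.g. a gossip message) from `node`
        self.last_heartbeat[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        # a node is considered up only if we heard from it within the timeout
        last = self.last_heartbeat.get(node)
        return last is not None and now - last < self.timeout_s


fd = TimeoutFailureDetector()
fd.heartbeat("node-a", 0.0)
print(fd.is_alive("node-a", 5.0))   # still within the timeout
print(fd.is_alive("node-a", 10.0))  # no heartbeat for 10 s: marked down
```

Note that a detector like this only decides whether a peer is up; as step (3) shows, marking the node up again is not enough if the messaging layer never reopens its connections to it.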


All read requests use the QUORUM consistency level. The Cassandra version is apache-cassandra-incubating-0.5.0-beta2;
I’ve also tested apache-cassandra-incubating-0.5.0-RC1, and the problem remains.
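Why a single dead node should be tolerated: a QUORUM read needs a majority of a key's replicas, that is replication_factor // 2 + 1 of them. The report does not state the replication factor, so the values below are assumptions, and the helper names are hypothetical:

```python
def quorum(replication_factor: int) -> int:
    # QUORUM is a strict majority of the replicas for a key
    return replication_factor // 2 + 1


def quorum_possible(replication_factor: int, replicas_down: int) -> bool:
    # a QUORUM read can succeed only if enough live replicas remain
    return replication_factor - replicas_down >= quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(rf, quorum(rf), quorum_possible(rf, 1))
```

With a replication factor of 3, one dead replica still leaves the 2 needed for a quorum, which matches step (2), where reads recover once the node is marked down. With a replication factor of 1 or 2, losing one replica would make QUORUM reads for its keys fail outright.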


After reading system.log, I found that once the node has gone down and come back up, the other nodes never
re-establish a TCP connection to it (on TCP port 7000)!

Read requests are put into Pending-Writes (because the socket channel is closed) and are never
sent onto the network (as observed with tcpdump).


It seems that when the node went down in step 2, some socket channels were reset,

and after it came back, these socket channels stayed closed forever. I don’t know why.
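The behavior one would expect instead is that a connection whose socket was reset keeps its queued messages and tries to reconnect before sending them. A minimal sketch of that idea, with a simulated transport instead of real sockets; `OutboundConnection`, `connect`, and `on_reset` are hypothetical names, not Cassandra's classes:

```python
from collections import deque
from typing import Callable, Deque, List


class OutboundConnection:
    """Hypothetical outbound connection to one peer with a pending-writes queue."""

    def __init__(self, connect: Callable[[], bool]) -> None:
        self._connect = connect        # returns True if the peer accepted a connection
        self.connected = False
        self.pending: Deque[bytes] = deque()
        self.sent: List[bytes] = []    # stands in for bytes written to the socket

    def on_reset(self) -> None:
        # the peer reset the socket: mark it closed but keep queued messages
        self.connected = False

    def write(self, msg: bytes) -> None:
        self.pending.append(msg)
        self.flush()

    def flush(self) -> None:
        if not self.connected:
            # the step the report suggests is missing: reopen a closed channel
            self.connected = self._connect()
        while self.connected and self.pending:
            self.sent.append(self.pending.popleft())


peer_up = {"value": True}
conn = OutboundConnection(lambda: peer_up["value"])
conn.write(b"read-1")           # sent immediately
peer_up["value"] = False
conn.on_reset()                 # node goes down, channel reset
conn.write(b"read-2")           # queued: reconnect attempt fails
peer_up["value"] = True
conn.write(b"read-3")           # node is back: reconnect, then drain the queue
print(conn.sent)
```

If `flush` skipped the reconnect attempt, `read-2` and every later message would stay in `pending` forever, which is what the tcpdump observation above describes.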


Sorry for my poor English; I hope I’ve stated my problem clearly.



