cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: bug when node down-up??
Date Mon, 28 Dec 2009 03:06:45 GMT
you can click "follow" on the jira issue to be notified of changes in its status

On Sun, Dec 27, 2009 at 8:15 PM,  <mail.list.steel.mental@gmail.com> wrote:
> Yes, it seems it IS the same problem. Is there really no fix for it yet?
>
> ---------END----------
>
> -----Original Message-----
> From: Ramzi Rabah [mailto:rrabah@playdom.com]
> Sent: Monday, December 28, 2009 12:43 AM
> To: cassandra-user@incubator.apache.org
> Subject: Re: bug when node down-up??
>
> I believe this is the same problem as
> https://issues.apache.org/jira/browse/CASSANDRA-651
>
>
>
> On Sun, Dec 27, 2009 at 7:38 AM,  <mail.list.steel.mental@gmail.com> wrote:
>> HI,guys:
>>
>>
>>
>> I think I've found a bug: it seems an online cluster cannot survive the
>> reboot of a single node, although it is supposed to.
>>
>>
>>
>> Suppose a cluster of 8 nodes holding about 10000 rows (keys ranging from
>> 1 to 10000):
>>
>> Address       Status     Load       Range                                      Ring
>>                                     170141183460469231731687303715884105728
>> 10.237.4.85   Up         757.13 MB  21267647932558653966460912964485513216    |<--|
>> 10.237.1.135  Up         761.54 MB  42535295865117307932921825928971026432    |   ^
>> 10.237.1.137  Up         748.02 MB  63802943797675961899382738893456539648    v   |
>> 10.237.1.139  Up         732.36 MB  85070591730234615865843651857942052864    |   ^
>> 10.237.1.140  Up         725.6 MB   106338239662793269832304564822427566080   v   |
>> 10.237.1.141  Up         726.59 MB  127605887595351923798765477786913079296   |   ^
>> 10.237.1.143  Up         728.16 MB  148873535527910577765226390751398592512   v   |
>> 10.237.1.144  Up         745.69 MB  170141183460469231731687303715884105728   |-->|
>>
>>
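For readers decoding the ring output above: with the RandomPartitioner, a key is first hashed to a token, and the primary replica is the first node whose token is greater than or equal to that token, wrapping around the ring. A simplified sketch of that placement rule in Python, using the tokens from the ring output (`find_owner` is an illustrative helper, not Cassandra code, and it ignores additional replicas):

```python
import bisect

# Tokens and addresses taken from the nodetool ring output above,
# sorted in ascending token order.
RING = [
    (21267647932558653966460912964485513216, "10.237.4.85"),
    (42535295865117307932921825928971026432, "10.237.1.135"),
    (63802943797675961899382738893456539648, "10.237.1.137"),
    (85070591730234615865843651857942052864, "10.237.1.139"),
    (106338239662793269832304564822427566080, "10.237.1.140"),
    (127605887595351923798765477786913079296, "10.237.1.141"),
    (148873535527910577765226390751398592512, "10.237.1.143"),
    (170141183460469231731687303715884105728, "10.237.1.144"),
]
TOKENS = [t for t, _ in RING]

def find_owner(key_token):
    """Primary replica: first node whose token >= key_token, wrapping around.

    key_token is the partitioner's hash of the key, not the raw key itself.
    """
    i = bisect.bisect_left(TOKENS, key_token)
    return RING[i % len(RING)][1]
```

So a token just above 10.237.4.85's token belongs to 10.237.1.135, which is why killing that one node affects only a slice of the key range.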
>>
>> (1)     Read keys in the range [1-10000]; all keys read back OK (the client
>> sends read requests directly to 10.237.4.85, 10.237.1.137, 10.237.1.140,
>> and 10.237.1.143).
>>
>> (2)     Turn off 10.237.1.135 while keeping the load running; some read
>> requests time out. After all nodes learn that 10.237.1.135 is down (about
>> 10 s later), all reads succeed again. That's fine.
>>
>> (3)     After turning 10.237.1.135 back on (and restarting the Cassandra
>> service, of course), some read requests time out again, and keep timing
>> out FOREVER, even after all nodes know 10.237.1.135 is up.
>>
>> That's a PROBLEM!
>>
>> (4)     Rebooting 10.237.1.135 doesn't help; the problem remains.
>>
>> (5)     If I stop the load, reboot the whole cluster, and then repeat
>> step 1, everything is fine again.
>>
>>
>>
>> All reads use the QUORUM consistency level. The Cassandra version is
>> apache-cassandra-incubating-0.5.0-beta2; I've also tested
>> apache-cassandra-incubating-0.5.0-RC1, and the problem remains.
>>
>>
>>
>> After reading system.log, I found that after 10.237.1.135 goes down and
>> comes back up, the other nodes never re-establish a TCP connection to it
>> (on port 7000)!
>>
>> Read requests destined for 10.237.1.135 go into pending writes (because
>> the socket channel is closed) and are never put on the wire (observed
>> with tcpdump).
>>
>>
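The symptom described here, messages parked on a dead socket channel forever, is the classic stale-connection-cache failure mode, and is consistent with what CASSANDRA-651 reports. A hypothetical Python sketch of the pattern (illustrative only, not Cassandra's actual outbound-connection code; class and method names are invented):

```python
import collections

class CachedConnection:
    """Illustrative stand-in for a cached per-peer outbound connection."""
    def __init__(self):
        self.open = True
        self.pending = collections.deque()  # messages queued while unwritable

    def close(self):
        self.open = False  # peer went down; this channel is now dead

    def send(self, msg):
        if not self.open:
            # Bug pattern: queue on the dead channel and never reconnect,
            # so the message never reaches the network even after the
            # peer comes back up.
            self.pending.append(msg)
            return False
        return True

class ConnectionPool:
    """Caches one connection per peer and never invalidates it (the bug)."""
    def __init__(self):
        self.conns = {}

    def get(self, peer):
        return self.conns.setdefault(peer, CachedConnection())
```

The fix for this pattern is to drop or reopen the cached connection once the channel closes, so the next send establishes a fresh TCP connection to the peer's port 7000 instead of queueing behind a dead one.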
>>
>> It’s seems when 10.237.1.135 going down in step2, some socket channel was
>> reset ,
>>
>> after 10.237.1.135 come back, these socket channel remain closed, forever….,
>> I don’t know….
>>
>>
>>
>> Sorry for my poor English; I hope I've stated my problem clearly.
>>
>>
>>
>> ---------END----------
>>
>>
>
>
