incubator-cassandra-dev mailing list archives

From "Michael Lee" <mail.list.steel.men...@gmail.com>
Subject RE: [jira] Updated: (CASSANDRA-651) cassandra 0.5 version throttles and sometimes kills traffic to a node if you restart it.
Date Mon, 28 Dec 2009 05:07:43 GMT
I confirmed this issue with the following test. The test cluster contains 8 nodes
and holds about 10000 rows (keys ranging from 1 to 10000):
Address       Status     Load          Range                                      Ring
                                       170141183460469231731687303715884105728    
10.237.4.85   Up         757.13 MB     21267647932558653966460912964485513216     |<--|
10.237.1.135  Up         761.54 MB     42535295865117307932921825928971026432     |   ^
10.237.1.137  Up         748.02 MB     63802943797675961899382738893456539648     v   |
10.237.1.139  Up         732.36 MB     85070591730234615865843651857942052864     |   ^
10.237.1.140  Up         725.6 MB      106338239662793269832304564822427566080    v   |
10.237.1.141  Up         726.59 MB     127605887595351923798765477786913079296    |   ^
10.237.1.143  Up         728.16 MB     148873535527910577765226390751398592512    v   |
10.237.1.144  Up         745.69 MB     170141183460469231731687303715884105728    |-->|

(1) Read all keys in the range [1-10000]; every key reads back OK (the client sends
    read requests directly to 10.237.4.85, 10.237.1.137, 10.237.1.140 and 10.237.1.143).
(2) Turn off 10.237.1.135 while keeping the read pressure on; some read requests time
    out. After all nodes learn that 10.237.1.135 is down (about 10 s later), all read
    requests succeed again. That's fine.
(3) Turn 10.237.1.135 back on (and restart the Cassandra service on it); some read
    requests time out again, and they keep timing out FOREVER, even after all nodes
    know 10.237.1.135 is up. That's the PROBLEM!
(4) Reboot 10.237.1.135; the problem remains.
(5) Stop the pressure, reboot the whole cluster and repeat step (1); everything is
    fine again...

All read requests use the Quorum consistency level. The Cassandra version is
apache-cassandra-incubating-0.5.0-beta2; I have also tested
apache-cassandra-incubating-0.5.0-RC1 and the problem remains.
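
For reference, the read pressure comes from a loop like the sketch below. This is a
minimal sketch only: "Keyspace1", "Standard1" and the column name "value" stand in for
my test schema, and the get() call is the 0.5-era Thrift interface as I understand it,
so adjust if your generated classes differ.

import org.apache.cassandra.service.Cassandra;        // Thrift-generated classes in 0.5
import org.apache.cassandra.service.ColumnPath;
import org.apache.cassandra.service.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class QuorumReadPressure {
    public static void main(String[] args) throws Exception {
        // Thrift port 9160 on one of the contact nodes listed in step (1)
        TSocket socket = new TSocket("10.237.4.85", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        // Placeholder schema: column family "Standard1", column "value"
        ColumnPath path = new ColumnPath();
        path.column_family = "Standard1";
        path.column = "value".getBytes("UTF-8");

        for (int key = 1; key <= 10000; key++) {
            try {
                client.get("Keyspace1", String.valueOf(key), path, ConsistencyLevel.QUORUM);
            } catch (Exception e) {
                // TimedOutException from steps (2) and (3) shows up here
                System.err.println("key " + key + " failed: " + e);
            }
        }
        socket.close();
    }
}

(Assuming a replication factor of 3, QUORUM needs only 2 replicas per key, which is why
step (2) recovers as soon as the failure detector marks 10.237.1.135 down; step (3)
should recover the same way, but does not.)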

After reading system.log, I found that once 10.237.1.135 goes down and comes back up,
the other nodes never re-establish a TCP connection to it (on TCP port 7000)!
Read requests destined for 10.237.1.135 (placed into Pending-Writes because the socket
channel is closed) are never sent onto the network (confirmed by watching tcpdump).
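
To rule out the restarted node simply not listening, one can run a plain JDK probe
(not Cassandra code) from one of the other nodes; a minimal sketch:

import java.net.InetSocketAddress;
import java.net.Socket;

public class StoragePortProbe {
    public static void main(String[] args) throws Exception {
        Socket s = new Socket();
        // 3-second connect timeout against the storage port of the restarted node
        s.connect(new InetSocketAddress("10.237.1.135", 7000), 3000);
        System.out.println("port 7000 on 10.237.1.135 accepts new connections");
        s.close();
    }
}

If this connects, 10.237.1.135 is accepting on port 7000 again, and the problem is that
its peers never open a new connection to it.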

It seems that when 10.237.1.135 goes down in step (2), some socket channels are reset;
after 10.237.1.135 comes back, those socket channels remain closed, forever.
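
To make the suspicion concrete, here is a deliberately simplified illustration of that
failure mode (this is NOT Cassandra's actual connection code, just a sketch of the
pattern I think I am seeing): messages for the peer are queued while the channel is
closed, and nothing ever retries the connect, so the queue never drains.

import java.util.ArrayDeque;
import java.util.Queue;

class PeerConnectionSketch {
    private final Queue<byte[]> pendingWrites = new ArrayDeque<byte[]>();
    private boolean channelOpen = false;   // reset when the peer went down in step (2)

    void send(byte[] message) {
        if (channelOpen) {
            writeToSocket(message);
        } else {
            // Suspected bug pattern: the message is queued, but no code path ever
            // re-opens the channel after the peer comes back, so the queue only grows
            // and nothing reaches the network (matching the tcpdump observation).
            pendingWrites.add(message);
        }
    }

    private void writeToSocket(byte[] message) {
        // actual socket write omitted in this sketch
    }
}
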
---------END----------


-----Original Message-----
From: Jonathan Ellis (JIRA) [mailto:jira@apache.org] 
Sent: Thursday, December 24, 2009 10:47 AM
To: cassandra-commits@incubator.apache.org
Subject: [jira] Updated: (CASSANDRA-651) cassandra 0.5 version throttles and sometimes kills
traffic to a node if you restart it.


     [ https://issues.apache.org/jira/browse/CASSANDRA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-651:
-------------------------------------

    Fix Version/s: 0.5
         Assignee: Jaakko Laine

> cassandra 0.5 version throttles and sometimes kills traffic to a node if you restart
it.
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-651
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-651
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>         Environment: latest in 0.5 branch
>            Reporter: Ramzi Rabah
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> From the cassandra user message board: 
> "I just recently upgraded to latest in 0.5 branch, and I am running
> into a serious issue. I have a cluster with 4 nodes, rackunaware
> strategy, and using my own tokens distributed evenly over the hash
> space. I am writing/reading equally to them at an equal rate of about
> 230 reads/writes per second (and cfstats shows that). The first 3 nodes
> are seeds, the last one isn't. When I start all the nodes together at
> the same time, they all receive equal amounts of reads/writes (about
> 230).
> When I bring node 4 down and bring it back up again, node 4's load
> fluctuates between the 230 it used to get to sometimes no traffic at
> all. The other 3 still have the same amount of traffic. And no errors
> whatsoever seen in logs. "

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

