So vmguest85 was restarted, but gen-app02 hasn't told it that there
are 2 other nodes that are down?
Which one is the seed node?
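
To make the disagreement concrete, here is a minimal Python sketch that compares the two ring views from your output. It is not part of Cassandra or nodeprobe: `diff_ring_views` and the dict layout are hypothetical names I made up for illustration, and the two views are transcribed by hand from the rings you pasted.

```python
# Hypothetical helper, not a Cassandra API: compare the ring as seen by two nodes.
def diff_ring_views(view_a, view_b):
    """Return {address: (status_in_a, status_in_b)} for every disagreement."""
    diffs = {}
    for addr in sorted(set(view_a) | set(view_b)):
        status_a = view_a.get(addr, "absent")  # "absent" = node not in this ring view
        status_b = view_b.get(addr, "absent")
        if status_a != status_b:
            diffs[addr] = (status_a, status_b)
    return diffs


# Transcribed from the two nodeprobe ring outputs in the quoted message.
gen_app02_view = {
    "172.27.128.186": "Down",
    "172.27.128.23": "Down",
    "172.27.128.22": "Up",
    "172.27.128.185": "Up",
}
vmguest85_view = {
    "172.27.128.22": "Up",
    "172.27.128.185": "Up",
}

for addr, (a, b) in diff_ring_views(gen_app02_view, vmguest85_view).items():
    print(f"{addr}: gen-app02 says {a}, vmguest85 says {b}")
```

The two nodes agree on the live endpoints; the only difference is that vmguest85 has no entry at all for the two nodes gen-app02 reports as down.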
On Mon, Nov 23, 2009 at 6:38 PM, B. Todd Burruss <bburruss@real.com> wrote:
> i'm observing the following on a cluster that started with 4 nodes. i have
> been killing and restarting the various nodes as i test cassandra and now
> i'm seeing a lot of NotFoundException exceptions in the client because of what
> i believe is ring state out of sync between the two nodes that are still up
> and available. The first ring state shown below reflects the current state
> of the cluster. Also I have seen similar issues when one of the nodes
> thinks another node is still available when in fact it has been killed. it
> seems to be related to bringing up and killing nodes too fast and not letting
> them figure out when a node is "dead". in this case i see TimedOutException
> related to NIO SocketChannel class.
>
> thx!
>
> [cassandra.883477]$ bin/nodeprobe -host gen-app02.dev.real.com -port 8080 ring
> Address         Status  Load      Range                                     Ring
>                                   144038903974614862325597275257769797985
> 172.27.128.186  Down    22.17 MB  31124469348629903091013930339840898757    |<--|
> 172.27.128.23   Down    22.17 MB  64378740291415296162944450043143967518    |   |
> 172.27.128.22   Up      22.17 MB  121134220722269938669001112695509564769   |   |
> 172.27.128.185  Up      14.69 MB  144038903974614862325597275257769797985   |-->|
>
> [cassandra.883477]$ bin/nodeprobe -host vmguest85.prognet.com -port 8080 ring
> Address         Status  Load      Range                                     Ring
>                                   144038903974614862325597275257769797985
> 172.27.128.22   Up      22.17 MB  121134220722269938669001112695509564769   |<--|
> 172.27.128.185  Up      14.69 MB  144038903974614862325597275257769797985   |-->|
> [cassandra.883477]$
>
>
>