cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Ring out of sync, cassandra_UnavailableException being thrown
Date Fri, 21 May 2010 00:04:39 GMT
Were you bootstrapping or otherwise moving nodes around?

I don't think anyone's tracked this bug down farther than "if you
restart the entire cluster, it goes away."

On Wed, May 19, 2010 at 10:05 PM, Keith Thornhill <keith@raptr.com> wrote:
> in a 5-node cluster, i noticed in our client error log that one of the
> nodes was consistently throwing cassandra_UnavailableException during
> read operations.
>
> looking into jmx, it was obvious that one node's view of the ring was
> out of sync.
>
> $ nodetool -host 192.168.20.150 ring
> Address         Status  Load     Range                                      Ring
>                                  139508497374977076191526400448759597506
> 192.168.20.156  Up      5.73 GB  733665530305941485083898696792520436       |<--|
> 192.168.20.158  Up      3.41 GB  9629533262984150011756238989685472219      |   ^
> 192.168.20.154  Up      2.44 GB  31048334058970902242412812423471654868     v   |
> 192.168.20.150  Up      4.89 GB  105769574715070648260922426249777160699    |   ^
> 192.168.20.152  Up      5.24 GB  139508497374977076191526400448759597506    |-->|
>
> $ nodetool -host 192.168.20.158 ring
> Address         Status  Load     Range                                      Ring
> 192.168.20.158  Up      3.41 GB  9629533262984150011756238989685472219      |<--|
>
> looking at the CF stats on that node, it is obvious that reads and
> writes are happening, but i have to assume that those are coming from
> proxy connections via the other nodes.
>
> when i restart that node, the logs on the other cluster nodes show
> them detecting the server going away and then coming back into the
> ring.
>
> INFO [WRITE-/192.168.20.158] 2010-05-19 21:27:39,448 OutboundTcpConnection.java (line 102) error writing to /192.168.20.158
> INFO [WRITE-/192.168.20.158] 2010-05-19 21:27:55,475 OutboundTcpConnection.java (line 102) error writing to /192.168.20.158
> INFO [GMFD:1] 2010-05-19 21:27:56,481 Gossiper.java (line 582) Node /192.168.20.158 has restarted, now UP again
> INFO [GMFD:1] 2010-05-19 21:27:56,482 StorageService.java (line 538) Node /192.168.20.158 state jump to normal
>
> any ideas on how to kick that node and remind it of its buddies?
>
> thanks!
> -keith
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
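In the report above, the out-of-sync node was spotted by running `nodetool -host <address> ring` against nodes and noticing that 192.168.20.158 listed only itself. Below is a minimal sketch of that comparison in Python. It assumes you have already captured each node's ring output as text (for example by running the command from the thread against every host), and that the output follows the layout quoted in this thread, including the address and status running together when the IP fills its column. The regex, `ring_members`, and `find_out_of_sync` are hypothetical helpers for illustration, not part of Cassandra or nodetool.

```python
import re

# One row of `nodetool ring` output in the layout quoted in this thread.
# When the IP address fills its column, nodetool prints it fused to the
# status ("192.168.20.156Up"), so whitespace between them is optional.
# Assumption: status is reported as "Up" or "Down".
ROW = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s*(Up|Down)\b")


def ring_members(ring_output: str) -> set[str]:
    """Return the node addresses listed in one node's ring output."""
    members = set()
    for line in ring_output.splitlines():
        match = ROW.match(line.strip())
        if match:
            members.add(match.group(1))
    return members


def find_out_of_sync(views: dict[str, str]) -> dict[str, set[str]]:
    """Map each host whose ring view is incomplete to the peers it misses.

    `views` maps the address passed to `nodetool -host` to the text that
    `nodetool ring` printed for it. The reference membership is the union
    of every view, so a node that has dropped peers stands out.
    """
    parsed = {host: ring_members(text) for host, text in views.items()}
    everyone = set().union(*parsed.values())
    return {
        host: everyone - members
        for host, members in parsed.items()
        if members != everyone
    }


# Sample views copied from the thread (raw nodetool layout, fused columns).
VIEW_150 = """\
Address       Status     Load          Range                                      Ring
                                       139508497374977076191526400448759597506
192.168.20.156Up         5.73 GB       733665530305941485083898696792520436       |<--|
192.168.20.158Up         3.41 GB       9629533262984150011756238989685472219      |   ^
192.168.20.154Up         2.44 GB       31048334058970902242412812423471654868     v   |
192.168.20.150Up         4.89 GB       105769574715070648260922426249777160699    |   ^
192.168.20.152Up         5.24 GB       139508497374977076191526400448759597506    |-->|
"""

VIEW_158 = """\
Address       Status     Load          Range                                      Ring
192.168.20.158Up         3.41 GB       9629533262984150011756238989685472219      |<--|
"""

if __name__ == "__main__":
    stale = find_out_of_sync({"192.168.20.150": VIEW_150, "192.168.20.158": VIEW_158})
    for host, missing in stale.items():
        print(f"{host} does not see: {', '.join(sorted(missing))}")
```

With the two views from the thread, this flags 192.168.20.158 as missing the other four nodes. Using the union of all views as the reference is a simple choice. It also means that a stale entry present on only one node would make every other node look incomplete, so the report is a starting point for inspection rather than a verdict.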
