cassandra-commits mailing list archives

From "Paul Querna (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (CASSANDRA-2201) Gossip synchronization issues
Date Sat, 19 Feb 2011 21:34:38 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996863#comment-12996863 ]

Paul Querna edited comment on CASSANDRA-2201 at 2/19/11 9:33 PM:
-----------------------------------------------------------------

After more investigation, we found that 3 of 24 nodes had inconsistent rings.

-Restarting them each seems to have fixed this issue.-

Restarting sometimes fixed the issue.

      was (Author: pquerna):
    After more investigation, we found that 3 of 24 nodes had inconsistent rings.

Restarting them each seems to have fixed this issue.
  
> Gossip synchronization issues
> -----------------------------
>
>                 Key: CASSANDRA-2201
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2201
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.12
>         Environment: r1071793 (0.6.12)
> Ubuntu 9.10
> 24 node cluster.
> JNA enabled.
> java -version
> java version "1.6.0_21"
> Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
>            Reporter: Paul Querna
>         Attachments: CASSANDRA-2201-jstack.txt
>
>
> After upgrading to 0.6.12ish, we noticed that sometimes whole rows were being reported as missing from queries.
> It seemed random, and at first we thought there might be a wider problem in 0.6.12, but we found that one node of 24 had an incorrect gossip state.
> Correct nodetool ring output:
> {code}
> pquerna@cass0:/data/cassandra$ /data/cassandra/bin/nodetool -h localhost ring
> Address       Status     Load          Owns    Range                                      Ring
>                                                163051967482949680409533666061055601315
> 172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
> 10.177.192.88 Up         219.28 GB     4.17%   7089215977519551322153637654828504405      |   ^
> 172.21.2.169  Up         225.93 GB     4.17%   14178431955039102644307275309657008810     v   |
> 10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     |   ^
> 172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     v   |
> 10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     |   ^
> 172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     v   |
> 10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     |   ^
> 172.21.2.223  Up         229.38 GB     4.17%   56713727820156410577229101238628035240     v   |
> 10.177.192.225Up         193.1 GB      4.17%   63802943797675961899382738893456539645     |   ^
> 172.21.3.115  Up         231.21 GB     4.17%   70892159775195513221536376548285044050     v   |
> 10.177.192.226Up         194.33 GB     4.17%   77981375752715064543690014203113548455     |   ^
> 172.21.1.32   Up         230.38 GB     4.17%   85070591730234615865843651857942052860     v   |
> 10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
> 172.21.2.224  Up         205.9 GB      4.17%   99249023685273718510150927167599061670     v   |
> 10.177.192.228Up         191.82 GB     4.17%   106338239662793269832304564822427566075    |   ^
> 172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    v   |
> 10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    |   ^
> 172.21.0.26   Up         226.12 GB     4.17%   127605887595351923798765477786913079290    v   |
> 10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    |   ^
> 172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    v   |
> 10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    |   ^
> 172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    v   |
> 10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
> {code}
> On the node that had a different nodetool ring output:
> {code}
> pquerna@cass11:~$ /data/cassandra/bin/nodetool -h localhost ring
> Address       Status     Load          Owns    Range                                      Ring
>                                                163051967482949680409533666061055601315
> 172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
> 172.21.2.169  Up         225.93 GB     8.33%   14178431955039102644307275309657008810     |   ^
> 10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     v   |
> 172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     |   ^
> 10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     v   |
> 172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     |   ^
> 10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     v   |
> 172.21.3.115  Up         231.21 GB     12.50%  70892159775195513221536376548285044050     |   ^
> 172.21.1.32   Up         230.38 GB     8.33%   85070591730234615865843651857942052860     v   |
> 10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
> 10.177.192.228Up         191.82 GB     8.33%   106338239662793269832304564822427566075    v   |
> 172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    |   ^
> 10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    v   |
> 172.21.0.26   Up         226 GB        4.17%   127605887595351923798765477786913079290    |   ^
> 10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    v   |
> 172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    |   ^
> 10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    v   |
> 172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    |   ^
> 10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
> {code}
> As you can see, it was missing 10.177.192.226 from the ring.
> On cass11, everything else looked fine, including nothing in pending/active tpstats.
> However, we did notice an exception on startup in the logs on cass11:
> {code}
> 2011-02-19_19:45:43.26906 INFO - Starting up server gossip
> 2011-02-19_19:45:43.39742 ERROR - Uncaught exception in thread Thread[Thread-11,5,main]
> 2011-02-19_19:45:43.39746 java.io.IOError: java.io.EOFException
> 2011-02-19_19:45:43.39747       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:67)
> 2011-02-19_19:45:43.39748 Caused by: java.io.EOFException
> 2011-02-19_19:45:43.39749       at java.io.DataInputStream.readInt(DataInputStream.java:375)
> 2011-02-19_19:45:43.39750       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:57)
> 2011-02-19_19:45:43.41481 INFO - Binding thrift service to /172.21.0.26:9160
> 2011-02-19_19:45:43.42050 INFO - Cassandra starting up...
> {code}
> driftx said that it should be harmless, but it's the only thing I see different about that node.
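
[Editorial note] For reference, the EOFException quoted above is what `DataInputStream.readInt` throws when the peer closes the TCP connection before a full 4-byte value has arrived. A minimal sketch of that mechanism, written in Python rather than Cassandra's actual Java wire code (the function name `read_int` is made up for illustration):

```python
import io
import struct

def read_int(stream):
    """Read a 4-byte big-endian int, as DataInputStream.readInt does in Java;
    raise EOFError if the stream ends early (Java throws java.io.EOFException)."""
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError("connection closed before a full 4-byte int arrived")
    return struct.unpack(">i", data)[0]

# A peer that connects and immediately disconnects delivers 0 bytes,
# which is the condition logged at startup on cass11:
try:
    read_int(io.BytesIO(b""))
except EOFError:
    pass
```

This is consistent with the exception being harmless in itself: it only means a peer dropped the connection mid-handshake, though the timing (right at "Starting up server gossip") is what makes it suspicious here.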
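
[Editorial note] The ring discrepancy reported above can be checked mechanically by diffing the address columns of `nodetool ring` output captured from each node. A minimal sketch, assuming the outputs have been saved to strings (the sample rows are abbreviated copies of the outputs in this report; `missing_members` is a made-up helper name):

```python
import re

# An IPv4 address at the start of a line begins each data row of
# `nodetool ring` output (the address column can run into "Up" with no space).
ADDR = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})", re.M)

def ring_addresses(text):
    """Collect the IP that starts each data row of a `nodetool ring` dump."""
    return set(ADDR.findall(text))

def missing_members(reference, suspect):
    """Members present in the reference view but absent from the suspect's."""
    return ring_addresses(reference) - ring_addresses(suspect)

reference = (
    "172.21.2.222  Up         224.03 GB     4.17%   0\n"
    "10.177.192.226Up         194.33 GB     4.17%   77981375752715064543690014203113548455\n"
)
suspect = "172.21.2.222  Up         224.03 GB     4.17%   0\n"
```

Run across all 24 nodes, this kind of comparison would have surfaced the three nodes with inconsistent rings without eyeballing the output.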

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
