cassandra-user mailing list archives

From Bernardino Mota <bernardino.m...@knowledgeworks.pt>
Subject Re: Nodes fail to reconnect after several hours of network failure.
Date Thu, 21 Jan 2016 14:08:09 GMT
Nothing strange in the logs, and “nodetool gossipinfo” seems OK:

 ./nodetool gossipinfo
/192.168.1.10
  generation:1453316804
  heartbeat:206518
  STATUS:18:NORMAL,-1003341236369672970
  LOAD:206420:4.3533596E7
  SCHEMA:14:6f97097b-45ce-3479-8b2f-af2fef4967e7
  DC:8:DC2
  RACK:10:rack1
  RELEASE_VERSION:4:2.2.4
  INTERNAL_IP:6:192.168.1.10
  RPC_ADDRESS:3:127.0.0.1
  SEVERITY:206517:0.0
  NET_VERSION:1:9
  HOST_ID:2:51650afd-84dd-4e25-a6f0-13627858d5dc
  RPC_READY:49:true
  TOKENS:17:<hidden>
/192.168.1.102
  generation:1453316986
  heartbeat:84622
  STATUS:28:NORMAL,-1085177681742913545
  LOAD:84535:1.2606418E7
  SCHEMA:14:6f97097b-45ce-3479-8b2f-af2fef4967e7
  DC:8:DC1
  RACK:10:rack1
  RELEASE_VERSION:4:2.2.4
  INTERNAL_IP:6:10.0.2.10
  RPC_ADDRESS:3:127.0.0.1
  SEVERITY:84624:0.0
  NET_VERSION:1:9
  HOST_ID:2:ff906882-8224-40ac-8cdb-98f5e725814d
  RPC_READY:98:true
  TOKENS:27:<hidden>
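
For anyone comparing this output between nodes, the following is a minimal Python sketch (not part of nodetool) that parses "nodetool gossipinfo" text and checks two things: whether every endpoint reports the same SCHEMA value, and when each node's gossip generation began. The function names and the trimmed SAMPLE text are illustrative. It assumes that "generation" is the node's startup time in Unix epoch seconds, which is how Cassandra normally seeds it, and that other fields follow the NAME:version:value layout seen above.

```python
from datetime import datetime, timezone

# Trimmed copy of the gossipinfo output shown above (illustrative sample).
SAMPLE = """\
/192.168.1.10
  generation:1453316804
  heartbeat:206518
  STATUS:18:NORMAL,-1003341236369672970
  SCHEMA:14:6f97097b-45ce-3479-8b2f-af2fef4967e7
  DC:8:DC2
/192.168.1.102
  generation:1453316986
  heartbeat:84622
  STATUS:28:NORMAL,-1085177681742913545
  SCHEMA:14:6f97097b-45ce-3479-8b2f-af2fef4967e7
  DC:8:DC1
"""


def parse_gossipinfo(text):
    """Parse gossipinfo text into {endpoint: {field: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A line such as "/192.168.1.10" starts a new endpoint section.
            current = line[1:]
            nodes[current] = {}
        elif current is not None and ":" in line:
            key, _, rest = line.partition(":")
            if key in ("generation", "heartbeat"):
                # These two fields are plain "name:integer" pairs.
                nodes[current][key] = int(rest)
            else:
                # Assumed layout NAME:version:value; keep only the value.
                _, _, value = rest.partition(":")
                nodes[current][key] = value
    return nodes


def summarize(nodes):
    """Report schema agreement and each node's generation as a UTC time."""
    schemas = {info.get("SCHEMA") for info in nodes.values()}
    started = {
        endpoint: datetime.fromtimestamp(
            info["generation"], tz=timezone.utc
        ).isoformat()
        for endpoint, info in nodes.items()
    }
    return {"schema_agreement": len(schemas) == 1, "started": started}


if __name__ == "__main__":
    print(summarize(parse_gossipinfo(SAMPLE)))
```

Running the same check against the output captured on each node would show whether both sides hold the same gossip state for each other, which may help narrow down why they still mark each other as DOWN.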
  
 


> On 21 Jan 2016, at 13:17, Adil <adil.chabaq@gmail.com> wrote:
> 
> Hi,
> do you see any message related to gossip info?
> 
> 
> 2016-01-21 14:09 GMT+01:00 Bernardino Mota <bernardino.mota@knowledgeworks.pt>:
> Using Cassandra 2.2.4 on Ubuntu.
> 
> We have a cluster with two nodes that, for several hours, failed to connect to each other due to network problems. The database continued to be used on one of the nodes, with writes being stored in the Hints file as expected.
> 
> But now that the network is OK again and the machines can communicate, we see that each node reports the other as DOWN and does not replicate.
> 
> When the network came back up, we started to see in the log files "Convicting /192.168.1.102 with status NORMAL - alive false".
> 
> It seems the nodes convict each other and then fail to reconnect.
> 
> Is there some configuration that we might be missing? Any help would be much appreciated.
> 
> 
> 
> - NODE 192.168.1.10 - "nodetool status"
> 
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens       Owns    Host ID                               Rack
> DN  192.168.1.102  12.02 MB   256          ?       ff906882-8224-40ac-8cdb-98f5e725814d  rack1
> Datacenter: DC2
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens       Owns    Host ID                               Rack
> UN  192.168.1.10   41.87 MB   256          ?       51650afd-84dd-4e25-a6f0-13627858d5dc  rack1
> 
> 
> 
> - NODE 192.168.1.102 - "nodetool status"
> 
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens       Owns    Host ID                               Rack
> UN  192.168.1.102  12.4 MB    256          ?       ff906882-8224-40ac-8cdb-98f5e725814d  rack1
> Datacenter: DC2
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens       Owns    Host ID                               Rack
> DN  192.168.1.10   26.31 MB   256          ?       51650afd-84dd-4e25-a6f0-13627858d5dc  rack1
> 
> 
> 

