cassandra-user mailing list archives

From Cyril Scetbon <cyril.scet...@free.fr>
Subject Re: Crashed node takes more than an hour to join the ring
Date Mon, 27 Jan 2014 11:04:07 GMT
I forgot to say that we use version 1.2.2 (we'll upgrade soon, but I didn't see any change related to this in CHANGES.txt).
-- 
Cyril SCETBON

On 27 Jan 2014, at 12:01, Cyril Scetbon <cyril.scetbon@free.fr> wrote:

> Hi,
> 
> When one node crashes for system reasons, it takes more than an hour to come back into the ring. During this time, no other node sees it:
> 
> Datacenter: b1
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address           Load       Tokens  Owns   Host ID                               Rack
> DN  XXXXXXXXXX     ?          256     3.8%   7b3d0ac4-bdf6-4e09-8a11-9794b1481c95  b05
> DN  XXXXXXXXXX     ?          256     3.1%   3a1172df-0260-4398-a008-05dc77e9f763  c03
> DN  XXXXXXXXXX     ?          256     3.7%   9e3cfd48-5697-4150-898e-b176d0eed4a0  b05
> DN  XXXXXXXXXX     ?          256     3.7%   347df11c-0d83-429c-a7a0-8d20c21a075a  c09
> DN  XXXXXXXXXX     ?          256     3.8%   d4083488-c614-4786-851b-e50a407d61a9  c03
> DN  XXXXXXXXXX     ?          256     3.7%   5a50d537-08fb-48cb-b8a0-829acb05b72e  b08
> DN  XXXXXXXXXX     ?          256     3.6%   a309c0da-aee8-4fed-aa9c-16ae103e42d3  c09
> DN  XXXXXXXXXX     ?          256     3.5%   41ff6e09-fb84-46f5-9efd-33f6ade49d7f  b08
> DN  XXXXXXXXXX     ?          256     3.2%   ad3ba9a2-5fe4-4208-b5ae-4f1a40942bb9  b08
> DN  XXXXXXXXXX     ?          256     3.4%   40140f99-e1b0-4fe0-93d2-cafdde05151f  c09
> DN  XXXXXXXXXX     ?          256     3.4%   f0c37b06-a335-49ab-819f-603945507ee9  b05
> DN  XXXXXXXXXX     ?          256     3.4%   ef1df7f6-5ae9-4ebf-bb14-e1373fc451ea  c03
> Datacenter: s1
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address           Load       Tokens  Owns   Host ID                               Rack
> DN  XXXXXXXXXX     ?          256     3.5%   bbfbc2bb-dbee-4221-804d-9cc0760cc440  k09
> UN  XXXXXXXXXX     113.13 GB  256     3.5%   f6e41cf7-fffa-4a24-bc2d-0325051afd8f  h05
> DN  XXXXXXXXXX     ?          256     3.8%   3f66cbb2-427b-4bb0-8521-9789ce4358fa  h05
> DN  XXXXXXXXXX     ?          256     3.5%   c4763e28-48cf-4581-b576-6c0b06924ec6  h05
> DN  XXXXXXXXXX     ?          256     3.2%   8edb1155-990a-4946-a251-bb4cb4c59552  b05
> DN  XXXXXXXXXX     ?          256     4.2%   695adecd-4d49-412b-94db-cf695e3b5298  h05
> DN  XXXXXXXXXX     ?          256     3.8%   60b9b784-25ec-4a5a-ac76-d1c19bb2be72  c05
> DN  XXXXXXXXXX     ?          256     3.8%   3cf22978-c8d9-474e-8f6d-dbbcb1d7784e  b05
> DN  XXXXXXXXXX     ?          256     3.5%   4cfb5924-ea62-465b-8e39-b0b77a809422  k09
> DN  XXXXXXXXXX     ?          256     3.3%   08d99c7c-fb6b-4731-8408-27afb6aa79e5  k09
> DN  XXXXXXXXXX     ?          256     3.1%   1fb09426-3191-46f5-ab54-c2b6e980fcfe  k09
> DN  XXXXXXXXXX     ?          256     3.5%   79f64055-2681-43d2-a8e3-375ca9d6b771  c05
> DN  XXXXXXXXXX     ?          256     3.7%   88a8c59e-4dc9-47b2-b7d7-bb422199fa76  b05
> DN  XXXXXXXXXX     ?          256     3.7%   1d6ef3e5-76bc-4cac-9151-bbfd5b5e7e0e  c05
> DN  XXXXXXXXXX     ?          256     3.4%   79cf98d7-3bfe-4a94-97bd-95837dbe7623  c05
> DN  XXXXXXXXXX     ?          256     4.1%   541cd94b-1f94-47a4-83d3-66ed3ffe222d  b05
> 
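To make a listing like the one above easier to compare between nodes, here is a minimal Python sketch that tallies the status/state codes per datacenter. It assumes the layout shown above: a "Datacenter:" line introduces each block, and each node row starts with a two-letter code (Status U/D followed by State N/L/J/M). `tally_status` and `SAMPLE` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

# A node row starts with a two-letter code: Status (U/D) then State (N/L/J/M),
# matching the legend that `nodetool status` prints above each block.
NODE_ROW = re.compile(r"^([UD])([NLJM])\s")


def tally_status(output: str) -> dict[str, Counter]:
    """Count status/state codes per datacenter in `nodetool status` text."""
    counts: dict[str, Counter] = {}
    dc = None
    for raw in output.splitlines():
        # Drop the "> " quote prefix that mail clients add.
        line = raw.lstrip("> ").rstrip()
        if line.startswith("Datacenter:"):
            dc = line.split(":", 1)[1].strip()
            counts[dc] = Counter()
            continue
        m = NODE_ROW.match(line)
        if m and dc is not None:
            counts[dc][m.group(1) + m.group(2)] += 1
    return counts


# A few rows copied from the listing above (addresses are masked there too).
SAMPLE = """\
> Datacenter: b1
> ==============
> DN  XXXXXXXXXX     ?          256     3.8%   7b3d0ac4-bdf6-4e09-8a11-9794b1481c95  b05
> DN  XXXXXXXXXX     ?          256     3.1%   3a1172df-0260-4398-a008-05dc77e9f763  c03
> Datacenter: s1
> ==============
> DN  XXXXXXXXXX     ?          256     3.5%   bbfbc2bb-dbee-4221-804d-9cc0760cc440  k09
> UN  XXXXXXXXXX     113.13 GB  256     3.5%   f6e41cf7-fffa-4a24-bc2d-0325051afd8f  h05
"""

if __name__ == "__main__":
    for dc, codes in tally_status(SAMPLE).items():
        print(dc, dict(codes))
```

Applied to the full listing above, it would count 12 DN rows in b1, and 15 DN rows plus 1 UN row in s1, i.e. only one node in the whole cluster is seen as up.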
> There is nothing noticeable in the logs, even in debug mode:
> 
>  INFO [main] 2014-01-27 10:00:21,706 TServerCustomFactory.java (line 47) Using synchronous/threadpool thrift server on 0.0.0.0 : 9160
>  INFO [Thread-8] 2014-01-27 10:00:21,707 ThriftServer.java (line 110) Listening for thrift clients...
>  WARN [NonPeriodicTasks:1] 2014-01-27 10:00:31,765 Password4LevelAuthenticator.java (line 205) PasswordAuthenticator skipped default user setup: some nodes were not ready
>  WARN [NonPeriodicTasks:1] 2014-01-27 10:00:31,794 Auth.java (line 207) Skipped default superuser setup: some nodes were not ready
> 
> The top threads are RMI threads:
> 
> <Screen Shot 2014-01-27 at 11.23.29.png>
> 
> And more than an hour later we see:
> 
> DEBUG [Thread-3964] 2014-01-27 11:24:18,856 IncomingTcpConnection.java (line 75) Connection version 6 from /XXXXXXXXXX
> DEBUG [Thread-3964] 2014-01-27 11:24:18,857 IncomingTcpConnection.java (line 112) Upgrading incoming connection to be compressed
> DEBUG [Thread-3964] 2014-01-27 11:24:18,857 IncomingTcpConnection.java (line 120) Max version for /XXXXXXXXXX is 6
> DEBUG [Thread-3964] 2014-01-27 11:24:18,857 MessagingService.java (line 805) Setting version 6 for /XXXXXXXXXX
> DEBUG [Thread-3964] 2014-01-27 11:24:18,858 IncomingTcpConnection.java (line 129) set version for /XXXXXXXXXX to 6
> DEBUG [Thread-3964] 2014-01-27 11:24:18,862 MessagingService.java (line 812) Reseting version for /XXXXXXXXXX
> DEBUG [Thread-3965] 2014-01-27 11:24:18,867 IncomingTcpConnection.java (line 75) Connection version 6 from /XXXXXXXXXX
> DEBUG [Thread-3965] 2014-01-27 11:24:18,867 IncomingTcpConnection.java (line 112) Upgrading incoming connection to be compressed
> DEBUG [Thread-3965] 2014-01-27 11:24:18,869 IncomingTcpConnection.java (line 120) Max version for /XXXXXXXXXX is 6
> DEBUG [Thread-3965] 2014-01-27 11:24:18,869 MessagingService.java (line 805) Setting version 6 for /XXXXXXXXXX
> DEBUG [Thread-3965] 2014-01-27 11:24:18,869 IncomingTcpConnection.java (line 129) set version for /XXXXXXXXXX to 6
> DEBUG [GossipStage:1] 2014-01-27 11:24:18,876 Gossiper.java (line 722) Clearing interval times for /XXXXXXXXXX due to generation change
> DEBUG [GossipStage:1] 2014-01-27 11:24:18,878 Gossiper.java (line 722) Clearing interval times for /XXXXXXXXXX due to generation change
> DEBUG [GossipStage:1] 2014-01-27 11:24:18,878 Gossiper.java (line 722) Clearing interval times for /XXXXXXXXXX due to generation change
> 
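The delay between the two log excerpts can be measured directly from their timestamps. A minimal Python sketch, assuming the `%Y-%m-%d %H:%M:%S,%f` layout of the lines above, and taking the "Listening for thrift clients..." line as the end of startup and the first IncomingTcpConnection line as the first peer reaching the node (both choices are assumptions; `log_gap` is a hypothetical helper):

```python
from datetime import datetime, timedelta

# Timestamp layout of the Cassandra log lines quoted above,
# e.g. "2014-01-27 10:00:21,707" (comma before the milliseconds).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_gap(start: str, end: str) -> timedelta:
    """Return the time elapsed between two log timestamps."""
    return datetime.strptime(end, LOG_TS_FORMAT) - datetime.strptime(start, LOG_TS_FORMAT)


# Assumed reference points: Thrift listening (end of startup) and the
# first incoming peer connection in the second excerpt.
gap = log_gap("2014-01-27 10:00:21,707", "2014-01-27 11:24:18,856")

if __name__ == "__main__":
    print(gap, f"({gap.total_seconds() / 60:.1f} minutes)")
```

With those two timestamps the gap is 1:23:57.149, i.e. roughly 84 minutes.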
> We only hit this issue when the system crashes.
> 
> Any idea of a possible cause, or is this a known behaviour?
>  -- 
> Cyril SCETBON
> 

