cassandra-commits mailing list archives

From "Arya Goudarzi (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
Date Wed, 24 Apr 2013 23:39:13 GMT


Arya Goudarzi commented on CASSANDRA-5432:

I rolled back CASSANDRA-5171 and pushed the build to my test cluster. That fixed the gossip
issue where nodes did not see each other after a restart. Repair, however, still tries to
connect to the machine running the repair (self) on its public IP to request the MerkleTree,
and that is where it gets stuck, so the repair issue remains. Some behavior did change:
OutboundTcpConnection no longer reported connecting to the other 2 replicas to request their
MerkleTrees, so the only connection message I saw was the one for self. Here is the snippet:

 INFO [Thread-458] 2013-04-24 23:21:16,543 (line 2407) Starting repair
command #1, repairing 1 ranges for keyspace app_production
DEBUG [Thread-458] 2013-04-24 23:21:16,580 (line 2547) computing ranges
for 1808575600, 7089215977519551322153637656637080005, 14178431955039102644307275311465584410,
7307932921825930779602030, 49624511842636859255075463585608106435, 56713727820156410577229101240436610840,
85070591730234615865843651859750628460, 92159807707754167187997289514579132865,
99249023685273718510150927169407637270, 127605887595351923798765477788721654890, 134695103572871475120919115443550159295,
 INFO [AntiEntropySessions:1] 2013-04-24 23:21:16,587 (line 651) [repair
#a9a87e40-ad35-11e2-945a-050d956ff11b] new session: will sync /, /,
33.163 on range (99249023685273718510150927169407637270,127605887595351923798765477788721654890]
for cardspring_production.[App]
 INFO [AntiEntropySessions:1] 2013-04-24 23:21:16,598 (line 857) [repair
#a9a87e40-ad35-11e2-945a-050d956ff11b] requesting merkle trees for App (to [/XX.YYY.107.137,
/XX.YYY.133.163, /XXX.YY.98.11])
DEBUG [WRITE-/] 2013-04-24 23:21:16,601 (line 260)
attempting to connect to /XXX.YY.98.11
 INFO [AntiEntropyStage:1] 2013-04-24 23:21:19,111 (line 213) [repair
#a9a87e40-ad35-11e2-945a-050d956ff11b] Received merkle tree for App from /XX.YYY.133.163
DEBUG [ScheduledTasks:1] 2013-04-24 23:21:19,409 (line 121) GC for ParNew:
54 ms for 1 collections, 669806384 used; max is 4211081216
 INFO [AntiEntropyStage:1] 2013-04-24 23:21:20,408 (line 213) [repair
#a9a87e40-ad35-11e2-945a-050d956ff11b] Received merkle tree for App from /XX.YYY.107.137

See the debug line from OutboundTcpConnection. It is trying to connect to the public IP of self
(XXX.YY.98.11), which is still an issue. Before this line I expected to see two more lines,
as in the earlier runs, showing OutboundTcpConnection trying to connect to the other nodes as
well. Those log lines did not show up even though both nodes returned their MerkleTrees, so
the connections to the other nodes were made successfully somehow.
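
The comparison described above (which endpoints were asked for a MerkleTree, which ones
returned one, and which ones got an "attempting to connect" debug line) can be sketched as a
small log check. This is only an illustration, not part of Cassandra: the function name
summarize_repair and the result keys are hypothetical, and the patterns assume the message
wording seen in the snippet above.

```python
import re

def summarize_repair(log_text):
    """Compare requested, received, and attempted endpoints in a repair log.

    Hypothetical helper: the regexes assume the message wording shown in
    the log snippet of this comment, not a documented log format.
    """
    # Collapse line wraps so a message split across lines matches as one string.
    flat = " ".join(log_text.split())

    requested = set()
    pattern = r"requesting merkle trees for \S+ \(to \[([^\]]*)\]\)"
    for group in re.findall(pattern, flat):
        for addr in group.split(","):
            addr = addr.strip().lstrip("/")
            if addr:
                requested.add(addr)

    received = set(re.findall(r"Received merkle tree for \S+ from /(\S+)", flat))
    attempted = set(re.findall(r"attempting to connect to /(\S+)", flat))

    return {
        "requested": requested,
        "received": received,
        "attempted": attempted,
        # Endpoints that were asked for a tree but never returned one.
        "missing_tree": requested - received,
        # Endpoints that were asked for a tree with no connect debug line.
        "no_connect_line": requested - attempted,
    }
```

Run against the snippet above, missing_tree should contain only the public IP of self
(XXX.YY.98.11), and no_connect_line should contain the two other replicas.
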
> Repair Freeze/Gossip Invisibility Issues 1.2.4
> ----------------------------------------------
>                 Key: CASSANDRA-5432
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.4
>         Environment: Ubuntu 10.04.1 LTS
> C* 1.2.3
> Sun Java 6 u43
> JNA Enabled
> Not using VNodes
>            Reporter: Arya Goudarzi
>            Assignee: Vijay
>            Priority: Critical
> Read comment 6. This description summarizes the repair issue only, but I believe there
is a bigger problem going on with networking, as described in that comment.
> Since I upgraded our sandbox cluster, I have been unable to run repair on any node, and
I am reaching our gc_grace_seconds this weekend. Please help. So far, I have tried the following:
> - nodetool scrub
> - offline scrub
> - running repair on each CF separately. Didn't matter. All got stuck the same way.
> The repair command just gets stuck and the machine sits idle. Only the following logs
are printed for the repair job:
>  INFO [Thread-42214] 2013-04-05 23:30:27,785 (line 2379) Starting
repair command #4, repairing 1 ranges for keyspace cardspring_production
>  INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,789 (line 652)
[repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] new session: will sync /X.X.X.190, /X.X.X.43,
/X.X.X.56 on range (1808575600,42535295865117307932921825930779602032] for keyspace_production.[comma
separated list of CFs]
>  INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,790 (line 858)
[repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] requesting merkle trees for BusinessConnectionIndicesEntries
(to [/X.X.X.43, /X.X.X.56, /X.X.X.190])
>  INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,086 (line 214)
[repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from
>  INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,147 (line 214)
[repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from
> Please advise. 

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
