cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arya Goudarzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5432) AntiEntropy Repair Freezing on 1.2.3
Date Sat, 06 Apr 2013 09:15:15 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624375#comment-13624375
] 

Arya Goudarzi commented on CASSANDRA-5432:
------------------------------------------

OK, I found the problem, but something is changed in this release regarding the networking
that is not clear to me. I use EC2. I had to open all TCP ports to the world for the repairs
to work. They didn't even work when I allowed all TCP within out C*'s security group. This
is not acceptable. What was changed in 1.2.3 in terms of repair routing? Shouldn't it just
use the storage port?

We use Ec2MultiRegionSnitch, so it returns DNS that resolved to local ips for in-region communication
and public ips for cross-region communication. I have a C* 1.1.10 cluster in production and
it is working fine without having to open the security group wide open. 

Please advice.
                
> AntiEntropy Repair Freezing on 1.2.3
> ------------------------------------
>
>                 Key: CASSANDRA-5432
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5432
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.3
>         Environment: Ubuntu 10.04.1 LTS
> C* 1.2.3
> Sun Java 6 u43
> JNA Enabled
> Not using VNodes
>            Reporter: Arya Goudarzi
>            Priority: Critical
>
> Since I have upgraded our sandbox cluster, I am unable to run repair on any node and
I am reaching our gc_grace seconds this weekend. Please help. So far, I have tried the following
suggestions:
> - nodetool scrub
> - offline scrub
> - running repair on each CF separately. Didn't matter. All got stuck the same way.
> The repair command just gets stuck and the machine is idling. Only the following logs
are printed for repair job:
>  INFO [Thread-42214] 2013-04-05 23:30:27,785 StorageService.java (line 2379) Starting
repair command #4, repairing 1 ranges for keyspace cardspring_production
>  INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,789 AntiEntropyService.java (line 652)
[repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] new session: will sync /X.X.X.190, /X.X.X.43,
/X.X.X.56 on range (1808575600,42535295865117307932921825930779602032] for keyspace_production.[comma
separated list of CFs]
>  INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,790 AntiEntropyService.java (line 858)
[repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] requesting merkle trees for BusinessConnectionIndicesEntries
(to [/X.X.X.43, /X.X.X.56, /X.X.X.190])
>  INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,086 AntiEntropyService.java (line 214)
[repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from
/X.X.X.43
>  INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,147 AntiEntropyService.java (line 214)
[repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from
/X.X.X.56
> Please advise. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message