cassandra-commits mailing list archives

From "J.B. Langston (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair
Date Fri, 17 Oct 2014 21:43:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175588#comment-14175588 ]

J.B. Langston commented on CASSANDRA-8084:
------------------------------------------

I don't think sstableloader is working right. Here is the output for sstableloader itself:

{code}
automaton@ip-172-31-7-50:~/Keyspace1/Standard1$ sstableloader -d localhost `pwd`
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-320-Data.db
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-326-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-325-Data.db
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-283-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-267-Data.db
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-211-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-301-Data.db
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-316-Data.db to [/54.183.192.248,
/54.215.139.161, /54.165.222.3, /54.172.118.222]
Streaming session ID: ac5dd440-5645-11e4-a813-3d13c3d3c540
progress: [/54.172.118.222 8/8 (100%)] [/54.183.192.248 8/8 (100%)] [/54.165.222.3 8/8 (100%)]
[/54.215.139.161 8/8 (100%)] [total: 100% - 2147483647MB/s (avg: 30MB/s)
{code}

Here is netstats on the node where it is running:

{code}
Responses                       n/a         0            812
automaton@ip-172-31-7-50:~$ nodetool netstats
Mode: NORMAL
Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540
    /172.31.7.50 (using /54.183.192.248)
        Receiving 8 files, 1059673728 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-10-Data.db
56468194/164372226 bytes(34%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-4-Data.db
278000000/278000000 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-3-Data.db
50674396/50674396 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-5-Data.db
68597334/68597334 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-7-Data.db
139068110/139068110 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-6-Data.db
12682638/12682638 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-9-Data.db
278000000/278000000 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-8-Data.db
68279024/68279024 bytes(100%) received from /172.31.7.50
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0              0
Responses                       n/a         0            970
{code}

Here's netstats on the other node in the same DC:

{code}
automaton@ip-172-31-40-169:~$ nodetool netstats
Mode: NORMAL
Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540
    /172.31.7.50 (using /54.183.192.248)
        Receiving 8 files, 1059673728 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-239-Data.db
68279024/68279024 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-245-Data.db
278000000/278000000 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-246-Data.db
43078602/50674396 bytes(85%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-240-Data.db
278000000/278000000 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-241-Data.db
12682638/12682638 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-243-Data.db
139068110/139068110 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-242-Data.db
164372226/164372226 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-244-Data.db
68597334/68597334 bytes(100%) received from /172.31.7.50
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         249589
Responses                       n/a         0        1390344
{code}

The IP addresses seem backwards in the netstats output.

Here is the output of netstat -anp | grep 7000 on the node where sstableloader is running:

{code}
tcp        0      0 172.31.7.50:7000        0.0.0.0:*               LISTEN      21544/java
tcp        0      0 172.31.7.50:7000        172.31.5.143:44869      ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:56991       172.31.5.143:7000       ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:7000        54.165.222.3:50968      ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:50599       54.165.222.3:7000       ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:50624       54.165.222.3:7000       ESTABLISHED 22226/java
tcp        0 1132336 172.31.7.50:50626       54.165.222.3:7000       ESTABLISHED 22226/java
tcp        0      0 172.31.7.50:7000        54.172.118.222:58561    ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:37769       54.172.118.222:7000     ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:37796       54.172.118.222:7000     ESTABLISHED 22226/java
tcp        0 1149712 172.31.7.50:37798       54.172.118.222:7000     ESTABLISHED 22226/java
tcp        0      0 172.31.7.50:7000        54.183.192.248:47451    ESTABLISHED 21544/java
tcp    43688      0 172.31.7.50:7000        54.183.192.248:47453    ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:47451       54.183.192.248:7000     ESTABLISHED 22226/java
tcp        0  98464 172.31.7.50:47453       54.183.192.248:7000     ESTABLISHED 22226/java
tcp        0      0 172.31.7.50:41240       54.215.139.161:7000     ESTABLISHED 22226/java
tcp        0  81088 172.31.7.50:41242       54.215.139.161:7000     ESTABLISHED 22226/java
{code}

It's establishing a connection to itself (54.183.192.248) and to the other node in the local
DC (54.215.139.161) with the broadcast address instead of the listen address.
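
The pattern in the netstat output above can be checked mechanically. Below is a minimal Python sketch, not part of any Cassandra tooling, that parses a few of the lines shown above and flags established connections to port 7000 whose remote address is not a private address. The function names are made up for illustration. The sketch cannot tell which peers belong to the local DC; that still has to be matched by hand against the cluster layout.

```python
import ipaddress

# A subset of the netstat lines shown above, copied verbatim.
NETSTAT_LINES = """\
tcp        0      0 172.31.7.50:7000        0.0.0.0:*               LISTEN      21544/java
tcp        0      0 172.31.7.50:56991       172.31.5.143:7000       ESTABLISHED 21544/java
tcp        0 1132336 172.31.7.50:50626       54.165.222.3:7000       ESTABLISHED 22226/java
tcp        0  98464 172.31.7.50:47453       54.183.192.248:7000     ESTABLISHED 22226/java
tcp        0  81088 172.31.7.50:41242       54.215.139.161:7000     ESTABLISHED 22226/java
""".splitlines()

STORAGE_PORT = 7000  # the port grepped for above


def outbound_peers(lines, port=STORAGE_PORT):
    """Yield (peer_ip, is_private) for established connections to `port`.

    Hypothetical helper: assumes the column layout of `netstat -anp`
    as shown above (proto, recv-q, send-q, local, foreign, state, pid).
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue  # skips the LISTEN socket and malformed lines
        peer_ip, _, peer_port = fields[4].rpartition(":")
        if int(peer_port) != port:
            continue  # inbound connection: the peer uses an ephemeral port
        yield peer_ip, ipaddress.ip_address(peer_ip).is_private


def public_peers(lines, port=STORAGE_PORT):
    """Return the sorted set of peers reached over a non-private address."""
    return sorted({ip for ip, private in outbound_peers(lines, port) if not private})
```

Calling `public_peers(NETSTAT_LINES)` on this subset returns the three 54.x addresses, two of which (54.183.192.248 and 54.215.139.161) are the local-DC nodes mentioned above.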

> GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8084
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8084
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Config
>         Environment: Tested this in GCE and AWS clusters. Created a multi-region, multi-DC cluster once in GCE and once in AWS and ran into the same problem.
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=12.04
> DISTRIB_CODENAME=precise
> DISTRIB_DESCRIPTION="Ubuntu 12.04.3 LTS"
> NAME="Ubuntu"
> VERSION="12.04.3 LTS, Precise Pangolin"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu precise (12.04.3 LTS)"
> VERSION_ID="12.04"
> Tried Apache Cassandra ReleaseVersion 2.0.10 and also the latest DSE version, 4.5, which corresponds to 2.0.8.39.
>            Reporter: Jana
>            Assignee: Yuki Morishita
>              Labels: features
>             Fix For: 2.0.12
>
>         Attachments: 8084-2.0-v2.txt, 8084-2.0-v3.txt, 8084-2.0-v4.txt, 8084-2.0.txt
>
>
> Neither of these snitches (GossipingPropertyFileSnitch and Ec2MultiRegionSnitch) used the private IPs for communication between intra-DC nodes in my multi-region, multi-DC cloud cluster (on both AWS and GCE) when I ran "nodetool repair -local". It works fine during regular reads.
> Here are the cluster flavors I tried, all of which failed:
> AWS + Multi-Region + Multi-DC + GossipingPropertyFileSnitch + prefer_local=true in the cassandra-rackdc.properties file.
> AWS + Multi-Region + Multi-DC + Ec2MultiRegionSnitch + prefer_local=true in the cassandra-rackdc.properties file.
> GCE + Multi-Region + Multi-DC + GossipingPropertyFileSnitch + prefer_local=true in the cassandra-rackdc.properties file.
> GCE + Multi-Region + Multi-DC + Ec2MultiRegionSnitch + prefer_local=true in the cassandra-rackdc.properties file.
> I expect that, with the above setup, all nodes in a given DC communicate via private IPs, since the cloud providers charge for traffic over public IPs but not over private IPs.
> They can use public IPs for inter-DC communication, which is working as expected.
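
For reference, the behavior the reporter expects from prefer_local can be written down as a small rule. The following Python sketch only illustrates that expectation; it is not Cassandra's implementation. The `Peer` type, the `expected_address` function, the DC names, and the addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    """Hypothetical view of a node as seen by another node."""
    datacenter: str
    private_ip: str  # what the node has as its listen address
    public_ip: str   # what the node has as its broadcast address


def expected_address(local_dc: str, peer: Peer, prefer_local: bool = True) -> str:
    """Return the address the reporter expects to be used for `peer`.

    Same DC with prefer_local enabled: private IP.
    Any other case (different DC, or prefer_local disabled): public IP.
    """
    if prefer_local and peer.datacenter == local_dc:
        return peer.private_ip
    return peer.public_ip


# Illustrative peers only; these are not nodes from the report.
same_dc_peer = Peer("dc1", "10.0.0.5", "203.0.113.10")
other_dc_peer = Peer("dc2", "10.1.0.5", "203.0.113.20")
```

With these placeholder peers, a node in `dc1` would be expected to reach `same_dc_peer` on its private IP and `other_dc_peer` on its public IP. The report says repair traffic used the public IP in both cases.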

> Here is a snippet from my log files when I ran "nodetool repair -local":
> Node responding to the node running repair:
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/sessions
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/events
> Node running repair:
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for events from /54.172.118.222
> Note: The IPs it is communicating with are all public IPs; it should have used the private IPs starting with 172.x.x.x.
> YAML file values:
> The listen address is set to: private IP
> The broadcast address is set to: public IP
> The seeds are set to: public IPs from both DCs
> The snitches tried: GossipingPropertyFileSnitch (GPFS) and Ec2MultiRegionSnitch
> RACK-DC: Had prefer_local set to true. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
