cassandra-commits mailing list archives

From "J.B. Langston (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair
Date Fri, 10 Oct 2014 14:25:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166908#comment-14166908 ]

J.B. Langston commented on CASSANDRA-8084:
------------------------------------------

I tested and it appears to work. Here is the cluster I am testing with:

{code}
Datacenter: DC1_EAST
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.165.222.3    711.26 MB  1       25.0%  dd449706-2059-4b65-ae98-0012d2cf8f67  rack1
UN  54.172.118.222  561.14 MB  1       25.0%  18cd7d0a-74ca-4835-a7ff-7ffaa92b35ef  rack1
Datacenter: DC1_WEST
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.183.192.248  721.2 MB   1       25.0%  c4dd37f1-d937-4876-8669-f0b01a3942db  rack1
UN  54.215.139.161  909.26 MB  1       25.0%  16499349-8cef-4a62-a99c-ab145cb70921  rack1
{code}

I wasn't sure initially because the logs and `nodetool netstats` still show the broadcast
address. You can see here that nodetool netstats, when run on 54.215.139.161, shows we are
streaming from 54.183.192.248 (the broadcast address of the other node in the same DC):

{code}
Mode: NORMAL
Repair dbc7ea40-5082-11e4-8190-c9fac3589773
    /54.183.192.248
        Receiving 9 files, 229856794 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-100-Data.db
58878176/58878176 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-106-Data.db
97856/97856 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-109-Data.db
69407704/69407704 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-108-Data.db
3203116/3203116 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-102-Data.db
12545306/12545306 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-103-Data.db
69407704/69407704 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-104-Data.db
1536228/1536228 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-105-Data.db
12589230/12589230 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-107-Data.db
2191474/2191474 bytes(100%) received from /54.183.192.248
        Sending 5 files, 109645980 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-87-Data.db
14323672/14323672 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-97-Data.db
20581730/20581730 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-98-Data.db
3161694/3161694 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-95-Data.db
69407704/69407704 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-99-Data.db
2171180/2171180 bytes(100%) sent to /54.183.192.248
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0        1495191
Responses                       n/a         0         714928
{code}

However, the output of `sudo netstat -anp | grep 7000 | sort -k5` shows that, within the local DC,
we are only connecting to the other node on its listen address (172.31.7.50):

{code}
tcp        0      0 172.31.5.143:7000       0.0.0.0:*               LISTEN      17279/java
tcp        0      0 172.31.5.143:7000       172.31.5.143:34936      ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:7000       172.31.5.143:34937      ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:7000       172.31.5.143:34938      ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:34936      172.31.5.143:7000       ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:34937      172.31.5.143:7000       ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:34938      172.31.5.143:7000       ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:7000       172.31.7.50:52125       ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:7000       172.31.7.50:52126       ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:57502      172.31.7.50:7000        ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:57560      172.31.7.50:7000        ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:57601      172.31.7.50:7000        ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:57602      172.31.7.50:7000        ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:7000       54.165.222.3:33876      ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:7000       54.165.222.3:33878      ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:44120      54.165.222.3:7000       ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:44198      54.165.222.3:7000       ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:7000       54.172.118.222:54515    ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:7000       54.172.118.222:54518    ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:35960      54.172.118.222:7000     ESTABLISHED 17279/java
tcp        0    161 172.31.5.143:35880      54.172.118.222:7000     ESTABLISHED 17279/java
unix  2      [ ]         DGRAM                    7000     613/acpid
{code}

The only connections established to the broadcast addresses are for the nodes in the other
DC (54.165.222.3 and 54.172.118.222).
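
The same check can be scripted rather than read by eye. The following is a minimal sketch, not part of Cassandra or nodetool: the function name `classify_peers` is hypothetical, and it assumes the default storage port 7000 and the `netstat -anp` column layout shown above. It groups ESTABLISHED peers by whether their address is private or public, using Python's standard `ipaddress` module.

```python
import ipaddress
from collections import Counter

STORAGE_PORT = 7000  # assumption: default inter-node port, as grepped above


def classify_peers(netstat_output, port=STORAGE_PORT):
    """Count ESTABLISHED TCP connections on `port` per remote IP,
    split into private and public peers (hypothetical helper)."""
    counts = {"private": Counter(), "public": Counter()}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Keep only established TCP rows; skips LISTEN and unix-socket rows.
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "ESTABLISHED":
            continue
        local_ip, _, local_port = fields[3].rpartition(":")
        remote_ip, _, remote_port = fields[4].rpartition(":")
        if str(port) not in (local_port, remote_port):
            continue
        if remote_ip == local_ip:
            continue  # connections from the node to itself
        kind = "private" if ipaddress.ip_address(remote_ip).is_private else "public"
        counts[kind][remote_ip] += 1
    return counts


# A few rows copied from the netstat output above.
sample = """\
tcp        0      0 172.31.5.143:7000       0.0.0.0:*               LISTEN      17279/java
tcp        0      0 172.31.5.143:7000       172.31.5.143:34936      ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:57502      172.31.7.50:7000        ESTABLISHED 17279/java
tcp        0      0 172.31.5.143:7000       54.165.222.3:33876      ESTABLISHED 17279/java
tcp        0    161 172.31.5.143:35880      54.172.118.222:7000     ESTABLISHED 17279/java
unix  2      [ ]         DGRAM                    7000     613/acpid
"""

result = classify_peers(sample)
for kind, peers in result.items():
    print(kind, sorted(peers))
```

If the patch works as intended, every peer in the local DC should land in the "private" group and only peers in the remote DC should appear under "public".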

Is use of the broadcast address in netstats and the logs intentional? I can see some customers
getting confused by this. On the other hand, it matches what we show for nodetool ring and
status, so...

> GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt
use the PRIVATE IPS for Intra-DC communications - When running nodetool repair
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8084
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8084
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Config
>         Environment: Tested this in GCE and AWS clusters. Created multi region and multi
dc cluster once in GCE and once in AWS and ran into the same problem. 
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=12.04
> DISTRIB_CODENAME=precise
> DISTRIB_DESCRIPTION="Ubuntu 12.04.3 LTS"
> NAME="Ubuntu"
> VERSION="12.04.3 LTS, Precise Pangolin"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu precise (12.04.3 LTS)"
> VERSION_ID="12.04"
> Tried to install Apache Cassandra version ReleaseVersion: 2.0.10 and also latest DSE
version which is 4.5 and which corresponds to 2.0.8.39.
>            Reporter: Jana
>            Assignee: Yuki Morishita
>              Labels: features
>             Fix For: 2.0.11
>
>         Attachments: 8084-2.0.txt
>
>
> Neither of these snitches (GossipingPropertyFileSnitch and EC2MultiRegionSnitch) used the private IPs for communication between intra-DC nodes in my multi-region, multi-DC cluster in the cloud (on both AWS and GCE) when I ran "nodetool repair -local". It works fine during regular reads.
> Here are the cluster configurations I tried, all of which failed:
> AWS + Multi-REGION + Multi-DC + GossipingPropertyFileSnitch + (prefer_local=true) in the cassandra-rackdc.properties file.
> AWS + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (prefer_local=true) in the cassandra-rackdc.properties file.
> GCE + Multi-REGION + Multi-DC + GossipingPropertyFileSnitch + (prefer_local=true) in the cassandra-rackdc.properties file.
> GCE + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (prefer_local=true) in the cassandra-rackdc.properties file.
> With the above setup, I expect all nodes in a given DC to communicate via private IPs, since the cloud providers don't charge for traffic over private IPs but do charge for traffic over public IPs.
> They can still use public IPs for inter-DC communication, which is working as expected.

> Here is a snippet from my log files when I ran the "nodetool repair -local" - 
> Node responding to 'node running repair' 
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/sessions
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/events
> Node running repair - 
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for events from /54.172.118.222
> Note: The IPs it is communicating with are all public IPs; it should have used the private IPs starting with 172.x.x.x
> YAML file values : 
> The listen address is set to: PRIVATE IP
> The broadcast address is set to: PUBLIC IP
> The SEEDs address is set to: PUBLIC IPs from both DCs
> The SNITCHES tried: GPFS and EC2MultiRegionSnitch
> RACK-DC: Had prefer_local set to true. 
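
For reference, the settings described in the report would look roughly like the fragments below. This is only a sketch: the addresses are borrowed from one node in the comment above for illustration, the pairing of 172.31.5.143 with 54.215.139.161 is an assumption, and the choice of seed nodes is hypothetical (the report only says "PUBLIC IPs from both DCs").

```yaml
# cassandra.yaml (illustrative values, see assumptions above)
listen_address: 172.31.5.143        # private IP
broadcast_address: 54.215.139.161   # public IP
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # hypothetical seed choice: one public IP from each DC
          - seeds: "54.165.222.3,54.215.139.161"
```

```properties
# cassandra-rackdc.properties (DC and rack names taken from the nodetool status output)
dc=DC1_WEST
rack=rack1
prefer_local=true
```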



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
