cassandra-user mailing list archives

From Subroto Barua <sbarua...@yahoo.com.INVALID>
Subject Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'
Date Mon, 26 Aug 2019 22:51:08 GMT
Michael,

Are you able to connect to any c* node via OpenSSL?

openssl s_client -connect <ip address>:9042

cqlsh <ip address> --ssl
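A minimal sketch of that check, assuming GNU timeout and the openssl CLI are installed on the machine you test from. probe_tls and NODE_IP are made-up names for illustration only. Port 7001 is the SSL storage port from the startup log quoted below, and 9042 is the native transport port. With client_encryption_options disabled, 9042 is not expected to answer a TLS handshake even on a healthy node, so 7001 is probably the more telling probe for internode encryption.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cassandra): attempt a TLS handshake and
# report whether the peer presented a certificate.
probe_tls() {
    host="$1"
    port="$2"
    # </dev/null makes s_client exit after the handshake attempt;
    # timeout (GNU coreutils, assumed present) guards against a silent firewall drop.
    if timeout 10 openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null \
        | grep -q 'BEGIN CERTIFICATE'; then
        echo "${host}:${port} TLS handshake returned a certificate"
    else
        echo "${host}:${port} no TLS certificate (port closed, or listener is not TLS)"
        return 1
    fi
}

# Usage (NODE_IP is a placeholder for a real node address):
#   probe_tls NODE_IP 7001   # SSL storage port, per the MessagingService log line
#   probe_tls NODE_IP 9042   # native transport port
```

If 7001 returns a certificate, the listener side is at least up and the problem is more likely in trust or chain validation than in the socket setup.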

Subroto 

> On Aug 26, 2019, at 2:47 PM, Marc Selwan <marc.selwan@datastax.com> wrote:
> 
> Which exact version of OpenJDK are you using? Is it possible you don't have JCE on those nodes? (I believe more recent versions of Java 8 have this baked in, so that might not be it.)
> 
> 
> Marc Selwan | DataStax | PM, Server Team | (925) 413-7079 | Twitter 
> 
>> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <mcarlise@salesforce.com.invalid> wrote:
>> 
>> I originally opened this issue on Stack Overflow (https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception).
>> 
>> However, I haven't gotten any responses in over a week. I'm going to post it here and maybe someone will have an idea on where I can look.
>> 
>> We currently run a multi-region Cassandra cluster in AWS. It runs in four regions, 12 nodes per region, without node-to-node encryption (or client encryption). We are trying to enable inter-datacenter node-to-node encryption. However, when we flip encryption on, we get an exception that nodes are unable to gossip with any peers.
>> 
>> It could be that we didn't build our JKS keystore/truststores correctly (more on how we built these files below). But we also do not see intra-datacenter communication working (which should remain unencrypted). Additionally, cqlsh cannot connect to the node either, even though we have (by default) require_client_auth set to false.
>> 
>> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
>> java.lang.RuntimeException: Unable to gossip with any peers
>>         at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
>>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
>> INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
>> 
>> Note that this error appears only after the node has been up for a few minutes (i.e. there is a delay between startup and the exception being thrown).
>> 
>> Information about our cassandra setup
>> 
>> cassandra version: 3.11.4
>> JDK version: openjdk-8.
>> Linux: Ubuntu 18.04 (bionic).
>> 
>> cassandra.yaml
>> 
>> endpoint_snitch: Ec2MultiRegionSnitch
>> 
>> server_encryption_options:
>>   internode_encryption: dc
>>   keystore: <omitted>
>>   keystore_password: <omitted>
>>   truststore: <omitted>
>>   truststore_password: <omitted>
>> 
>> client_encryption_options:
>>   enabled: false
>> 
>> cassandra-rackdc.properties
>> 
>> prefer_local=true
>> 
>> No obvious errors with SSL output
>> 
>> When starting cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added to cassandra-env.sh we see SSL logs printed to stdout (Note: Subject and Issuer were omitted on purpose).
>> 
>> found key for : cassy-us-west-2
>> adding as trusted cert:
>>   Subject: ...
>>   Issuer:  ...
>>   Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
>>   Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026
>> 
>> ...
>> 
>> trigger seeding of SecureRandom
>> done seeding SecureRandom
>> 
>> Looking at Java SE SSL/TLS connection debugging, this looks correct. Note, however, that we see this series of messages (along with the RSA key signature output) repeated several times in rapid succession. We never observe any messages about the truststore being added; however, that might be something that occurs only on client initiation (?)
>> 
>> Additionally, we do see cassandra report that the Encrypted Messaging service has been started.
>> 
>> INFO  [main] 2019-08-15 18:45:31,022 MessagingService.java:704 - Starting Encrypted Messaging Service on SSL port 7001
>> 
>> Doesn't appear to be a cassandra.yaml configuration problem
>> 
>> We can bring the node back online by simply configuring internode_encryption: none. This action seems to rule out a broadcast_address or rpc_address configuration problem.
>> 
>> How we built our keystore/truststores
>> 
>> We followed the basic template in the DataStax docs for preparing SSL certificates. One minor difference was that our private keys and CSRs were generated using openssl, one per region (we plan to share keys/signed certs across the nodes in a region). These were created using a command template such as:
>> 
>> openssl req -new -newkey rsa:2048 -out cassy-<region>.csr -keyout cassy-<region>.key -config cassy-<region>.conf -subj "..." -nodes -sha256
>> 
>> The generated CSR was then signed by an internal root CA. Because we generated our files using openssl, we had to build our jks files by importing our certs into them.
>> 
>> Commands to generate truststore
>> 
>> We distribute this one file to all nodes.
>> 
>> keytool -importcert \
>>     -keystore generic-server-truststore.jks \
>>     -alias rootCa \
>>     -file rootCa.crt \
>>     -noprompt \
>>     -keypass omitted \
>>     -storepass omitted
>> 
>> Commands to generate keystore
>> 
>> This was done once per region; essentially we created a keystore with keytool, then deleted the key entry, and then imported our key entry with keytool from a pkcs12 file.
>> 
>> keytool -genkeypair -keyalg RSA -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted -keypass omitted -validity 365 -keysize 2048 -dname "..."
>> 
>> keytool -delete -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted
>> 
>> openssl pkcs12 -export -in signed_certs/${region}.pem -inkey keys/cassandra.${region}.key -name cassy-${region} -out ${region}.p12
>> 
>> keytool -importkeystore -deststorepass omitted -destkeystore cassy-${region}.jks -srckeystore ${region}.p12 -srcstoretype PKCS12
>> 
>> keytool -importcert -keystore cassy-${region}.jks -alias rootCa -file ca.crt -noprompt -keypass omitted -storepass omitted
>> 
>> Looking back at this, I don't remember why we used keytool to generate a keypair/keystore, then deleted and imported. I think it was because the keytool importkeystore command refused to run if the keystore didn't already exist.
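One way to sanity-check the finished keystore is to list its entries. A small sketch, assuming keytool is on the PATH and prints in English; keystore_summary is a hypothetical helper name and STOREPASS a placeholder variable. For the cassy-${region} alias I would expect an entry type of PrivateKeyEntry with a chain length that covers the signed cert and its issuers, and trustedCertEntry for rootCa. As far as I know, keytool -importkeystore creates the destination keystore when it does not exist, so the genkeypair/delete steps may not be needed.

```shell
#!/bin/sh
# Hypothetical helper: print alias, entry type and chain length for each
# entry in a JKS keystore, using the verbose listing from keytool.
keystore_summary() {
    keystore="$1"
    storepass="$2"
    keytool -list -v -keystore "$keystore" -storepass "$storepass" \
        | grep -E '^(Alias name|Entry type|Certificate chain length):'
}

# Usage (STOREPASS is a placeholder for the real store password):
#   keystore_summary "cassy-${region}.jks" "$STOREPASS"
```

A private key entry that shows a chain length of 1 would suggest the intermediate and root did not make it into the pkcs12 export.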
>> 
>> ca.crt and pem file
>> 
>> The ca.crt file contains the root certificate and the intermediate certificate that was used to sign the CSR. The pem file contains the signed CSR returned to us, the intermediate cert, and the root CA (in that order).
>> 
>> openssl verify ca.crt and pem
>> 
>> openssl verify -CAfile ca.crt us-west-2.pem
>> signed_certs/us-west-2.pem: OK
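Since ca.crt holds both the root and the intermediate, -CAfile ca.crt treats both as trusted. A slightly stricter check, sketched here, trusts only the root and passes the intermediate as untrusted. verify_chain and the file names in the usage comment are hypothetical; the sketch assumes the root and intermediate have been split into separate PEM files.

```shell
#!/bin/sh
# Hypothetical helper: verify a leaf cert against a trusted root, supplying
# the intermediate only as an untrusted chain-building candidate.
verify_chain() {
    root="$1"
    intermediate="$2"
    leaf="$3"
    openssl verify -CAfile "$root" -untrusted "$intermediate" "$leaf"
}

# Usage (root.crt and intermediate.crt are assumed file names):
#   verify_chain root.crt intermediate.crt signed_certs/us-west-2.pem
```

This is closer to the situation of a peer whose truststore carries only rootCa.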
>> 
>> Command output after enabling encryption
>> 
>> nodetool status (output truncated)
>> 
>> Datacenter: us-east
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load        Tokens  Owns (effective)  Host ID  Rack
>> ?N  52.44.11.221    ?           256     25.4%             null     1c
>> ...
>> ?N  52.204.232.195  ?           256     23.2%             null     1d
>> Datacenter: us-west-2
>> =====================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load        Tokens  Owns (effective)  Host ID  Rack
>> ?N  34.209.2.144    ?           256     26.5%             null     2c
>> UN  52.40.32.177    105.99 GiB  256     23.7%             null     2c
>> ?N  34.210.109.203  ?           256     24.7%             null     2a
>> ...
>> 
>> The online node (UN) is the node with encryption set.
>> 
>> cqlsh to localhost
>> 
>> cassy-node6:~$ cqlsh
>> Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
>> 
>> cqlsh to remote node
>> 
>> Remote node is a node with encryption enabled
>> 
>> cassy-node6:~$ cqlsh 10.0.2.7
>> Connection error: ('Unable to connect to any servers', {'10.0.2.7': error(111, "Tried connecting to [('10.0.2.7', 9042)]. Last error: Connection refused")})
