cassandra-commits mailing list archives

From "Bing Wu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset
Date Sun, 13 Nov 2016 06:03:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15660915#comment-15660915 ]

Bing Wu edited comment on CASSANDRA-12886 at 11/13/16 6:03 AM:
---------------------------------------------------------------

[~pauloricardomg] To answer your questions
*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed streams is the same
throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported SSL failures; about 7 out of
30 did. I searched the system.log on every node for "java.net.SocketException: Connection reset"
around the timestamp at which the "initiator" (the host that was running repair) reported the
failure. I can confirm that those failures all pointed back to the initiator,
e.g. {noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 StreamSession.java:529 - [Stream
#496c6de0-a794-11e6-bf13-7df2869901ea] Streaming error occurred on session with peer *initiator-public-ip*
{noformat}
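For anyone repeating this check, the search described above can be sketched roughly as follows. This is a minimal sketch, not the script that was actually used: the function names, the regular expression, and the 5-minute window are assumptions, and it expects each StreamSession error to sit on a single line of system.log.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed pattern for the StreamSession error line quoted above.
# It expects the whole message on one log line.
STREAM_ERROR = re.compile(
    r"^ERROR \[(?P<thread>[^\]]+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"StreamSession\.java:\d+ - \[Stream #(?P<session>[0-9a-f-]+)\] "
    r"Streaming error occurred on session with peer (?P<peer>\S+)"
)


def stream_errors_near(lines, failure_time, window=timedelta(minutes=5)):
    """Return (timestamp, session id, peer) for stream errors logged
    within `window` of the time the initiator reported its failure."""
    hits = []
    for line in lines:
        match = STREAM_ERROR.match(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        if abs(ts - failure_time) <= window:
            hits.append((ts, match.group("session"), match.group("peer")))
    return hits


def peers_blamed(hits):
    """Count how often each peer address appears in the matched errors."""
    return Counter(peer for _, _, peer in hits)
```

Running `stream_errors_near` over the lines of each node's system.log with the initiator's failure time, then `peers_blamed` on the result, shows whether every matched error names the same peer.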
*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended: {noformat}$ sudo  sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}
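As a rough sanity check on these values: assuming the usual Linux keepalive behaviour (probing starts after tcp_keepalive_time seconds of idleness, and the connection is dropped once tcp_keepalive_probes probes sent tcp_keepalive_intvl seconds apart go unanswered), an unresponsive peer would be detected after about 60 + 3 * 10 = 90 seconds. The helper names below are hypothetical and only illustrate the arithmetic.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines from sysctl output into a dict of ints.
    Lines without '=' (such as the shell prompt line) are skipped."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


def dead_peer_detection_seconds(settings):
    """Approximate idle time before the kernel gives up on a silent peer:
    initial idle period plus one interval per unanswered probe."""
    return (
        settings["net.ipv4.tcp_keepalive_time"]
        + settings["net.ipv4.tcp_keepalive_probes"]
        * settings["net.ipv4.tcp_keepalive_intvl"]
    )


output = """$ sudo  sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
"""
print(dead_peer_detection_seconds(parse_sysctl(output)))  # prints 90
```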
*Q:* Also, can you paste full debug.log sample of a node with this error?
*A:* Here is the debug.log file from a remote machine (*Note* _the IPs/timestamps in the log file
differ from those in the original bug report, as this is from another round of testing_): debug.log.2016-11-10_2319.gz


was (Author: bing1wu):
[~pauloricardomg] To answer your questions
*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed streams is the same
throughout the cluster?
*A:* Yes. More details: Not all nodes in the cluster reported SSL failure. About 7 out of
30 nodes did. I used a combination of "java.net.SocketException: Connection reset" and the
timestamp when the "initiator" (the host that was running repair) reported failure to search
the system.log on every node. Can confirm those failures all pointed back to the initiator,
e.g. {noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 StreamSession.java:529 - [Stream
#496c6de0-a794-11e6-bf13-7df2869901ea] Streaming error occurred on session with peer *initiator-public-ip*
{noformat}
*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended: {noformat}$ sudo  sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}
*Q:* Also, can you paste full debug.log sample of a node with this error?
*A:* Here is the debug.log file on a remote machine (not the initiator) debug.log.2016-11-10_2319.gz

> Streaming failed due to SSL Socket connection reset
> ---------------------------------------------------
>
>                 Key: CASSANDRA-12886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12886
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bing Wu
>         Attachments: debug.log.2016-11-10_2319.gz
>
>
> While running "nodetool repair", I see many instances of "javax.net.ssl.SSLException: java.net.SocketException: Connection reset" in the system.log on some nodes in the cluster. The timestamps correspond to the streaming source/initiator's error messages of "sync failed between ..."
> Setup: 
> - Cassandra 3.7.01 
> - CentOS 6.7 in AWS (multi-region)
> - JDK version: {noformat}
> java version "1.8.0_102"
> Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
> {noformat}
> - cassandra.yaml:
> {noformat}
> server_encryption_options:
>     internode_encryption: all
>     keystore: [path]
>     keystore_password: [password]
>     truststore: [path]
>     truststore_password: [password]
>     # More advanced defaults below:
>     # protocol: TLS
>     # algorithm: SunX509
>     # store_type: JKS
>     # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
>     require_client_auth: false
> {noformat}
> Error messages in system.log on the target host:
> {noformat}
> ERROR [STREAM-OUT-/54.247.111.232:7001] 2016-11-07 07:30:56,475 StreamSession.java:529 - [Stream #e14abcb0-a4bb-11e6-9758-55b9ac38b78e] Streaming error occurred on session with peer 54.247.111.232
> javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
>         at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1541) ~[na:1.8.0_102]
>         at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553) ~[na:1.8.0_102]
>         at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71) ~[na:1.8.0_102]
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_102]
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_102]
>         at org.apache.cassandra.io.util.WrappedDataOutputStreamPlus.flush(WrappedDataOutputStreamPlus.java:66) ~[apache-cassandra-3.7.0.jar:3.7.0]
>         at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:371) [apache-cassandra-3.7.0.jar:3.7.0]
>         at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342) [apache-cassandra-3.7.0.jar:3.7.0]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
> Caused by: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
