cassandra-commits mailing list archives

From "Oliver Seiler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6349) IOException in MessagingService.run() causes orphaned storage server socket
Date Mon, 09 Dec 2013 16:42:08 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843308#comment-13843308 ]

Oliver Seiler commented on CASSANDRA-6349:
------------------------------------------

I suspect these changes introduced an infinite loop that is triggered when the ServerSocket
gets closed (I'm not sure how it ends up closed, though). We've been seeing major problems with
Cassandra 2.0.3 when a new cluster comes up for the first time, and this appears to be the cause.
With logging set to debug, system.log is getting pummelled with these exception messages:

{noformat}
DEBUG [ACCEPT-localhost-grid/10.96.99.178] 2013-12-06 22:55:39,759 MessagingService.java (line 905) Error reading the socket null
java.net.SocketException: Socket closed
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at sun.security.ssl.SSLServerSocketImpl.accept(Unknown Source)
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:865)
{noformat}

It looks like once the thread is in this state, nothing will break it out. Prior to this change
the IOException catch block threw another exception; now it just keeps looping on the (seemingly
closed) ServerSocket. Restarting Cassandra seems to be the only way to resolve this. I'll
probably recommend we drop back to 2.0.2 until this problem is fixed (or until we understand
why the ServerSocket is closed).
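
To illustrate the failure mode, here is a minimal sketch of an accept loop that keeps serving after a per-connection IOException but stops once the ServerSocket itself has been closed. This is not the actual MessagingService code or the attached patch; the class and method names (AcceptLoopSketch, acceptLoop) are hypothetical, and only standard java.net calls are used.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch, not Cassandra code: an accept loop that cannot spin
// forever on a closed ServerSocket.
class AcceptLoopSketch {

    /** Accepts connections until the server socket is closed; returns how many were accepted. */
    static int acceptLoop(ServerSocket server) {
        int accepted = 0;
        while (!server.isClosed()) {
            try (Socket socket = server.accept()) {
                // A real server would hand the connection off here; the sketch just counts it.
                accepted++;
            } catch (IOException e) {
                // accept() on a closed ServerSocket throws SocketException every time,
                // so retrying would loop forever. Stop instead.
                if (server.isClosed()) {
                    break;
                }
                // Otherwise treat it as a failure of one connection and keep listening.
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0); // ephemeral port, for the demo only
        Thread acceptThread = new Thread(() -> acceptLoop(server));
        acceptThread.start();
        server.close();          // simulate the socket being closed underneath the thread
        acceptThread.join(5000); // the thread should exit rather than spin
        System.out.println("accept thread alive: " + acceptThread.isAlive());
    }
}
```

The isClosed() check in the catch block is what separates "one bad connection" from "the listening socket is gone"; without it, every iteration fails immediately with the same "Socket closed" exception shown in the log above.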


> IOException in MessagingService.run() causes orphaned storage server socket
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6349
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6349
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: cassandra 2.0+
>            Reporter: Steven Halaka
>            Assignee: Mikhail Stepura
>             Fix For: 2.0.3
>
>         Attachments: CASSANDRA-2.0-6349.patch
>
>
> The refactoring of reading the message header in MessagingService.run() vs IncomingTcpConnection
> seems to mishandle IOException as the loop is broken and MessagingService.SocketThread never
> seems to get reinitialized.
> To reproduce: telnet to port 7000 and send random data. This then prevents any new or
> restarting node in the cluster from handshaking with this defunct storage port.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
