cassandra-user mailing list archives

From: Gareth Collins <gareth.o.coll...@gmail.com>
Subject: Weird Bootstrapping Issue
Date: Tue, 02 May 2017 04:11:00 GMT

Hi,

We are running Cassandra 2.1.14 on an IBM AIX cluster using IBM Java 7
(1.7.1.64). I am having problems adding new nodes to the cluster and am
seeing the exceptions below. It looks like the new node gets stuck
trying to send the magic number on the first streaming socket, while
the receiving node never receives it and times out after 10 seconds.
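
To make the suspected failure mode concrete, here is a minimal loopback
sketch of that first handshake step. It is not Cassandra code: the
function names are hypothetical, the magic value 0xCA552DFA is an
assumption about what MessagingService uses in 2.1, and the 10-second
read timeout is simply the figure quoted above. The acceptor reads one
big-endian 4-byte int with a read timeout, and the sender either writes
it or stays silent.

```python
import socket
import struct
import threading

PROTOCOL_MAGIC = 0xCA552DFA   # assumption: magic value used by Cassandra 2.1
ACCEPT_TIMEOUT_S = 10.0       # the timeout described in this message


def read_magic(conn, timeout_s):
    """Read one big-endian 4-byte int, failing if it does not arrive in time."""
    conn.settimeout(timeout_s)
    buf = b""
    while len(buf) < 4:
        chunk = conn.recv(4 - len(buf))
        if not chunk:
            raise EOFError("peer closed before sending 4 bytes")
        buf += chunk
    return struct.unpack(">I", buf)[0]


def probe(send_magic, timeout_s=ACCEPT_TIMEOUT_S):
    """Run acceptor and sender on loopback; return what the acceptor saw."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}

    def acceptor():
        conn, _ = server.accept()
        with conn:
            try:
                magic = read_magic(conn, timeout_s)
                result["outcome"] = "ok" if magic == PROTOCOL_MAGIC else "bad magic"
            except socket.timeout:
                result["outcome"] = "timeout"
            except EOFError:
                result["outcome"] = "eof"

    thread = threading.Thread(target=acceptor, daemon=True)
    thread.start()
    client = socket.create_connection(("127.0.0.1", port))
    try:
        if send_magic:
            client.sendall(struct.pack(">I", PROTOCOL_MAGIC))
        thread.join()   # a silent sender makes this wait for the read timeout
    finally:
        client.close()
        server.close()
    return result.get("outcome", "no connection")


print(probe(send_magic=True))                   # sender writes the magic
print(probe(send_magic=False, timeout_s=0.5))   # sender connects but stays silent
```

The second call is the case that seems to match the logs below: the
connection is accepted, nothing is ever written, and the acceptor gives
up when its read timeout expires.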

New Node:

INFO  [StreamConnectionEstablisher:1] 2017-04-28 17:39:20,196
StreamSession.java:220 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Starting streaming to /1.2.3.4

INFO  [StreamConnectionEstablisher:2] 2017-04-28 17:39:20,197
StreamSession.java:220 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Starting streaming to /5.6.7.8

INFO  [StreamConnectionEstablisher:1] 2017-04-28 17:39:20,209
StreamCoordinator.java:209 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92, ID#0] Beginning stream session
with /1.2.3.4

INFO  [STREAM-IN-/1.2.3.4] 2017-04-28 17:39:20,276
StreamResultFuture.java:166 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92 ID#0] Prepare completed.
Receiving 2 files(43103 bytes), sending 0 files(0 bytes)

INFO  [StreamReceiveTask:2] 2017-04-28 17:39:20,410
StreamResultFuture.java:180 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Session with /1.2.3.4 is
complete

ERROR [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,207
StreamSession.java:505 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Streaming error occurred

java.nio.channels.AsynchronousCloseException: null

        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:224)
~[na:1.7.0]

        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:538)
~[na:1.7.0]

        at org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
~[apache-cassandra-2.1.14.jar:2.1.14]

        at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.sendInitMessage(ConnectionHandler.java:191)
~[apache-cassandra-2.1.14.jar:2.1.14]

        at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:81)
~[apache-cassandra-2.1.14.jar:2.1.14]

        at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:223)
~[apache-cassandra-2.1.14.jar:2.1.14]

        at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208)
[apache-cassandra-2.1.14.jar:2.1.14]

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
[na:1.7.0]

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
[na:1.7.0]

        at java.lang.Thread.run(Thread.java:809) [na:1.7.0]

INFO  [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,208
StreamResultFuture.java:180 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Session with /5.6.7.8 is
complete

WARN  [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,211
StreamResultFuture.java:207 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Stream failed

INFO  [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,212
StreamCoordinator.java:209 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92, ID#0] Beginning stream session
with /5.6.7.8

ERROR [main] 2017-04-28 17:39:30,213 CassandraDaemon.java:581 -
Exception encountered during startup

java.lang.RuntimeException: Error during boostrap: Stream failed

        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86)
~[apache-cassandra-2.1.14.jar:2.1.14]

        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166)
~[apache-cassandra-2.1.14.jar:2.1.14]


Existing node:

DEBUG [ACCEPT-/5.6.7.8] 2017-04-28 17:39:29,914
MessagingService.java:1014 - Error reading the socket
Socket[addr=/9.0.1.2,port=55848,localport=7000]

java.net.SocketTimeoutException: null

        at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:242)
~[na:1.7.0]

        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:116)
~[na:1.7.0]

        at java.io.DataInputStream.readFully(DataInputStream.java:207)
~[na:1.7.0]

        at java.io.DataInputStream.readInt(DataInputStream.java:399) ~[na:1.7.0]

        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:988)
~[apache-cassandra-2.1.14.jar:2.1.14]

TRACE [MessagingService-Incoming-/9.0.1.2] 2017-04-28 17:39:29,989
IncomingTcpConnection.java:92 - eof reading from socket; closing

java.io.EOFException: null

        at java.io.DataInputStream.readFully(DataInputStream.java:209)
~[na:1.7.0]

        at java.io.DataInputStream.readInt(DataInputStream.java:399) ~[na:1.7.0]

        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:171)
~[apache-cassandra-2.1.14.jar:2.1.14]

        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)
~[apache-cassandra-2.1.14.jar:2.1.14]

TRACE [MessagingService-Incoming-/9.0.1.2] 2017-04-28 17:39:29,990
IncomingTcpConnection.java:115 - Closing socket
Socket[addr=/9.0.1.2,port=55840,localport=7000] - isclosed: false

TRACE [MessagingService-Incoming-/9.0.1.2] 2017-04-28 17:39:29,991
IncomingTcpConnection.java:92 - eof reading from socket; closing

java.io.EOFException: null

        at java.io.DataInputStream.readFully(DataInputStream.java:209)
~[na:1.7.0]

        at java.io.DataInputStream.readInt(DataInputStream.java:399) ~[na:1.7.0]

        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:171)
~[apache-cassandra-2.1.14.jar:2.1.14]

        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)
~[apache-cassandra-2.1.14.jar:2.1.14]
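
As a quick check on the 10-second figure, the gaps can be computed from
the timestamps in the logs above. This is only a sketch: the variable
names are mine, and the two nodes' clocks are not synchronised, so the
cross-node number is approximate.

```python
from datetime import datetime

# Log timestamp layout: date, time, then milliseconds after a comma.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse(ts):
    """Parse a Cassandra log timestamp such as '2017-04-28 17:39:20,197'."""
    return datetime.strptime(ts, LOG_FORMAT)


# New node: "Starting streaming to /5.6.7.8"
stream_start = parse("2017-04-28 17:39:20,197")
# New node: "Streaming error occurred" (AsynchronousCloseException)
stream_error = parse("2017-04-28 17:39:30,207")
# Existing node: "Error reading the socket" (SocketTimeoutException)
receiver_timeout = parse("2017-04-28 17:39:29,914")

# Same clock on both ends, so this gap is exact.
sender_gap = (stream_error - stream_start).total_seconds()
# Different clocks, so this gap includes any offset between the nodes.
cross_node_gap = (receiver_timeout - stream_start).total_seconds()

print(f"new node: start to error = {sender_gap:.3f} s")
print(f"start (new node) to timeout (existing node) = {cross_node_gap:.3f} s")
```

On the new node the write fails 10.010 s after streaming starts. The
existing node logs its read timeout 9.717 s after that start by its own
clock, which is within the sub-second clock offset between the nodes.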

Everything works fine when bringing up the new node until it gets to
streaming. I ran a Wireshark capture and nothing is sent on the
streaming socket. I put the existing cluster nodes in trace and didn't
see anything very exciting apart from the errors above (I still need to
add trace logs to the client). Other Java processes are networking on
the same system without problems. Resource limit values appear to be
set correctly.

We tried bootstrapping both with zero data in the cluster and with full
data. With zero data we were able to create a cluster of three nodes
(though if we started with the "wrong node" we couldn't create a
cluster larger than one node), and with full data (9GB) we were only
able to create a cluster of two nodes. The clocks on the nodes may be
off by up to a second. Would that be big enough to cause any trouble
when bootstrapping?

Has anyone seen something like this before? I haven't found anything so
far in bug reports, Google searches, or mailing lists that matches this
behaviour (though I could have missed something). Of course this could
be an AIX/IBM Java specific issue (I know the recommendation is to use
the Oracle JVM, and AIX is not a standard Cassandra configuration)...

Any suggestions would be appreciated.

thanks in advance,
Gareth
