cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Roth <benjamin.r...@jaumo.com>
Subject Bootstrap fails on 3.10
Date Fri, 25 Nov 2016 17:59:12 GMT
Hi!

Today I wanted a new node to join the cluster.
When looking at netstats on all the old nodes, it seemed like the streaming
sessions did complete.
They all said that all files have been transferred. But looking at the
debug.log the stream sessions finished with an error.
Also after all streams have been done the node remains in state "JOINING".

See logs:

debug.log, last words
========
ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712 StreamSession.java:593
- [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] Streaming error occurred
on session with peer 10.23.71.6
org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out
- received only 0 responses.
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525)
~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440)
~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.Mutation.apply(Mutation.java:223)
~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242)
~[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:205)
~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_102]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713
ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
Sending Session Failed
DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713 StreamSession.java:472
- [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] Finishing keep-alive task.
DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
Closing stream connection handler on /10.23.71.6
INFO  [StreamReceiveTask:94] 2016-11-25 17:50:51,719
StreamResultFuture.java:187 - [Stream
#b998aec0-b2fd-11e6-a63d-75828fa8d45c] Session with /10.23.71.6 is complete
DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719
StreamCoordinator.java:146 - Finished connecting all sessions
WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,723
StreamResultFuture.java:214 - [Stream
#b998aec0-b2fd-11e6-a63d-75828fa8d45c] Stream failed
ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error while
waiting on bootstrap to complete. Bootstrap will have to be restarted.
java.util.concurrent.ExecutionException:
org.apache.cassandra.streaming.StreamException: Stream failed
at
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
~[guava-18.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
~[guava-18.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
~[guava-18.0.jar:na]
at
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1488)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:948)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:667)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:598)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735)
[apache-cassandra-3.10.jar:3.10]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215)
~[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191)
~[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481)
~[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571)
~[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:251)
~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_102]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
~[na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,731
StorageService.java:1483 - Error during bootstrap.
org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
~[apache-cassandra-3.10.jar:3.10]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
[guava-18.0.jar:na]
at
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:251)
[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_102]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]

netstat output of 10.23.71.6, the node mentioned above in the debug.log,
obviously all files + bytes have been transferred
==========
Mode: NORMAL
Bootstrap b998aec0-b2fd-11e6-a63d-75828fa8d45c
/10.23.71.8
Sending 1598 files, 60610896516 bytes total. Already sent 1598 files,
60610896516 bytes total
/var/lib/cassandra/data/log/log_fake-b130c05070e611e6986e29a4f0eae2e7/mc-97218-big-Data.db
29425392/29425392 bytes(100%) sent to idx:0/10.23.71.8
...

nodetool info on new node says
==========
ID                     : 9dedcc9a-d951-4c7a-b794-434db1af960f
Gossip active          : true
Thrift active          : true
Native Transport active: true
Load                   : 185.2 GiB
Generation No          : 1480071319
Uptime (seconds)       : 25082
Heap Memory (MB)       : 3607.60 / 15974.44
Off Heap Memory (MB)   : 405.70
Data Center            : DC1
Rack                   : RAC1
Exceptions             : 0
Key Cache              : entries 1312994, size 100 MiB, capacity 100 MiB,
126413327 hits, 162672698 requests, 0.777 recent hit rate, 14400 save
period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits,
0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits,
0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache            : entries 7680, size 480 MiB, capacity 480 MiB,
8277584 misses, 147262566 requests, 0.944 recent hit rate, 835.412
microseconds miss latency
Percent Repaired       : 4.232661592656687%
Token                  : (node is not joined to the cluster)

Any idea whats going wrong?

Same situation was when I bootstrapped a node last time. In the end I just
started the node with auto_bootstrap=false to get it up and running and I
ran repair afterwards. I'd like to avoid that repair and all the
inconsistencies this time.

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Mime
View raw message