cassandra-commits mailing list archives

From "Jason Kania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11273) Exceptions during bootstrap cause bootstrap to hang (WORKAROUND)
Date Mon, 29 Feb 2016 17:01:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172145#comment-15172145 ]

Jason Kania commented on CASSANDRA-11273:
-----------------------------------------

The logs in the Description above are the errors that I saw while bootstrapping the new node
192.168.10.10. Node 192.168.10.8 is an existing, working node in the cluster. I ran nodetool
repair on 192.168.10.8 without error before bootstrapping 192.168.10.10. The logs that follow
the text "from 192.168.10.10" in the Description are the errors seen during bootstrap; no
errors were logged before them. As a workaround, I followed the steps listed under
"Possible Workaround" in the Description, because I was unable to get Cassandra to bootstrap
automatically.

> Exceptions during bootstrap cause bootstrap to hang (WORKAROUND)
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-11273
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11273
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Lifecycle
>         Environment: Debian jessie, patched to current, running Cassandra 3.0.3
>            Reporter: Jason Kania
>
> When bootstrapping a new node, the following problem can occur because Cassandra fails
to recognize some columns, for reasons that are not yet known. The error prevents the bootstrap
from finishing and leaves it hung. If the bootstrap is resumed, it hits the same error and
still cannot complete. The workaround that I used is at the end.
> from 192.168.10.8
> ERROR [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857 StreamSession.java:635 - [Stream
#c9868f90-ddbb-11e5-80c0-89f591237aca] Remote peer 192.168.10.10 failed stream session.
> INFO  [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857 StreamResultFuture.java:182
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Session with /192.168.10.10 is complete
> WARN  [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,858 StreamResultFuture.java:209
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Stream failed
> from 192.168.10.8 debug
> DEBUG [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,414 ConnectionHandler.java:262 -
[Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Received Received (79256340-bbbb-11e5-9f70-7d76a8de8480,
#0)
> DEBUG [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,854 ConnectionHandler.java:262 -
[Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Received Retry (f3a137e0-024b-11e5-bb31-0d2316086bf7,
#0)
> DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,854 ConnectionHandler.java:334
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Sending File (Header (cfId: f3a137e0-024b-11e5-bb31-0d2316086bf7,
#0, version: ma, format: BIG, estimated keys: 128, transfer size: 4653, compressed?: true,
repairedAt: 0, level: 0), file: /home/cassandra/data/sensordb/sensor/ma-76-big-Data.db)
> DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,854 CompressedStreamWriter.java:63
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Start streaming file /home/cassandra/data/sensordb/sensor/ma-76-big-Data.db
to /192.168.10.10, repairedAt = 0, totalSize = 4653
> DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,854 CompressedStreamWriter.java:94
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Finished streaming file /home/cassandra/data/sensordb/sensor/ma-76-big-Data.db
to /192.168.10.10, bytesTransferred = 4653, totalSize = 4653
> DEBUG [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,855 ConnectionHandler.java:262 -
[Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Received Retry (faa55490-024b-11e5-bb31-0d2316086bf7,
#0)
> DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,855 ConnectionHandler.java:334
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Sending File (Header (cfId: faa55490-024b-11e5-bb31-0d2316086bf7,
#0, version: ma, format: BIG, estimated keys: 128, transfer size: 705, compressed?: true,
repairedAt: 0, level: 0), file: /home/cassandra/data/sensordb/sensorUnit/ma-79-big-Data.db)
> DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,856 CompressedStreamWriter.java:63
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Start streaming file /home/cassandra/data/sensordb/sensorUnit/ma-79-big-Data.db
to /192.168.10.10, repairedAt = 0, totalSize = 705
> DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,856 CompressedStreamWriter.java:94
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Finished streaming file /home/cassandra/data/sensordb/sensorUnit/ma-79-big-Data.db
to /192.168.10.10, bytesTransferred = 705, totalSize = 705
> DEBUG [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857 ConnectionHandler.java:262 -
[Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Received Session Failed
> ERROR [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857 StreamSession.java:635 - [Stream
#c9868f90-ddbb-11e5-80c0-89f591237aca] Remote peer 192.168.10.10 failed stream session.
> DEBUG [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857 ConnectionHandler.java:110 -
[Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Closing stream connection handler on /192.168.10.10
> INFO  [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857 StreamResultFuture.java:182
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Session with /192.168.10.10 is complete
> WARN  [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,858 StreamResultFuture.java:209
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Stream failed
> from 192.168.10.10
> [2016-02-27 20:37:53,413] received file /home/cassandra/data/sensordb/listedAttributes-79256340bbbb11e59f707d76a8de8480/ma-32-big-Data.db
(progress: 365%)
> [2016-02-27 20:37:53,414] received file /home/cassandra/data/sensordb/liestedAttributes-79256340bbbb11e59f707d76a8de8480/ma-32-big-Data.db
(progress: 369%)
> [2016-02-27 20:37:53,865] session with /192.168.10.8 complete (progress: 369%)
> [2016-02-27 20:37:53,866] Stream failed
> from 192.168.10.10 debug
> DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,201 CompressedStreamReader.java:80
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Start receiving file #0 from /192.168.10.8,
repairedAt = 0, size = 166627, ks = 'sensordb', table = 'listAttributes'.
> DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,412 CompressedStreamReader.java:110
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Finished receiving file #0 from /192.168.10.8
readBytes = 166627, totalSize = 166627
> DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,412 ConnectionHandler.java:262 -
[Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Received File (Header (cfId: 79256340-bbbb-11e5-9f70-7d76a8de8480,
#0, version: ma, format: BIG, estimated keys: 128, transfer size: 166627, compressed?: true,
repairedAt: 0, level: 0), file: /home/cassandra/data/sensordb/listAttributes-79256340bbbb11e59f707d76a8de8480/ma-32-big-Data.db)
> DEBUG [STREAM-OUT-/192.168.10.8] 2016-02-27 20:37:53,412 ConnectionHandler.java:334 -
[Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Sending Received (79256340-bbbb-11e5-9f70-7d76a8de8480,
#0)
> DEBUG [CompactionExecutor:3] 2016-02-27 20:37:53,833 CompactionTask.java:217 - Compacted
(e224bef0-ddbb-11e5-80c0-89f591237aca) 4 sstables to [/home/cassandra/data/system_distributed/parent_repair_history-deabd734b99d3b9c92e5fd92eb5abf14/ma-5-big,]
to level=0.  2,743,164 bytes to 685,791 (~25% of original) in 1,096ms = 0.596735MB/s.  0 total
partitions merged to 57.  Partition merge counts were {4:57, }
> DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,850 CompressedStreamReader.java:80
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Start receiving file #0 from /192.168.10.8,
repairedAt = 0, size = 4653, ks = 'sensordb', table = 'sensor'.
> WARN  [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,851 StreamSession.java:641 - [Stream
#c9868f90-ddbb-11e5-80c0-89f591237aca] Retrying for following error
> java.lang.RuntimeException: Unknown column lastEvaluation during deserialization
>         at org.apache.cassandra.db.SerializationHeader$Component.toHeader(SerializationHeader.java:331)
~[apache-cassandra-3.0.3.jar:3.0.3]
>         at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:87)
~[apache-cassandra-3.0.3.jar:3.0.3]
>         at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:50)
[apache-cassandra-3.0.3.jar:3.0.3]
>         at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:39)
[apache-cassandra-3.0.3.jar:3.0.3]
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:59)
[apache-cassandra-3.0.3.jar:3.0.3]
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261)
[apache-cassandra-3.0.3.jar:3.0.3]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
> DEBUG [STREAM-OUT-/192.168.10.8] 2016-02-27 20:37:53,852 ConnectionHandler.java:334 -
[Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Sending Retry (f3a137e0-024b-11e5-bb31-0d2316086bf7,
#0)
> DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,852 ConnectionHandler.java:262 -
[Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Received null
> DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,853 CompressedStreamReader.java:80
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Start receiving file #0 from /192.168.10.8,
repairedAt = 0, size = 705, ks = 'sensordb', table = 'sensorUnit'.
> WARN  [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,854 StreamSession.java:641 - [Stream
#c9868f90-ddbb-11e5-80c0-89f591237aca] Retrying for following error
> java.lang.RuntimeException: Unknown column lastCheckTime during deserialization
>         at org.apache.cassandra.db.SerializationHeader$Component.toHeader(SerializationHeader.java:331)
~[apache-cassandra-3.0.3.jar:3.0.3]
>         at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:87)
~[apache-cassandra-3.0.3.jar:3.0.3]
>         at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:50)
[apache-cassandra-3.0.3.jar:3.0.3]
>         at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:39)
[apache-cassandra-3.0.3.jar:3.0.3]
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:59)
[apache-cassandra-3.0.3.jar:3.0.3]
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261)
[apache-cassandra-3.0.3.jar:3.0.3]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
> DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,854 ConnectionHandler.java:262 -
[Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Received null
> Possible Workaround
> To resolve this, it is possible to do the following:
> 1) in cqlsh on the new node
> use system;
> select host_id from local;
> 2) Save that host_id uuid for later use
> 3) Change the cassandra.yaml to set auto_bootstrap to false
> 4) Stop the database on the new node
> 5) Remove all the contents of the data directory on the new node
> 6) Copy all files from the data directory on an existing replica node to the data directory
on the new node
> 7) Start Cassandra on the new node in network isolation or restart Cassandra on the other
nodes in the cluster after starting the new node
> 8) In cqlsh on the new node
> use system;
> update local set host_id=<host id saved previously>,tokens=null where key='local';
> update local set broadcast_address='<local IP>',listen_address='<local IP>',rpc_address='<local IP>' where key='local';
> 9) On the new node run the following to save the updated system data
> nodetool flush system local
> 10) Restart Cassandra on the new node
> 11) Run the following on the new node to generate the data tokens
> nodetool repair
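
The two update statements in step 8 are easy to mistype by hand, so here is a minimal sketch that fills in the placeholders and validates them first. It is not part of the original report: the function name render_local_updates and the sample values are hypothetical, and the output still has to be pasted into cqlsh on the new node after "use system;".

```python
import ipaddress
import uuid
from typing import List


def render_local_updates(host_id: str, local_ip: str) -> List[str]:
    """Return the two step-8 UPDATE statements with the placeholders filled in.

    Hypothetical helper: it only builds text and does not talk to Cassandra.
    """
    # Parse both inputs so a malformed UUID or IP raises ValueError here
    # instead of ending up in the system.local table.
    hid = uuid.UUID(host_id)
    ip = ipaddress.ip_address(local_ip)
    return [
        # host_id is a CQL uuid literal, so it is written without quotes.
        f"update local set host_id={hid},tokens=null where key='local';",
        f"update local set broadcast_address='{ip}',listen_address='{ip}',"
        f"rpc_address='{ip}' where key='local';",
    ]


if __name__ == "__main__":
    # Placeholder values: substitute the host_id saved in step 2 and the
    # new node's own IP address.
    for stmt in render_local_updates(
        "00000000-0000-0000-0000-000000000000", "192.168.10.10"
    ):
        print(stmt)
```

Run it with the saved host_id and the new node's IP, then execute the printed statements in cqlsh before the "nodetool flush system local" in step 9.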



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
