From Jeff Jirsa <jeff.ji...@crowdstrike.com>
Subject Re: Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8 datacenter fails because of streaming errors
Date Tue, 11 Oct 2016 00:00:18 GMT
 

No need to cc dev@, user@ is the right list for this question.

 

As Jon mentioned, you can’t stream (bootstrap/rebuild/repair) across major versions, so
don’t try to destroy the cluster – just upgrade in place. It IS a good idea to do one
DC at a time, but an in-place upgrade is pretty straightforward. On each node: flush, drain,
stop Cassandra, replace binaries, start Cassandra, run nodetool upgradesstables -a.
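
The per-node sequence above could be written down as a dry-run plan before touching a cluster. This is only a sketch: the stop, install and start commands are placeholders that do not come from this thread (they depend on how Cassandra is packaged and supervised), and the function names are hypothetical. Only the nodetool subcommands are taken from the message.

```python
from typing import List, Tuple


def in_place_upgrade_steps(stop_cmd: str, install_cmd: str, start_cmd: str) -> List[str]:
    """Return the ordered commands for upgrading ONE node in place.

    stop_cmd, install_cmd and start_cmd are site-specific placeholders.
    """
    return [
        "nodetool flush",               # write memtables to disk
        "nodetool drain",               # stop accepting writes, flush remaining data
        stop_cmd,                       # stop Cassandra
        install_cmd,                    # replace binaries
        start_cmd,                      # start Cassandra on the new version
        "nodetool upgradesstables -a",  # rewrite sstables in the new format
    ]


def datacenter_plan(nodes: List[str], stop_cmd: str, install_cmd: str,
                    start_cmd: str) -> List[Tuple[str, str]]:
    """Return (node, command) pairs, finishing one node before the next."""
    steps = in_place_upgrade_steps(stop_cmd, install_cmd, start_cmd)
    return [(node, step) for node in nodes for step in steps]


if __name__ == "__main__":
    # Dry run only: print the plan, execute nothing.
    for node, step in datacenter_plan(["node-a"], "<stop cassandra>",
                                      "<install 3.0.8>", "<start cassandra>"):
        print(f"{node}: {step}")
```

The plan is strictly sequential per node; running upgradesstables on several nodes at once is a separate decision about how much read latency you can give up.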

 

Note that you can run nodetool upgradesstables on more than one node at a time if you can
tolerate the hit to your read latencies.

 

It IS common, I imagine, for there to be schema mismatches temporarily while you have a mixed
version cluster – this isn’t necessarily a huge problem, but do try to get to 3.0.8 as
quickly as possible once you start, and if you can avoid administrative tasks (such as those
that will change the schema) during the process, that’s generally advisable.

 

 

 

 

From: Abhishek Verma <verma@uber.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 10, 2016 at 4:34 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>, "dev@cassandra.apache.org" <dev@cassandra.apache.org>
Subject: Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8 datacenter fails because of streaming errors

 

Hi Cassandra users, 

 

We are trying to upgrade our Cassandra version from 2.2.5 to 3.0.8 (running on Mesos, but
that's beside the point). We have two datacenters, so in order to preserve our data, we are
trying to upgrade one datacenter at a time. 

 

Initially both DCs (dc1 and dc2) are running 2.2.5. The idea is to tear down dc1 completely
(delete all the data in it), bring it up with 3.0.8, let data replicate from dc2 to dc1, and
then tear down dc2, bring it up with 3.0.8 and replicate data from dc1.

 

I am able to reproduce the problem on a bare-metal cluster of 3 nodes. I am using Oracle's
server-jre-8u74-linux-x64 JRE.

 

Node A: Downloaded 2.2.5-bin.tar.gz, changed the seeds to include its own IP address, changed
listen_address and rpc_address to its own IP and changed endpoint_snitch to GossipingPropertyFileSnitch.
I changed conf/cassandra-rackdc.properties to

dc=dc2

rack=rack2

This node started up fine and is UN in nodetool status in dc2.

 

I used CQL shell to create a table and insert 3 rows:

verma@xxxxx:~/apache-cassandra-2.2.5$ bin/cqlsh $HOSTNAME

Connected to Test Cluster at xxxxx:9042.

[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]

Use HELP for help.

cqlsh> desc tmp

 

CREATE KEYSPACE tmp WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '1', 'dc2': '1'}  AND durable_writes = true;

 

CREATE TABLE tmp.map (

    key text PRIMARY KEY,

    value text

)...;

cqlsh> select * from tmp.map;

 

 key | value

-----+-------

  k1 |    v1

  k3 |    v3

  k2 |    v2

 

 

Node B: Downloaded 3.0.8-bin.tar.gz, changed the seeds to include itself and node A, changed
listen_address and rpc_address to its own IP, changed endpoint_snitch to GossipingPropertyFileSnitch.
I did not change conf/cassandra-rackdc.properties and its contents are

dc=dc1

rack=rack1

 

In the logs, I see:

INFO  [main] 2016-10-10 22:42:42,850 MessagingService.java:557 - Starting Messaging Service on /10.164.32.29:7000 (eth0)

INFO  [main] 2016-10-10 22:42:42,864 StorageService.java:784 - This node will not auto bootstrap because it is configured to be a seed node.

 

So I start a third node:

Node C: Downloaded 3.0.8-bin.tar.gz, changed the seeds to include node A and node B, changed
listen_address and rpc_address to its own IP, changed endpoint_snitch to GossipingPropertyFileSnitch.
I did not change conf/cassandra-rackdc.properties.

Now, nodetool status shows:

 

verma@xxxxxxx:~/apache-cassandra-3.0.8$ bin/nodetool status

Datacenter: dc1

===============

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UJ  <Node C IP>   87.81 KB   256     ?                 9064832d-ed5c-4c42-ad5a-f754b52b670c  rack1
UN  <Node B IP>   107.72 KB  256     100.0%            28b1043f-115b-46a5-b6b6-8609829cde76  rack1

Datacenter: dc2

===============

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  <Node A IP>   73.2 KB    256     100.0%            09cc542c-2299-45a5-a4d1-159c239ded37  rack2

 

nodetool describecluster shows:

verma@xxxxxxx:~/apache-cassandra-3.0.8$ bin/nodetool describecluster

Cluster Information:

Name: Test Cluster

Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch

Partitioner: org.apache.cassandra.dht.Murmur3Partitioner

Schema versions:

c2a2bb4f-7d31-3fb8-a216-00b41a643650: [<Node B IP>, <Node C IP>]

 

9770e3c5-3135-32e2-b761-65a0f6d8824e: [<Node A IP>]

 

Note that there are two schema versions and they don't match.
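
A check like this can be automated by parsing the describecluster output. What follows is a minimal sketch, assuming the entries under "Schema versions:" keep the "uuid: [host, host]" shape shown above; the function names are hypothetical and not part of Cassandra or nodetool.

```python
import re
from typing import Dict, List

# Matches lines such as "c2a2bb4f-...-00b41a643650: [<Node B IP>, <Node C IP>]".
# Assumption: a schema version is printed as a 36-character UUID followed by a
# bracketed, comma-separated host list. Other lines are ignored.
SCHEMA_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}):\s*\[(.*)\]\s*$")


def schema_versions(describecluster_output: str) -> Dict[str, List[str]]:
    """Map each schema version UUID to the hosts reporting it."""
    versions: Dict[str, List[str]] = {}
    for line in describecluster_output.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def schema_agrees(describecluster_output: str) -> bool:
    """True when at most one schema version is reported."""
    return len(schema_versions(describecluster_output)) <= 1
```

Feeding it the output above would yield two entries, one listing nodes B and C and one listing node A, so schema_agrees would return False.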

 

I see the following in the system.log: 

 

INFO  [InternalResponseStage:1] 2016-10-10 22:48:36,055 ColumnFamilyStore.java:390 - Initializing system_auth.roles

INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING: waiting for schema information to complete

INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING: schema complete, ready to bootstrap

INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING: waiting for pending range calculation

INFO  [main] 2016-10-10 22:48:36,317 StorageService.java:1149 - JOINING: calculation complete, ready to bootstrap

INFO  [main] 2016-10-10 22:48:36,319 StorageService.java:1149 - JOINING: getting bootstrap token

INFO  [main] 2016-10-10 22:48:36,357 StorageService.java:1149 - JOINING: sleeping 30000 ms for pending range setup

INFO  [main] 2016-10-10 22:49:06,358 StorageService.java:1149 - JOINING: Starting to bootstrap...

INFO  [main] 2016-10-10 22:49:06,494 StreamResultFuture.java:87 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Executing streaming plan for Bootstrap

INFO  [StreamConnectionEstablisher:1] 2016-10-10 22:49:06,495 StreamSession.java:242 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Starting streaming to /<Node A IP>

INFO  [StreamConnectionEstablisher:2] 2016-10-10 22:49:06,495 StreamSession.java:242 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Starting streaming to /<Node B IP>

INFO  [StreamConnectionEstablisher:2] 2016-10-10 22:49:06,500 StreamCoordinator.java:213 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e, ID#0] Beginning stream session with /<Node B IP>

INFO  [STREAM-IN-/<Node B IP>] 2016-10-10 22:49:06,590 StreamResultFuture.java:183 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Session with /<Node B IP> is complete

INFO  [StreamConnectionEstablisher:1] 2016-10-10 22:49:06,635 StreamCoordinator.java:213 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e, ID#0] Beginning stream session with /<Node A IP>

ERROR [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,639 StreamSession.java:528 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Streaming error occurred

java.io.IOException: Connection reset by peer

at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_102]

at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_102]

at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_102]

at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_102]

at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_102]

at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206) ~[na:1.8.0_102]

at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_102]

at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.8.0_102]

at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54) ~[apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287) ~[apache-cassandra-3.0.8.jar:3.0.8]

at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]

INFO  [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,639 StreamResultFuture.java:183 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Session with /<Node A IP> is complete

WARN  [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,640 StreamResultFuture.java:210 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Stream failed

WARN  [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,640 StorageService.java:1208 - Error during bootstrap.

org.apache.cassandra.streaming.StreamException: Stream failed

at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.0.8.jar:3.0.8]

at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) [guava-18.0.jar:na]

at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) [guava-18.0.jar:na]

at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-18.0.jar:na]

at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-18.0.jar:na]

at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) [guava-18.0.jar:na]

at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) [apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) [apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:429) [apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:534) [apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:305) [apache-cassandra-3.0.8.jar:3.0.8]

at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]

ERROR [main] 2016-10-10 22:49:06,641 StorageService.java:1218 - Error while waiting on bootstrap to complete. Bootstrap will have to be restarted.

java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed

at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]

at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]

at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]

at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1213) [apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:889) [apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.service.StorageService.initServer(StorageService.java:663) [apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.service.StorageService.initServer(StorageService.java:528) [apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:339) [apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:557) [apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:685) [apache-cassandra-3.0.8.jar:3.0.8]

Caused by: org.apache.cassandra.streaming.StreamException: Stream failed

at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.0.8.jar:3.0.8]

at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]

at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]

at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]

at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]

at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]

at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) ~[apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) ~[apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:429) ~[apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:534) ~[apache-cassandra-3.0.8.jar:3.0.8]

at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:305) ~[apache-cassandra-3.0.8.jar:3.0.8]

at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]

WARN  [main] 2016-10-10 22:49:06,646 StorageService.java:944 - Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS

INFO  [main] 2016-10-10 22:49:06,647 CassandraDaemon.java:644 - Waiting for gossip to settle before accepting client requests...

INFO  [main] 2016-10-10 22:49:14,648 CassandraDaemon.java:675 - No gossip backlog; proceeding

INFO  [main] 2016-10-10 22:49:14,694 NativeTransportService.java:70 - Netty using native Epoll event loop

INFO  [main] 2016-10-10 22:49:14,726 Server.java:159 - Using Netty Version: [netty-buffer=netty-buffer-4.0.23.Final.208198c, netty-codec=netty-codec-4.0.23.Final.208198c, netty-codec-http=netty-codec-http-4.0.23.Final.208198c, netty-codec-socks=netty-codec-socks-4.0.23.Final.208198c, netty-common=netty-common-4.0.23.Final.208198c, netty-handler=netty-handler-4.0.23.Final.208198c, netty-transport=netty-transport-4.0.23.Final.208198c, netty-transport-rxtx=netty-transport-rxtx-4.0.23.Final.208198c, netty-transport-sctp=netty-transport-sctp-4.0.23.Final.208198c, netty-transport-udt=netty-transport-udt-4.0.23.Final.208198c]

INFO  [main] 2016-10-10 22:49:14,726 Server.java:160 - Starting listening for CQL clients on /<Node C IP>:9042 (unencrypted)...

INFO  [main] 2016-10-10 22:49:14,748 CassandraDaemon.java:477 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it

 

I tried resuming bootstrap but it fails with the same streaming errors:

 

verma@<Node C>:~/apache-cassandra-3.0.8$ bin/nodetool bootstrap resume

Resuming bootstrap

[2016-10-10 23:15:11,816] session with /<Node B IP> complete (progress: 0%)

[2016-10-10 23:15:11,939] session with /<Node A IP> complete (progress: 0%)

[2016-10-10 23:15:11,940] Stream failed

 

and I see the same error in the system.log: 

 

StreamSession.java:528 - [Stream #64b73a20-8f3f-11e6-b69a-1b451159408e] Streaming error occurred

java.io.IOException: Connection reset by peer

...

 

Does Cassandra support upgrading from 2.2.5 to 3.0.8 in this way? Am I missing something?


 

Thanks for your time.

-Abhishek.

