incubator-cassandra-user mailing list archives

From Edward Sargisson <edward.sargis...@globalrelay.net>
Subject Re: Losing keyspace on cassandra upgrade
Date Wed, 19 Sep 2012 15:25:24 GMT
We've seen that before too - supposedly it was fixed in 1.1.5. Your 
experience casts some doubt on that.

Our workaround, thus far, is to shut down the entire ring and then bring 
each node back up, starting with a known-good node.
Then run nodetool resetlocalschema on the node that's confused and 
make sure it picks up the schema correctly.
Then run nodetool repair.

I see you've done that, but we found a complete ring restart was 
necessary. This was on Cass 1.1.1.

Cheers,
Edward

On 12-09-19 08:12 AM, Michael Kjellman wrote:
> Sounds like you are losing your system keyspace. When you say nothing important changed between yaml files, do you mean with or without your changes?
>
> Did your data directories change in the migration? Permissions okay?
>
> I've done a 1.1.1 to 1.1.5 upgrade on many of my nodes without issue.
>
> On Sep 19, 2012, at 7:44 AM, "Thomas Stets" <thomas.stets@gmail.com> wrote:
>
>> I consistently keep losing my keyspace on upgrading from cassandra 1.1.1 to 1.1.5
>>
>> I have the same cassandra keyspace on all our staging systems:
>>
>> development: a 3-node cluster
>> integration: a 3-node cluster
>> QS: a 2-node cluster
>> (production will be a 4-node cluster, which is not yet active)
>>
>> All clusters were running cassandra 1.1.1. Before going into production I wanted to upgrade to the
>> latest production release of cassandra.
>>
>> In all cases my keyspace disappeared when I started the cluster with cassandra 1.1.5.
>> On the development system I didn't realize at first what was happening. I just wondered why nodetool
>> showed such a small amount of data. On integration I saw the problem quickly, but could not recover the
>> data. I re-installed the cassandra cluster from scratch and populated it with our test data, so our
>> developers could work.
>>
>> I am currently using the QS system to recreate the problem, find out what I am doing wrong,
>> and learn how to avoid losing production data once we are live.
>>
>> Basically I was doing the following:
>>
>> 1. create a snapshot on every node
>> 2. create a tar.gz of my data directory, just to be safe
>> 3. shut down and re-start cassandra 1.1.1 (just to confirm that the restart itself is not causing the problem)
>> 4. verify that the keyspace is still known and the data present
>> 5. shut down cassandra 1.1.1
>> 6. copy the config to cassandra 1.1.5 (diffing the old cassandra.yaml against the new one first, to see whether anything important has changed)
>> 7. start cassandra 1.1.5
>>
>> In the log file, after the "Replaying ..." messages I find the following:
>>
>>   INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 759 mutations from unknown (probably removed) CF with id 1187
>>   INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 606 mutations from unknown (probably removed) CF with id 1186
>>   INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 53 mutations from unknown (probably removed) CF with id 1185
>>   INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 1945 mutations from unknown (probably removed) CF with id 1184
>>   INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 1945 mutations from unknown (probably removed) CF with id 1191
>>   INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 7506 mutations from unknown (probably removed) CF with id 1190
>>   INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 88 mutations from unknown (probably removed) CF with id 1189
>>   INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 87 mutations from unknown (probably removed) CF with id 1188
>>   INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 354 mutations from unknown (probably removed) CF with id 1195
>>   INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 87 mutations from unknown (probably removed) CF with id 1194
>>   INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 45 mutations from unknown (probably removed) CF with id 1192
>>   INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 82 mutations from unknown (probably removed) CF with id 1197
>>   INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 46386 mutations from unknown (probably removed) CF with id 1177
>>   INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 69 mutations from unknown (probably removed) CF with id 1178
>>   INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 73 mutations from unknown (probably removed) CF with id 1179
>>   INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 88 mutations from unknown (probably removed) CF with id 1181
>>   INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 46386 mutations from unknown (probably removed) CF with id 1182
>>   INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 7506 mutations from unknown (probably removed) CF with id 1183
>>   INFO [main] 2012-09-19 15:15:50,325 CommitLog.java (line 131) Log replay complete, 0 replayed mutations
>>
>> This is the first obvious indication that something is wrong. Going further up in the log file, I discovered that the SSTableReader entries mention only system keyspace files.
>>
>> Currently my cluster is in the following state:
>>
>> node 1 runs cassandra 1.1.5 and doesn't know my keyspace
>> node 2 runs cassandra 1.1.1 and still knows my keyspace.
>>
>> nodetool ring confirms this: node 1 has a load of 29 KB, node 2 roughly 1 GB. The cluster itself is still intact, i.e. nodetool ring shows both nodes.
>>
>> I tried nodetool resetlocalschema and nodetool repair, but that didn't change anything.
>>
>> Any idea what I have been doing wrong (the preferred answer), or whether I have stumbled over a cassandra bug (not so nice)?
>>
>>
>>    TIA, Thomas
>
>
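Michael's question about data directories can be checked mechanically before step 7: if the 1.1.5 install's cassandra.yaml points `data_file_directories` or `commitlog_directory` somewhere other than the 1.1.1 install did, the new node would plausibly see only a freshly created system keyspace, which would fit the symptoms described. The Python below is a minimal sketch of that check, not a Cassandra tool. It assumes both files use the simple layout of a stock cassandra.yaml, replaces a real YAML library with a deliberately tiny parser, and the helper names and example paths are hypothetical.

```python
"""Compare the storage-directory settings of two cassandra.yaml texts.

Hypothetical helper, simplified on purpose: the parser only understands
top-level "key: value" lines and "- item" list entries, and it strips
everything after a '#', so values containing '#' are not supported.
"""

# Settings that decide where Cassandra reads and writes its files on disk.
DIRECTORY_KEYS = ("data_file_directories", "commitlog_directory", "saved_caches_directory")


def parse_top_level(text):
    """Return {key: str or list[str]} for top-level keys (simplified YAML)."""
    settings = {}
    current_list_key = None  # key whose "- item" entries we are collecting
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        stripped = line.strip()
        if stripped.startswith("- ") and current_list_key is not None:
            settings[current_list_key].append(stripped[2:].strip())
            continue
        if line[0].isspace():
            continue  # nested mapping content we do not need
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            settings[key] = value
            current_list_key = None
        else:
            settings[key] = []  # a list (or nested block) follows
            current_list_key = key
    return settings


def directory_differences(old_text, new_text):
    """Return (key, old_value, new_value) for each directory setting that differs."""
    old, new = parse_top_level(old_text), parse_top_level(new_text)
    return [
        (key, old.get(key), new.get(key))
        for key in DIRECTORY_KEYS
        if old.get(key) != new.get(key)
    ]


if __name__ == "__main__":
    # Hypothetical example values: a customised 1.1.1 config vs. stock paths.
    old_yaml = """
data_file_directories:
    - /data/cassandra/data
commitlog_directory: /data/cassandra/commitlog
"""
    new_yaml = """
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
"""
    for key, before, after in directory_differences(old_yaml, new_yaml):
        print(f"{key}: {before!r} -> {after!r}")
```

Pointed at the two real files (for example via `open(path).read()`), an empty result would mean the storage paths match and the cause lies elsewhere.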

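The "Skipped N mutations from unknown (probably removed) CF with id X" lines are the clearest symptom in Thomas's quoted log: the 1.1.5 node replayed the commit log without recognising any of the keyspace's column families, and ended with 0 replayed mutations. As a minimal sketch that assumes only the line format quoted in the thread, the Python below totals the skipped mutations per CF id so the extent of the gap is visible at a glance; `skipped_mutations` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the CommitLogReplayer INFO lines quoted in the thread, e.g.
# "... Skipped 759 mutations from unknown (probably removed) CF with id 1187"
SKIPPED = re.compile(
    r"Skipped (\d+) mutations from unknown \(probably removed\) CF with id (\d+)"
)


def skipped_mutations(log_text):
    """Return a Counter mapping CF id -> total number of skipped mutations."""
    counts = Counter()
    for count, cf_id in SKIPPED.findall(log_text):
        counts[int(cf_id)] += int(count)
    return counts


if __name__ == "__main__":
    # Two of the lines quoted above, plus the closing replay summary line.
    sample = """
INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 759 mutations from unknown (probably removed) CF with id 1187
INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 46386 mutations from unknown (probably removed) CF with id 1177
INFO [main] 2012-09-19 15:15:50,325 CommitLog.java (line 131) Log replay complete, 0 replayed mutations
"""
    counts = skipped_mutations(sample)
    print(f"{len(counts)} unknown CF ids, {sum(counts.values())} mutations skipped")
    # prints "2 unknown CF ids, 47145 mutations skipped"
```

A non-empty result on a node that is supposed to hold the keyspace is a strong hint that the node started without that keyspace's schema, rather than that the data was actually deleted.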
-- 

Edward Sargisson

senior java developer
Global Relay

edward.sargisson@globalrelay.net


*866.484.6630*
New York | Chicago | Vancouver | London (+44.0800.032.9829) | Singapore 
(+65.3158.1301)


