cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aaron morton <aa...@thelastpickle.com>
Subject Re: Couldn't detect any schema definitions in local storage - after handling schema disagreement according to FAQ
Date Mon, 21 May 2012 22:16:31 GMT
> 1) What did I wrong? - why cassandra was throwing exceptions on first startup?
In 1.0.X the history of schema changes was replayed to the node when it rejoined the cluster.
If the node is receiving traffic while this is going on it will log those errors until the
schema mutation that created 1012 is replayed. 

> 2) Why the keyspace data was invalidated ? Is it expected?
The data will have remained on the disk. The load is calculated based on the CF's in the schema,
this can mean that the load will not return to full until the schema is fully replayed. 

Did you lose data ?

> 3) If answer to #2 is  "yes it's expected" then  that's the point in doing http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> then all keyspace data is lost anyway? It makes more sense to just do http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node

Answer as no. 

Checking, did you delete just the Schema-* and Migration-* files or all of the files in data/system?

Also in the first log there is a log of commit log mutation being skipped because the schema
is not there. Drain should have removed these, but it can take a little time (I think).  

> 4) afaiu i could also stop cassandra again move old sstables from snapshot back to keyspace
data dir and run repair for all keyspace CFs? So that it finishes faster
> and makes less load than running a repair which has no previous keyspace data at all?

The approach you followed was the correct one. 

I've updated the wiki to say the errors are expected. 

Cheers
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/05/2012, at 6:34 AM, Piavlo wrote:

> Hi,
> 
> I had a schema disagreement problem in cassandra 1.0.9 cluster, where one node had different
schema version.
> So I followed the faq at http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> disabled gossip, disabled thrift, drained  and finally stopped the cassandra process,
on startup
> noticed
> INFO [main] 2012-05-18 16:23:11,879 DatabaseDescriptor.java (line 467) Couldn't detect
any schema definitions in local storage.
> in the log, and after
> INFO [main] 2012-05-18 16:23:15,463 StorageService.java (line 619) Bootstrap/Replace/Move
completed! Now serving reads.
> it started throwing Fatal exceptions for all read/write operations endlessly.
> 
> I had to stop cassandra process again(no draining was done)
> 
> On second start it did came up ok immediately loading the correct cluster schema version
> INFO [main] 2012-05-18 16:54:44,303 DatabaseDescriptor.java (line 499) Loading schema
version 9db34ef0-a0be-11e1-0000-f9687e034cf7
> 
> But now this node appears to have started with no data from keyspace which had schema
disagreement.
> The original keyspace sstables now appear under snapshots dir.
> 
> # nodetool -h localhost ring
> Address         DC          Rack        Status State   Load            Owns    Token
>                                                                               141784319550391026443072753096570088106
> 10.49.127.4     eu-west     1a          Up     Normal  8.19 GB         16.67%  0
> 10.241.29.65    eu-west     1b          Up     Normal  8.18 GB         16.67%  28356863910078205288614550619314017621
> 10.59.46.236    eu-west     1c          Up     Normal  8.22 GB         16.67%  56713727820156410577229101238628035242
> 10.50.33.232    eu-west     1a          Up     Normal  8.2 GB          16.67%  85070591730234615865843651857942052864
> 10.234.71.33    eu-west     1b          Up     Normal  8.15 GB         16.67%  113427455640312821154458202477256070485
> 10.58.249.118   eu-west     1c          Up     Normal  660.98 MB       16.67%  141784319550391026443072753096570088106
> #
> 
> The node is the one with 660.98 MB data( which is opscenter keyspace data which was not
invalidated)
> 
> So i have some questions:
> 
> 1) What did I wrong? - why cassandra was throwing exceptions on first startup?
> 2) Why the keyspace data was invalidated ? Is it expected?
> 3) If answer to #2 is  "yes it's expected" then  that's the point in doing http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> then all keyspace data is lost anyway? It makes more sense to just do http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node
> 4) afaiu i could also stop cassandra again move old sstables from snapshot back to keyspace
data dir and run repair for all keyspace CFs? So that it finishes faster
> and makes less load than running a repair which has no previous keyspace data at all?
> 
> The first startup log is below:
> 
> INFO [main] 2012-05-18 16:23:07,367 AbstractCassandraDaemon.java (line 105) Logging initialized
> INFO [main] 2012-05-18 16:23:07,382 AbstractCassandraDaemon.java (line 126) JVM vendor/version:
Java HotSpot(TM) 64-Bit Server VM/1.6.0_24
> INFO [main] 2012-05-18 16:23:07,383 AbstractCassandraDaemon.java (line 127) Heap size:
2600468480/2600468480
> INFO [main] 2012-05-18 16:23:07,383 AbstractCassandraDaemon.java (line 128) Classpath:
/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/java/mx4j-tools.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-1.0.9.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-1.0.9.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-1.0.9.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/joda-time-1.6.2.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.6.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra//lib/jamm-0.2.5.jar
> INFO [main] 2012-05-18 16:23:10,661 CLibrary.java (line 109) JNA mlockall successful
> INFO [main] 2012-05-18 16:23:10,692 DatabaseDescriptor.java (line 114) Loading settings
from file:/etc/cassandra/ssa/cassandra.yaml
> INFO [main] 2012-05-18 16:23:10,868 DatabaseDescriptor.java (line 168) DiskAccessMode
'auto' determined to be mmap, indexAccessMode is mmap
> INFO [main] 2012-05-18 16:23:10,911 DatabaseDescriptor.java (line 233) Global memtable
threshold is enabled at 826MB
> INFO [main] 2012-05-18 16:23:10,970 Ec2Snitch.java (line 63) EC2Snitch using region:
eu-west, zone: 1c.
> INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,652 SSTableReader.java (line 153) Opening
/var/lib/cassandra/data/system/Versions-hc-5 (247 bytes)
> INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,767 SSTableReader.java (line 153) Opening
/var/lib/cassandra/data/system/IndexInfo-hc-2 (128 bytes)
> INFO [SSTableBatchOpen:2] 2012-05-18 16:23:11,768 SSTableReader.java (line 153) Opening
/var/lib/cassandra/data/system/IndexInfo-hc-3 (124 bytes)
> INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,802 SSTableReader.java (line 153) Opening
/var/lib/cassandra/data/system/IndexInfo-hc-1 (124 bytes)
> INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,825 SSTableReader.java (line 153) Opening
/var/lib/cassandra/data/system/LocationInfo-hc-35 (89 bytes)
> INFO [SSTableBatchOpen:2] 2012-05-18 16:23:11,825 SSTableReader.java (line 153) Opening
/var/lib/cassandra/data/system/LocationInfo-hc-34 (558 bytes)
> INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,835 SSTableReader.java (line 153) Opening
/var/lib/cassandra/data/system/HintsColumnFamily-hc-56 (235748 bytes)
> INFO [main] 2012-05-18 16:23:11,879 DatabaseDescriptor.java (line 467) Couldn't detect
any schema definitions in local storage.
> INFO [main] 2012-05-18 16:23:11,880 DatabaseDescriptor.java (line 493) Found table data
in data directories. Consider using the CLI to define your schema.
> INFO [main] 2012-05-18 16:23:11,914 CommitLogSegment.java (line 60) Creating new commitlog
segment /var/lib/cassandra/commitlog/CommitLog-1337358191914.log
> INFO [main] 2012-05-18 16:23:11,928 CommitLog.java (line 172) Replaying /var/lib/cassandra/commitlog/CommitLog-1337356780099.log
> INFO [main] 2012-05-18 16:23:15,237 CommitLog.java (line 355) Finished reading /var/lib/cassandra/commitlog/CommitLog-1337356780099.log
> INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped 77886 mutations
from unknown (probably removed) CF with id 1016
> INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped 37804 mutations
from unknown (probably removed) CF with id 1008
> INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped 3 mutations from
unknown (probably removed) CF with id 1024
> INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 67759 mutations
from unknown (probably removed) CF with id 1009
> INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 51268 mutations
from unknown (probably removed) CF with id 1010
> INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 59319 mutations
from unknown (probably removed) CF with id 1011
> INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 61110 mutations
from unknown (probably removed) CF with id 1012
> INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 182908 mutations
from unknown (probably removed) CF with id 1013
> INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 10038 mutations
from unknown (probably removed) CF with id 1014
> INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 14783 mutations
from unknown (probably removed) CF with id 1001
> INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 4256 mutations
from unknown (probably removed) CF with id 1000
> INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 746 mutations from
unknown (probably removed) CF with id 1003
> INFO [main] 2012-05-18 16:23:15,241 CommitLog.java (line 360) Skipped 3013 mutations
from unknown (probably removed) CF with id 1002
> INFO [main] 2012-05-18 16:23:15,241 CommitLog.java (line 179) Log replay complete, 0
replayed mutations
> INFO [main] 2012-05-18 16:23:15,259 StorageService.java (line 400) Cassandra version:
1.0.9
> INFO [main] 2012-05-18 16:23:15,260 StorageService.java (line 401) Thrift API version:
19.20.0
> INFO [main] 2012-05-18 16:23:15,260 StorageService.java (line 414) Loading persisted
ring state
> INFO [main] 2012-05-18 16:23:15,288 StorageService.java (line 494) Starting up server
gossip
> INFO [main] 2012-05-18 16:23:15,292 ColumnFamilyStore.java (line 705) Enqueuing flush
of Memtable-LocationInfo@189237805(29/36 serialized/live bytes, 1 ops)
> INFO [FlushWriter:1] 2012-05-18 16:23:15,293 Memtable.java (line 246) Writing Memtable-LocationInfo@189237805(29/36
serialized/live bytes, 1 ops)
> INFO [FlushWriter:1] 2012-05-18 16:23:15,350 Memtable.java (line 283) Completed flushing
/var/lib/cassandra/data/system/LocationInfo-hc-36-Data.db (80 bytes)
> INFO [main] 2012-05-18 16:23:15,365 Ec2Snitch.java (line 113) Ec2Snitch adding ApplicationState
ec2region=eu-west ec2zone=1c
> INFO [main] 2012-05-18 16:23:15,379 MessagingService.java (line 269) Starting Messaging
Service on port 7000
> INFO [main] 2012-05-18 16:23:15,399 StorageService.java (line 609) Using saved token
141784319550391026443072753096570088106
> INFO [main] 2012-05-18 16:23:15,403 ColumnFamilyStore.java (line 705) Enqueuing flush
of Memtable-LocationInfo@559001765(53/66 serialized/live bytes, 2 ops)
> INFO [FlushWriter:1] 2012-05-18 16:23:15,404 Memtable.java (line 246) Writing Memtable-LocationInfo@559001765(53/66
serialized/live bytes, 2 ops)
> INFO [FlushWriter:1] 2012-05-18 16:23:15,451 Memtable.java (line 283) Completed flushing
/var/lib/cassandra/data/system/LocationInfo-hc-37-Data.db (163 bytes)
> INFO [main] 2012-05-18 16:23:15,454 StorageService.java (line 989) Node dsc2c.internal/10.58.249.118
state jump to normal
> INFO [CompactionExecutor:1] 2012-05-18 16:23:15,461 CompactionTask.java (line 115) Compacting
[SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-37-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-35-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-34-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-36-Data.db')]
> INFO [main] 2012-05-18 16:23:15,463 StorageService.java (line 619) Bootstrap/Replace/Move
completed! Now serving reads.
> INFO [CompactionExecutor:1] 2012-05-18 16:23:15,507 CompactionTask.java (line 223) Compacted
to [/var/lib/cassandra/data/system/LocationInfo-hc-38-Data.db,].  890 to 558 (~62% of original)
bytes for 4 keys at 0.015651MB/s.  Time: 34ms.
> ERROR [MutationStage:1] 2012-05-18 16:23:15,664 RowMutationVerbHandler.java (line 61)
Error in row mutation
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1012
>        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
>        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401)
>        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409)
>        at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> ERROR [MutationStage:2] 2012-05-18 16:23:15,666 RowMutationVerbHandler.java (line 61)
Error in row mutation
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1010
>        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
>        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401)
>        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409)
>        at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> ERROR [ReadStage:1] 2012-05-18 16:23:15,668 AbstractCassandraDaemon.java (line 139) Fatal
exception in thread Thread[ReadStage:1,5,main]
> java.lang.IllegalArgumentException: Unknown ColumnFamily UserData in keyspace PRODUCTION
>        at org.apache.cassandra.config.Schema.getComparator(Schema.java:223)
>        at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:320)
>        at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92)
>        at org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>        at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106)
>        at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74)
>        at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
>        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> ERROR [ReadStage:3] 2012-05-18 16:23:15,669 AbstractCassandraDaemon.java (line 139) Fatal
exception in thread Thread[ReadStage:3,5,main]
> java.lang.IllegalArgumentException: Unknown ColumnFamily UserFraudParameters in keyspace
PRODUCTION
>        at org.apache.cassandra.config.Schema.getComparator(Schema.java:223)
>        at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:320)
>        at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92)
>        at org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>        at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106)
>        at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74)
>        at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
>        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> ...
> and so endlessly on
> 
> 
> Thanks
> Alex
> 
> 
> 


Mime
View raw message