cassandra-user mailing list archives

From Flavio Baronti <>
Subject Re: OOM recovering failed node with many CFs
Date Thu, 26 May 2011 16:36:29 GMT
I tried the manual copy you suggested, but the SystemTable.checkHealth() function
complains that it can't load the system files. Log follows; I will gather some more
info and create a ticket as soon as possible.

  INFO [main] 2011-05-26 18:25:36,147 Logging initialized
  INFO [main] 2011-05-26 18:25:36,172 Heap size: 4277534720/4277534720
  INFO [main] 2011-05-26 18:25:36,174 JNA not found. Native methods will be
  INFO [main] 2011-05-26 18:25:36,190 Loading settings from 
  INFO [main] 2011-05-26 18:25:36,344 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
  INFO [main] 2011-05-26 18:25:36,532 Opening G:\Cassandra\data\system\Schema-f-2746
  INFO [main] 2011-05-26 18:25:36,577 Opening G:\Cassandra\data\system\Schema-f-2729
  INFO [main] 2011-05-26 18:25:36,590 Opening G:\Cassandra\data\system\Schema-f-2745
  INFO [main] 2011-05-26 18:25:36,599 Opening G:\Cassandra\data\system\Migrations-f-2167
  INFO [main] 2011-05-26 18:25:36,600 Opening G:\Cassandra\data\system\Migrations-f-2131
  INFO [main] 2011-05-26 18:25:36,602 Opening G:\Cassandra\data\system\Migrations-f-1041
  INFO [main] 2011-05-26 18:25:36,603 Opening G:\Cassandra\data\system\Migrations-f-1695
ERROR [main] 2011-05-26 18:25:36,634 Fatal exception during initialization
org.apache.cassandra.config.ConfigurationException: Found system table files, but they couldn't be loaded. Did you change the partitioner?
	at org.apache.cassandra.db.SystemTable.checkHealth(
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(
	at org.apache.cassandra.thrift.CassandraDaemon.main(

On 5/26/2011 6:04 PM, Jonathan Ellis wrote:
> Sounds like a legitimate bug, although looking through the code I'm
> not sure what would cause a tight retry loop on migration
> announce/rectify. Can you create a ticket at
> ?
> As a workaround, I would try manually copying the Migrations and
> Schema sstable files from the system keyspace of the live node, then
> restart the recovering one.
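For concreteness, the copy step Jonathan suggests could be sketched like this. A minimal sketch only, assuming the recovering node's system directory is reachable as a local path (e.g. a mounted share); the paths, glob patterns, and the `copy_system_schema` helper are all assumptions for illustration, not from the thread:

```python
# Hypothetical sketch of the workaround: copy the Migrations-* and
# Schema-* sstable files from the live node's system keyspace into the
# recovering node's system keyspace, then restart the recovering node.
import glob
import os
import shutil

def copy_system_schema(live_system_dir, recovering_system_dir):
    """Copy only the schema-related sstable files; return the names copied."""
    copied = []
    for pattern in ("Migrations-*", "Schema-*"):
        for path in glob.glob(os.path.join(live_system_dir, pattern)):
            shutil.copy(path, recovering_system_dir)
            copied.append(os.path.basename(path))
    return copied
```

(As the log above shows, in this case checkHealth() still refused to load the copied files, so the workaround alone was not enough here.)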
> On Thu, May 26, 2011 at 9:27 AM, Flavio Baronti
> <>  wrote:
>> I can't seem to recover a failed node on a database where I made
>> many updates to the schema.
>> I have a small cluster with 2 nodes, around 1000 CF (I know it's a lot, but
>> it can't be changed right now), and ReplicationFactor=2.
>> I shut down a node and cleaned its data entirely, then tried to bring it
>> back up. The node starts fetching schema updates from the live node, but the
>> operation fails halfway with an OOME.
>> After some investigation, what I found is that:
>> - I have a lot of schema updates (there are 2067 rows in the system.Schema
>> CF).
>> - The live node loads migrations 1-1000, and sends them to the recovering
>> node (Migration.getLocalMigrations())
>> - Soon afterwards, the live node checks the schema version on the recovering
>> node and finds it has moved by a little - say it has applied the first 3
>> migrations. It then loads migrations 3-1003, and sends them to the node.
>> - This process is repeated very quickly (sends migrations 6-1006, 9-1009,
>> etc).
>> Analyzing the memory dump and the logs, it looks like each of these
>> 1000-migration blocks is composed into a single message and sent to the
>> OutboundTcpConnection queue. However, since the schema is big, the messages
>> occupy a lot of space, and are built faster than the connection can send
>> them. Therefore, they accumulate in OutboundTcpConnection.queue, until
>> memory is completely filled.
>> Any suggestions? Can I change something to make this work, apart from
>> reducing the number of CFs?
>> Flavio
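
The runaway accumulation Flavio describes can be illustrated with a toy model. This is a sketch only: the 2067 migrations, the ~1000-migration window, and the ~3 migrations applied per round come from the message above, while the drain rate and units are arbitrary assumptions:

```python
# Toy model of the resend loop: each round the live node queues one big
# message carrying a full window of migrations, the connection drains a
# fixed amount, and the recovering node applies only a few migrations,
# so the window barely advances and the queue grows without bound.
def simulate(total=2067, window=1000, applied_per_round=3, drain_per_round=50):
    queued = 0   # migrations' worth of data sitting in OutboundTcpConnection.queue
    applied = 0  # migrations the recovering node has actually applied
    peak = rounds = 0
    while applied < total:
        queued += min(window, total - applied)     # enqueue one window-sized message
        queued = max(0, queued - drain_per_round)  # connection drains far less
        peak = max(peak, queued)
        applied += applied_per_round               # window advances only slightly
        rounds += 1
    return rounds, peak

rounds, peak = simulate()
# peak reaches hundreds of window-sizes, consistent with the observed OOM
```

With these assumed rates the queue ends up holding hundreds of copies of the schema before the transfer completes, which matches the memory-fill behaviour in the report.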
