cassandra-commits mailing list archives

From "Matt Byrd (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6797) compaction and scrub data directories race on startup
Date Tue, 04 Mar 2014 02:10:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918938#comment-13918938 ]

Matt Byrd commented on CASSANDRA-6797:
--------------------------------------

I think this may be the same or a similar issue, but since the repro is more complicated and
the environment is Windows, I thought I'd file this ticket as well.

> compaction and scrub data directories race on startup
> -----------------------------------------------------
>
>                 Key: CASSANDRA-6797
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6797
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: macos (and linux)
>            Reporter: Matt Byrd
>            Priority: Minor
>              Labels: compaction, concurrency, starting
>
>  
> Hi,  
> On doing a rolling restart of a 2.0.5 cluster in several environments, I'm seeing the following error:
> {code}
>  INFO [CompactionExecutor:1] 2014-03-03 17:11:07,549 CompactionTask.java (line 115) Compacting [SSTableReader(path='/Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-13-Data.db'), SSTableReader(path='/Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-15-Data.db'), SSTableReader(path='/Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-16-Data.db'), SSTableReader(path='/Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-14-Data.db')]
>  INFO [CompactionExecutor:1] 2014-03-03 17:11:07,557 ColumnFamilyStore.java (line 254) Initializing system_traces.sessions
>  INFO [CompactionExecutor:1] 2014-03-03 17:11:07,560 ColumnFamilyStore.java (line 254) Initializing system_traces.events
>  WARN [main] 2014-03-03 17:11:07,608 ColumnFamilyStore.java (line 473) Removing orphans for /Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-13: [CompressionInfo.db, Filter.db, Index.db, TOC.txt, Summary.db, Data.db, Statistics.db]
> ERROR [main] 2014-03-03 17:11:07,609 CassandraDaemon.java (line 479) Exception encountered during startup
> java.lang.AssertionError: attempted to delete non-existing file system-local-jb-13-CompressionInfo.db
>         at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:111)
>         at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:106)
>         at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:476)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:462)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:552)
>  INFO [CompactionExecutor:1] 2014-03-03 17:11:07,612 CompactionTask.java (line 275) Compacted 4 sstables to [/Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-17,].  10,963 bytes to 5,572 (~50% of original) in 57ms = 0.093226MB/s.  4 total partitions merged to 1.  Partition merge counts were {4:1, }
> {code}
> This looks like a race: compactions are running while the existing data directories are being scrubbed.
> An in-progress compaction probably looks like an incomplete one, so the scrub tries to clean up its sstables while the compaction is still running.
> When scrubDataDirectories attempts the delete, the file no longer exists, presumably because it has since been compacted away.
> This causes an assertion error and the node fails to start up.
> Here is a ccm script which simply stops and starts a 3-node 2.0.5 cluster repeatedly.
> It reproduces the problem fairly reliably, in fewer than ten iterations:
> {code}
> #!/bin/bash
> ccm create compaction_race -v 2.0.5
> ccm populate -n 3
> ccm start
> for i in $(seq 0 1000); do 
>     echo $i;
>     ccm stop
>     ccm start
>     grep ERR ~/.ccm/compaction_race/*/logs/system.log;
> done
> {code}
>  
> Someone else should probably confirm that this is what is going wrong.
> If it is, the solution might be as simple as disabling autocompactions slightly earlier in CassandraDaemon.setup.
>  
> Alternatively, if there isn't a good reason why we first scrub the system tables and then scrub all keyspaces (including the system keyspace), the second scrub could perhaps cover only the non-system keyspaces.
> Please let me know if there is anything else I can provide.
> Thanks,
> Matt
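
The interleaving suspected in the description can be replayed deterministically in a few lines of Java. This is only an illustrative sketch, not Cassandra code: `ScrubRaceSketch`, `strictDelete` and `replayInterleaving` are hypothetical names, and `strictDelete` is a simplified stand-in for a delete that insists the file still exists, modeled on the assertion message in the stack trace. The three steps are run in sequence on one thread to show the ordering that would fail; the real startup would involve the main thread and a compaction executor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ScrubRaceSketch {

    // Hypothetical stand-in for a strict delete: it refuses to proceed if the
    // file has already disappeared, mirroring the message seen in the log.
    public static void strictDelete(Path file) throws IOException {
        if (!Files.exists(file)) {
            throw new AssertionError("attempted to delete non-existing file " + file.getFileName());
        }
        Files.delete(file);
    }

    // Replays the suspected ordering and returns the assertion message,
    // or null if the delete unexpectedly succeeded.
    public static String replayInterleaving() {
        try {
            Path dir = Files.createTempDirectory("scrub_race");
            Path component = dir.resolve("system-local-jb-13-CompressionInfo.db");
            Files.createFile(component);
            try {
                // Step 1 (scrub): the component is listed as an orphan to remove.
                Path orphan = component;
                // Step 2 (compaction): the compaction finishes and removes its inputs.
                Files.delete(component);
                // Step 3 (scrub): the delete now targets a file that is gone.
                strictDelete(orphan);
                return null;
            } catch (AssertionError e) {
                return e.getMessage();
            } finally {
                // Clean up the temporary directory used by the sketch.
                Files.deleteIfExists(component);
                Files.delete(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(replayInterleaving());
    }
}
```

Running `main` prints the same message as the assertion in the log above. If the compaction in step 2 cannot start until the scrub has finished, which is the effect the description hopes to get from disabling autocompactions earlier, step 3 finds the file and the delete succeeds.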



--
This message was sent by Atlassian JIRA
(v6.2#6252)
