cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11742) Failed bootstrap results in exception when node is restarted
Date Thu, 19 May 2016 23:06:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292309#comment-15292309 ]

Joel Knighton commented on CASSANDRA-11742:
-------------------------------------------

I think this second patch is an improvement - I traced this issue to determine exactly why
it worked on 2.1. This behavior was introduced by [CASSANDRA-8049] which centralized Cassandra
startup checks. Prior to this change, we inserted cluster name directly after checking the
health of the system keyspace, so if an sstable for the system keyspace was flushed, we could
guarantee that some sstable contained cluster name. After [CASSANDRA-8049], we insert cluster
name with the rest of the local metadata in {{SystemKeyspace.finishStartup}}.

[~beobal] - I couldn't find a reason for the change as to when cluster name is inserted other
than that it didn't seem like a good idea to mutate anything in a startup check. Can you think
of any reason we can't just call {{SystemKeyspace.persistLocalMetadata}} immediately after
snapshotting the system keyspace in {{CassandraDaemon}}? The root cause of this problem is
that we need the local metadata persisted before any truncate/schema logic runs. That logic
writes to the system keyspace, so we can end up with flushed sstables containing its data but
no sstable containing the cluster name, which breaks the logic of the system keyspace health
check. I ran full unit tests/dtests
on a branch that moved {{SystemKeyspace.persistLocalMetadata}} to immediately after the snapshot
of the system keyspace and the results looked good.
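
To make the ordering argument concrete, here is a minimal toy model in Java. It is not Cassandra
code: {{ToySystemKeyspace}}, {{restartSucceeds}} and the key names are hypothetical stand-ins, and
the health check is reduced to the rule described above (if any sstable exists, some sstable must
contain the cluster name). A crash is modelled as losing the memtable, since the check runs before
commitlog replay.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the startup ordering discussed above; none of this is Cassandra code. */
class StartupOrderSketch {

    /** Hypothetical stand-in for the system keyspace: one memtable plus flushed sstables. */
    static final class ToySystemKeyspace {
        private Map<String, String> memtable = new HashMap<>();
        private final List<Map<String, String>> sstables = new ArrayList<>();

        void write(String key, String value) {
            memtable.put(key, value);
        }

        /** Move the current memtable contents into a new immutable "sstable". */
        void flush() {
            if (memtable.isEmpty()) {
                return;
            }
            sstables.add(memtable);
            memtable = new HashMap<>();
        }

        /** Simulate a failed bootstrap: unflushed writes are invisible until commitlog replay. */
        void crash() {
            memtable = new HashMap<>();
        }

        /** Simplified check: if sstables exist, at least one must hold the cluster name. */
        boolean healthCheckPasses() {
            if (sstables.isEmpty()) {
                return true; // looks like a fresh node
            }
            for (Map<String, String> sstable : sstables) {
                if (sstable.containsKey("cluster_name")) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Runs one simulated startup, crash, and restart-time health check. */
    static boolean restartSucceeds(boolean persistMetadataFirst) {
        ToySystemKeyspace ks = new ToySystemKeyspace();
        if (persistMetadataFirst) {
            // Proposed ordering: local metadata is written before anything else.
            ks.write("cluster_name", "Test Cluster");
        }
        // Truncate/schema logic writes to the system keyspace and gets flushed.
        ks.write("schema_version", "placeholder");
        ks.flush();
        if (!persistMetadataFirst) {
            // Late ordering: cluster name is written afterwards and never flushed.
            ks.write("cluster_name", "Test Cluster");
        }
        ks.crash();
        return ks.healthCheckPasses();
    }

    public static void main(String[] args) {
        System.out.println("late metadata:  " + restartSucceeds(false)); // false
        System.out.println("early metadata: " + restartSucceeds(true));  // true
    }
}
```

In this sketch the late ordering fails the check on restart because the only flushed sstable lacks
the cluster name, while the early ordering passes because the cluster name is flushed together
with the first system keyspace write.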

> Failed bootstrap results in exception when node is restarted
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-11742
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11742
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Minor
>             Fix For: 2.2.x, 3.0.x, 3.x
>
>         Attachments: 11742-2.txt, 11742.txt
>
>
> Since 2.2, a failed bootstrap results in a {{org.apache.cassandra.exceptions.ConfigurationException:
Found system keyspace files, but they couldn't be loaded!}} exception when the node is restarted.
This did not happen in 2.1; it just tried to bootstrap again. I know that the workaround is
relatively easy (just delete the system keyspace in the data folder on disk and try again),
but it's a bit annoying that you have to do that.
> The problem seems to be that the creation of the {{system.local}} table has been moved
to just before the bootstrap begins (in 2.1 it was done much earlier), and as a result it's
still in the memtable and commitlog if the bootstrap fails. Still, a few values are inserted
into the {{system.local}} table at an earlier point in the startup, and they have been flushed
from the memtable to an sstable. When the node is restarted, {{SystemKeyspace.checkHealth()}}
is executed before the commitlog is replayed, so it only sees the sstable with an incomplete
{{system.local}} table and throws an exception.
> I think we could fix this very easily by force-flushing the system keyspace in the {{StorageServiceShutdownHook}}.
I have included a patch that does this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
