cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7582) 2.1 multi-dc upgrade errors
Date Mon, 28 Jul 2014 18:28:38 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076540#comment-14076540 ]

Jonathan Ellis commented on CASSANDRA-7582:
-------------------------------------------

I see two actual classes of CL errors:

# Table is dropped and we are replaying stale data that should also have been dropped.  Blocking
startup is the Wrong Solution.
# Hardware problem caused a checksum mismatch.  Blocking startup is the Wrong Solution.

Granted that blocking startup can help prevent user errors during PIT recovery, that's an entirely
hypothetical situation today; PIT is only nominally usable.  (Fork the JVM every time a CL
segment finishes?  Yeah.)  So let's not optimize for that at the expense of scenarios we see
frequently.

I think we should roll back 7125 until we can do it right.  Doing it right probably means
remembering old cfids in 2.1.x; then we can get paranoid about seeing them in the CL for 3.0.
(Getting paranoid in the same version as we start remembering is bad for obvious reasons.)
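
To make the two-step idea concrete, here is a minimal sketch, not actual Cassandra code. Every name in it (ReplayPolicySketch, Action, dropTable, onReplay, the paranoid flag) is hypothetical. It assumes dropped cfids are remembered in a set, and that "paranoid" means failing replay only for a cfid that is neither live nor remembered as dropped.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch only: none of these names exist in Cassandra.
class ReplayPolicySketch {
    enum Action { APPLY, SKIP, FAIL }

    private final Set<UUID> liveCfIds = new HashSet<>();
    private final Set<UUID> droppedCfIds = new HashSet<>();

    public void createTable(UUID cfId) {
        liveCfIds.add(cfId);
    }

    // Step 1 (2.1.x in the proposal): remember the cfid of every dropped table.
    public void dropTable(UUID cfId) {
        if (liveCfIds.remove(cfId)) {
            droppedCfIds.add(cfId);
        }
    }

    // Decide what to do with a replayed mutation for the given cfid.
    // paranoid=false: any cfid that is not live is skipped (no blocked startup).
    // paranoid=true (3.0 in the proposal): known-dropped cfids are still
    // skipped, but a cfid we have never seen fails replay.
    public Action onReplay(UUID cfId, boolean paranoid) {
        if (liveCfIds.contains(cfId)) {
            return Action.APPLY;
        }
        if (droppedCfIds.contains(cfId)) {
            return Action.SKIP; // stale data for a table that was dropped
        }
        return paranoid ? Action.FAIL : Action.SKIP;
    }
}
```

Under these assumptions, paranoid mode is only safe once every node has been recording dropped cfids for a full version, which is the reason for not enabling both in the same release.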

> 2.1 multi-dc upgrade errors
> ---------------------------
>
>                 Key: CASSANDRA-7582
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7582
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ryan McGuire
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.1
>
>
> Multi-dc upgrade [was working from 2.0 -> 2.1 fairly recently|http://cassci.datastax.com/job/cassandra_upgrade_dtest/55/testReport/upgrade_through_versions_test/TestUpgrade_from_cassandra_2_0_latest_tag_to_cassandra_2_1_HEAD/], but is currently failing.
> Running upgrade_through_versions_test.py:TestUpgrade_from_cassandra_2_0_HEAD_to_cassandra_2_1_HEAD.bootstrap_multidc_test I get the following errors when starting 2.1 upgraded from 2.0:
> {code}
> ERROR [main] 2014-07-21 23:54:20,862 CommitLog.java:143 - Commit log replay failed due to replaying a mutation for a missing table. This error can be ignored by providing -Dcassandra.commitlog.stop_on_missing_tables=false on the command line
> ERROR [main] 2014-07-21 23:54:20,869 CassandraDaemon.java:474 - Exception encountered during startup
> java.lang.RuntimeException: org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=a1b676f3-0c5d-3276-bfd5-07cf43397004
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:457) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:546) [main/:na]
> Caused by: org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=a1b676f3-0c5d-3276-bfd5-07cf43397004
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:164) ~[main/:na]
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:97) ~[main/:na]
>         at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:353) ~[main/:na]
>         at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:333) ~[main/:na]
>         at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:365) ~[main/:na]
>         at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:98) ~[main/:na]
>         at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:137) ~[main/:na]
>         at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:115) ~[main/:na]
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
