cassandra-commits mailing list archives

From "Florent Clairambault (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4481) Commitlog not replayed after restart - data lost
Date Fri, 12 Oct 2012 21:43:03 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475383#comment-13475383 ]

Florent Clairambault commented on CASSANDRA-4481:
-------------------------------------------------

It doesn't work: it failed again a week ago on a 1.1.5 node that had only been running for a short while.

First of all, this is a commit log writing and/or reading issue, so if you flush your data
frequently (every hour, and in the stop command of the rc.d script) you reduce the risk of
large data losses. You can lose days of data if you don't. Restarting Cassandra and finding
yourself 2 days in the past is a very unpleasant situation.
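
A minimal sketch of such a flush helper, for illustration only. The NODETOOL and CASSANDRA_HOST
variables, the flush_all name and the script path in the crontab comment are hypothetical names
chosen for the example. It assumes that `nodetool flush` without a keyspace argument flushes
every keyspace.

```shell
#!/bin/sh
# Hypothetical helper: flush memtables to SSTables so that a restart depends
# less on commit log replay. Both variables below are assumptions and can be
# overridden from the environment.
NODETOOL="${NODETOOL:-nodetool}"
CASSANDRA_HOST="${CASSANDRA_HOST:-localhost}"

flush_all() {
    # With no keyspace argument, "nodetool flush" is assumed to flush all keyspaces.
    "$NODETOOL" -h "$CASSANDRA_HOST" flush
}

# Call flush_all from the stop branch of the rc.d script, before the daemon is
# stopped. For the hourly flush, an example crontab entry, assuming this file
# is saved as /usr/local/bin/cassandra-flush.sh with a final "flush_all" line:
#   0 * * * * /usr/local/bin/cassandra-flush.sh
```

This only narrows the window of data that depends on the commit log; it does not fix the replay
problem itself.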

So here is the new process I applied to fix my data (which in fact amounts to restarting from
scratch, except that we keep the data):
- Export the keyspace's schema
{code}
cassandra-cli -k ks >schema.txt <<EOF 
show schema;
exit;
EOF
{code}
- Simplify the export (every CF gets key_validation_class = AsciiType and default_validation_class
= UTF8Type, except the CFs that contain binary data, where I used BytesType).

I simplified an export like this:
{code}
create column family User
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and column_metadata = [
    {column_name : 'domain',
    validation_class : UTF8Type,
    index_name : 'User_domain_idx',
    index_type : 0},
    {column_name : 'username',
    validation_class : UTF8Type,
    index_name : 'User_username',
    index_type : 0}]
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
{code}

To something like this:
{code}
create column family User
  with column_type = 'Standard'
  and key_validation_class = 'AsciiType'
  and comparator = 'AsciiType'
  and default_validation_class = 'UTF8Type'
  and column_metadata = [
    {column_name : 'domain', validation_class : UTF8Type, index_type : 0},
    {column_name : 'username', validation_class : UTF8Type, index_type : 0}];
{code}

During this simplification, I discovered that some default_validation_class settings had an
incorrect type, so maybe the problem comes from that. It seems strange that we could "confuse"
Cassandra this way, but then this problem is very strange to begin with.

- Stop cassandra
- Move the keyspace folder to somewhere else (mkdir backup; mv <ks> backup)
- Start Cassandra (a missing keyspace folder is treated like a keyspace with no data; it's not a problem).
- Delete the keyspace (I know deletion creates snapshots, so moving the folder is unnecessary, but
it's easier to use sstableloader that way)
- Recreate the keyspace from the exported and simplified schema
- Use sstableloader to import data:
{code}
cd backup; find <ks> -type d -exec sstableloader -d localhost {} \;
{code}
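
The find command above also visits the keyspace folder itself and any nested folders (for
example snapshots). A hedged variant that only hands the first-level column family folders to
the loader could look like the sketch below. The load_keyspace name and the SSTABLELOADER and
LOADER_HOST variables are hypothetical, and skipping nested folders is an assumption about what
is wanted, not something I verified against sstableloader.

```shell
#!/bin/sh
# Hypothetical wrapper around the sstableloader step. Run it from the backup
# folder so the relative path keeps the <ks>/<cf> shape used above.
SSTABLELOADER="${SSTABLELOADER:-sstableloader}"
LOADER_HOST="${LOADER_HOST:-localhost}"

load_keyspace() {
    ks_dir=$1
    # The trailing slash in the pattern restricts the glob to directories.
    for cf_dir in "$ks_dir"/*/; do
        # Skip the literal pattern when the keyspace folder has no subfolders.
        [ -d "$cf_dir" ] || continue
        # Strip the trailing slash and load one column family folder.
        "$SSTABLELOADER" -d "$LOADER_HOST" "${cf_dir%/}"
    done
}
```

Usage would be `cd backup; load_keyspace <ks>`, with the same <ks> placeholder as in the steps above.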

NOTE: Don't try to replay your commit logs against the new schema; the column families
won't have the same IDs.

Even an empty Cassandra instance replays at least 1 mutation at startup because of the "system"
keyspace. So I still think 0 replayed mutations should never occur, and if it does, a warning
should be logged. And if the cause is indeed "a CF that doesn't fully exist", that should
be reported at startup.
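
Until such a warning exists, the check can be approximated from outside by scanning the startup
log. The sketch below is only an illustration: the function names are hypothetical, the log file
path is left to the caller, and it assumes the summary line has the same
"Log replay complete, N replayed mutations" form as in the issue description.

```shell
#!/bin/sh
# Hypothetical external check for a replay that applied no mutations.

replayed_mutations() {
    # Print N from the last "Log replay complete, N replayed mutations" line.
    sed -n 's/.*Log replay complete, \([0-9][0-9]*\) replayed mutations.*/\1/p' "$1" | tail -n 1
}

check_replay() {
    count=$(replayed_mutations "$1")
    if [ -z "$count" ]; then
        echo "no replay summary found in $1" >&2
        return 2
    fi
    if [ "$count" -eq 0 ]; then
        echo "WARNING: commit log replay applied 0 mutations" >&2
        return 1
    fi
    return 0
}
```

Running `check_replay <logfile>` after startup returns 1 when the last replay summary reports
0 mutations, 2 when no summary line is found, and 0 otherwise.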

I hope we can find a way to reproduce it.
                
> Commitlog not replayed after restart - data lost
> ------------------------------------------------
>
>                 Key: CASSANDRA-4481
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4481
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.2
>         Environment: Single node cluster on 64Bit CentOS
>            Reporter: Ivo Meißner
>            Priority: Critical
>
> When data is written to the commitlog and I restart the machine, all committed data that has not been flushed to disk is lost.
> In the startup logs it says that it replays the commitlog successfully, but the data is not available afterwards.
> When I open the commitlog file in an editor I can see the added data, but after the restart it cannot be fetched from cassandra.
> {code}
>  INFO 09:59:45,362 Replaying /var/myproject/cassandra/commitlog/CommitLog-83203377067.log
>  INFO 09:59:45,476 Finished reading /var/myproject/cassandra/commitlog/CommitLog-83203377067.log
>  INFO 09:59:45,476 Log replay complete, 0 replayed mutations
> {code}
