cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "mck (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-9591) Scrub (recover) sstables even when -Index.db is missing
Date Fri, 12 Jun 2015 18:13:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

mck updated CASSANDRA-9591:
---------------------------
    Description: 
Today SSTableReader needs at minimum 3 files to load an sstable:
 - -Data.db
 - -CompressionInfo.db 
 - -Index.db

But during the scrub process the -Index.db file isn't actually necessary, unless there's corruption
in the -Data.db and we want to be able to skip over corrupted rows. Given that there is still
a fair chance that there's nothing wrong with the -Data.db file and we're just missing the
-Index.db file this patch addresses that situation.

So the following patch makes it possible for the StandaloneScrubber (sstablescrub) to recover
sstables despite missing -Index.db files.


This can happen from a catastrophic incident where data directories have been lost and/or
corrupted, or wiped and the backup not healthy. I'm aware that normally one depends on replicas
or snapshots to avoid such situations, but such catastrophic incidents do occur in the wild.

I have not tested this patch against normal c* operations and all the other (more critical)
ways SSTableReader is used. i'll happily do that and add the needed units tests if people
see merit in accepting the patch.

Otherwise the patch can live with the issue, in-case anyone else needs it. There's also a
cassandra distribution bundled with the patch [here|https://github.com/michaelsembwever/cassandra/releases/download/2.0.15-recover-sstables-without-indexdb/apache-cassandra-2.0.15-recover-sstables-without-indexdb.tar.gz]
to make life a little easier for anyone finding themselves in such a bad situation.

  was:
Today SSTableReader needs at minimum 3 files to load an sstable:
 - -Data.db
 - -CompressionInfo.db 
 - -Index.db

But during the scrub process the -Index.db file isn't actually necessary, unless there's corruption
in the -Data.db and we want to be able to skip over corrupted rows. Given that there is still
a fair chance that there's nothing wrong with the -Data.db file and we're just missing the
-Index.db file this patch addresses that situation.

So the following patch makes it possible for the StandaloneScrubber (sstablescrub) to recover
sstables despite missing -Index.db files.


This can happen from a catastrophic incident where data directories have been lost and/or
corrupted, or wiped and the backup not healthy. I'm aware that normally one depends on replicas
or snapshots to avoid such situations, but such catastrophic incidents do occur in the wild.

I have not tested this patch against normal c* operations and all the other (more critical)
ways SSTableReader is used. i'll happily do that and add the needed units tests if people
see merit in accepting the patch.

Otherwise the patch can live with the issue, in-case anyone else needs it. I've uploaded a
cassandra distribution bundled with the patch as well to make life a little easier for anyone
finding themselves in such a bad situation.


> Scrub (recover) sstables even when -Index.db is missing
> -------------------------------------------------------
>
>                 Key: CASSANDRA-9591
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9591
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: mck
>            Assignee: mck
>              Labels: sstablescrub
>             Fix For: 2.0.15
>
>         Attachments: 9591-2.0.txt
>
>
> Today SSTableReader needs at minimum 3 files to load an sstable:
>  - -Data.db
>  - -CompressionInfo.db 
>  - -Index.db
> But during the scrub process the -Index.db file isn't actually necessary, unless there's
corruption in the -Data.db and we want to be able to skip over corrupted rows. Given that
there is still a fair chance that there's nothing wrong with the -Data.db file and we're just
missing the -Index.db file this patch addresses that situation.
> So the following patch makes it possible for the StandaloneScrubber (sstablescrub) to
recover sstables despite missing -Index.db files.
> This can happen from a catastrophic incident where data directories have been lost and/or
corrupted, or wiped and the backup not healthy. I'm aware that normally one depends on replicas
or snapshots to avoid such situations, but such catastrophic incidents do occur in the wild.
> I have not tested this patch against normal c* operations and all the other (more critical)
ways SSTableReader is used. i'll happily do that and add the needed units tests if people
see merit in accepting the patch.
> Otherwise the patch can live with the issue, in-case anyone else needs it. There's also
a cassandra distribution bundled with the patch [here|https://github.com/michaelsembwever/cassandra/releases/download/2.0.15-recover-sstables-without-indexdb/apache-cassandra-2.0.15-recover-sstables-without-indexdb.tar.gz]
to make life a little easier for anyone finding themselves in such a bad situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message