cassandra-commits mailing list archives

From "mck (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9591) Scrub (recover) sstables even when -Index.db is missing
Date Tue, 23 Jun 2015 21:21:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598381#comment-14598381
] 

mck edited comment on CASSANDRA-9591 at 6/23/15 9:21 PM:
---------------------------------------------------------

bq. If it's impossible to have those values wired up…

It's possible to "wire" these values up with a poor man's approach: a complete pass through
the data file. That's pretty wasteful, but we're only talking about the edge case of the
StandaloneScrubber here.

e.g.
{code}
private void findFirstAndLast() throws IOException
{
    // We have no primary index. Take a pass through the data file to assign first and last.
    // Costly, but only for StandaloneScrubber.
    try (RandomAccessReader dataFile = openDataReader())
    {
        while (!dataFile.isEOF())
        {
            DecoratedKey decoratedKey = partitioner.decorateKey(ByteBufferUtil.readWithShortLength(dataFile));
            if (first == null)
                first = decoratedKey;
            last = decoratedKey;

            // iterate through the row's atoms so dataFile is positioned at the next key
            SSTableIdentityIterator atoms = new SSTableIdentityIterator(this, dataFile, decoratedKey, false);
            while (atoms.hasNext())
                atoms.next();
        }
    }

    first = getMinimalKey(first);
    last = getMinimalKey(last);
}
{code}
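
The single-pass idea can also be tried outside Cassandra. The following is only a minimal sketch under stated assumptions: the record layout (unsigned short key length, key bytes, int payload length, payload bytes) is a simplified stand-in and not the real -Data.db format, and {{FirstLastScan}}, {{findFirstAndLast(byte[])}} and {{encode(...)}} are hypothetical names that do not exist in Cassandra.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: the record layout here is NOT the sstable format.
class FirstLastScan
{
    /** Returns {firstKey, lastKey}, or null if the data holds no records. */
    static String[] findFirstAndLast(byte[] data) throws IOException
    {
        String first = null;
        String last = null;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data)))
        {
            while (in.available() > 0)
            {
                // key: unsigned short length followed by the key bytes
                byte[] keyBytes = new byte[in.readUnsignedShort()];
                in.readFully(keyBytes);
                String key = new String(keyBytes, StandardCharsets.UTF_8);
                if (first == null)
                    first = key;
                last = key;

                // skip the payload so the stream is positioned at the next key
                int payloadLength = in.readInt();
                if (in.skipBytes(payloadLength) != payloadLength)
                    throw new EOFException("truncated payload for key " + key);
            }
        }
        return first == null ? null : new String[] { first, last };
    }

    /** Writes one record per key, each with a fixed 3-byte dummy payload. */
    static byte[] encode(String... keys) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes))
        {
            for (String key : keys)
            {
                byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
                out.writeShort(keyBytes.length);
                out.write(keyBytes);
                byte[] payload = { 1, 2, 3 };
                out.writeInt(payload.length);
                out.write(payload);
            }
        }
        return bytes.toByteArray();
    }
}
```

As in the patch snippet, the cost is a full read of the data, and an empty input yields no first or last key at all, which the sketch reports as null.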

Would you rather see the flag passed into {{updateLiveSet()}}?


was (Author: michaelsembwever):
bq. If it's impossible to have those values wired up…

It's possible to "wire" these values up doing a poorman's approach of doing a complete pass
through the data file. That's pretty wasteful, but we're only taking about the edge-case of
the StandaloneScrubber here.

eg
{code}
private void findFirstAndLast() throws IOException
{
    // we have no primary index. take a pass through the data file to assign first and last.
    // costly, but only for StandaloneScrubber
    try (RandomAccessReader dataFile = openDataReader())
    {
        while (!dataFile.isEOF())
        {
            DecoratedKey decoratedKey = partitioner.decorateKey(ByteBufferUtil.readWithShortLength(dataFile));
            if (first == null)
                first = decoratedKey;
            last = decoratedKey;

            SSTableIdentityIterator atoms = new SSTableIdentityIterator(this, dataFile, decoratedKey, false);
            while (atoms.hasNext())
                atoms.next();
        }
    }

    first = getMinimalKey(first);
    last = getMinimalKey(last);
}
{code}

Would you rather see the flag into {{updateLiveSet()}}?

> Scrub (recover) sstables even when -Index.db is missing
> -------------------------------------------------------
>
>                 Key: CASSANDRA-9591
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9591
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: mck
>            Assignee: mck
>              Labels: sstablescrub
>             Fix For: 2.0.x
>
>         Attachments: 9591-2.0.txt, 9591-2.1.txt
>
>
> Today SSTableReader needs at minimum 3 files to load an sstable:
>  - -Data.db
>  - -CompressionInfo.db 
>  - -Index.db
> But during the scrub process the -Index.db file isn't actually necessary, unless there's corruption in the -Data.db and we want to be able to skip over corrupted rows. There is still a fair chance that nothing is wrong with the -Data.db file and we're just missing the -Index.db file; this patch addresses that situation.
> So the following patch makes it possible for the StandaloneScrubber (sstablescrub) to recover sstables despite missing -Index.db files.
> This can happen after a catastrophic incident where data directories have been lost, corrupted, or wiped, and the backup is not healthy. I'm aware that normally one depends on replicas or snapshots to avoid such situations, but such catastrophic incidents do occur in the wild.
> I have not tested this patch against normal C* operations and all the other (more critical) ways SSTableReader is used. I'll happily do that and add the needed unit tests if people see merit in accepting the patch.
> Otherwise the patch can live with the issue, in case anyone else needs it. There's also a Cassandra distribution bundled with the patch [here|https://github.com/michaelsembwever/cassandra/releases/download/2.0.15-recover-sstables-without-indexdb/apache-cassandra-2.0.15-recover-sstables-without-indexdb.tar.gz] to make life a little easier for anyone finding themselves in such a bad situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
