cassandra-commits mailing list archives

From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9742) Nodetool verify
Date Wed, 08 Jul 2015 01:18:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617790#comment-14617790 ]

Jeff Jirsa edited comment on CASSANDRA-9742 at 7/8/15 1:17 AM:
---------------------------------------------------------------

Operator perspective, fwiw: I already have repair schedules. I already know what needs to
be repaired and what doesn't. What I didn't have, previously, was a way to validate that the
files on disk actually matched what I believed they contained, short of running scrub.

`verify` was very literally `read only scrub` - when I wrote 5791, I followed the scrub code
path very closely, because that was the use case I was worried about when I wrote it (the
concern was bit-level corruption due to a failing HDD/RAID controller - scrub would do the
job, but it's a heavy hammer hitting a tiny nail). The notion of "verify this node has all
the data" was already covered by repair, so I never even considered having `verify` do that.
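[Editor's sketch] The bit-level check described above boils down to recomputing a checksum over the on-disk bytes and comparing it to the value recorded at write time. Cassandra actually checksums fixed-size chunks and keeps the digests in companion files; the standalone sketch below (hypothetical names, plain CRC32 over a whole buffer) only illustrates the idea, not the real `Verifier` code path:

```java
import java.util.zip.CRC32;

public class ChecksumVerify {
    // Compute a CRC32 over a byte buffer. Cassandra checksums fixed-size
    // compressed chunks per sstable; a single whole-buffer CRC is enough
    // to show the verification idea.
    static long crc32Of(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // True when the bytes still match the checksum recorded at write time.
    static boolean verify(byte[] data, long expected) {
        return crc32Of(data) == expected;
    }

    public static void main(String[] args) {
        byte[] original = "hello sstable".getBytes();
        long recorded = crc32Of(original);              // taken at write time
        System.out.println(verify(original, recorded)); // true: file intact

        byte[] flipped = original.clone();
        flipped[0] ^= 0x01;                             // simulate bit rot
        System.out.println(verify(flipped, recorded));  // false: mismatch
    }
}
```

A single flipped bit changes the CRC, which is exactly the failing-disk/RAID-controller scenario the comment describes - read-only detection, with no rewrite of the data the way scrub does.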

Why not just add a flag that makes incremental repair validate checksums for all sstables?
On checksum failure the verifier will {{mutateRepairedAt(sstable.descriptor, ActiveRepairService.UNREPAIRED_SSTABLE)}},
which then allows incremental repair to re-repair that data.
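[Editor's sketch] The proposed flow - demote a corrupt sstable to "unrepaired" so the next incremental repair picks it up again - can be sketched as below. The class and field names are illustrative stand-ins, not Cassandra's actual `SSTableReader`/`mutateRepairedAt` API; the constant mirrors `ActiveRepairService.UNREPAIRED_SSTABLE`, which is 0:

```java
import java.util.Arrays;
import java.util.List;

public class VerifyAndMark {
    // Stand-in for ActiveRepairService.UNREPAIRED_SSTABLE (0 in Cassandra).
    static final long UNREPAIRED_SSTABLE = 0L;

    // Minimal stand-in for an sstable's repair metadata.
    static class SSTable {
        long repairedAt;        // repair timestamp recorded in metadata
        boolean checksumOk;     // result of the verify pass
        SSTable(long repairedAt, boolean checksumOk) {
            this.repairedAt = repairedAt;
            this.checksumOk = checksumOk;
        }
    }

    // On checksum failure, reset repairedAt so the next incremental repair
    // treats the sstable as unrepaired and re-streams that data
    // (the mutateRepairedAt idea from the comment).
    static void verifyAll(List<SSTable> sstables) {
        for (SSTable s : sstables)
            if (!s.checksumOk)
                s.repairedAt = UNREPAIRED_SSTABLE;
    }

    public static void main(String[] args) {
        SSTable healthy = new SSTable(1436000000000L, true);
        SSTable corrupt = new SSTable(1436000000000L, false);
        verifyAll(Arrays.asList(healthy, corrupt));
        System.out.println(healthy.repairedAt != UNREPAIRED_SSTABLE); // true: untouched
        System.out.println(corrupt.repairedAt == UNREPAIRED_SSTABLE); // true: demoted
    }
}
```

Healthy sstables keep their repaired status, so only the corrupted data is re-repaired - cheaper than a full repair of everything.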
 



was (Author: jjirsa):
Operator perspective, fwiw: I already have repair schedules. I already know what needs to
be repaired and what doesn't. What I didn't have, previously, was a way to validate the files
on disk actually matched what I believed they matched, short of running scrub.

`verify` was very literally `read only scrub` - when I wrote 5791, I followed the scrub code
path very closely, because that was the use case I was worried about when I wrote it (the
concern was bit level corruption due to failing HDD/RAID controller - scrub would do the job,
but it's a heavy hammer hitting a tiny nail). The notion of "verify this node has all the
data" was already covered by repair, so I never even considered having `verify` do that.

Why not just have incremental repair validate checksums for all sstables - the verifier will
{{mutateRepairedAt(sstable.descriptor, ActiveRepairService.UNREPAIRED_SSTABLE)}} on checksum
failure which then allows incremental repair to re-repair that data?
 


> Nodetool verify
> ---------------
>
>                 Key: CASSANDRA-9742
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9742
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>             Fix For: 3.x
>
>
> We introduced incremental repair in 2.1 but it is difficult to make that the default
without unpleasant surprises for incautious users.
> Additionally, while we now store sstable checksums, we leave verification to the user.
> I propose introducing a new command, {{nodetool verify}}, that would address both of
these.
> Default operation would be to do an incremental repair, plus validate checksums on *all*
sstables (not just unrepaired ones).  We could also have --local mode (checksums only) and
--full (classic repair).
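[Editor's sketch] The three proposed modes of {{nodetool verify}} can be sketched as a small dispatch. The flag names come from the ticket text above; everything else (class names, the behavior strings) is an illustrative stand-in, not the eventual implementation:

```java
public class NodetoolVerifyModes {
    enum Mode { DEFAULT, LOCAL, FULL }

    // Map command-line flags from the proposal to a mode.
    // No flag -> the proposed default behavior.
    static Mode parse(String[] args) {
        for (String a : args) {
            if (a.equals("--local")) return Mode.LOCAL;
            if (a.equals("--full"))  return Mode.FULL;
        }
        return Mode.DEFAULT;
    }

    // Describe what each mode would do, per the ticket text.
    static String run(Mode mode) {
        switch (mode) {
            case LOCAL: return "checksums only";
            case FULL:  return "classic (full) repair";
            default:    return "incremental repair + checksums on all sstables";
        }
    }

    public static void main(String[] args) {
        System.out.println(run(parse(args)));
    }
}
```

The default mode combines both goals from the ticket: incremental repair for the "has all the data" check, plus checksum validation over all sstables, repaired or not.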



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
