cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5419) Employ column differencing (as done for read repairs) during node repairs
Date Tue, 02 Apr 2013 19:53:16 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620191#comment-13620191 ]

Brandon Williams commented on CASSANDRA-5419:
---------------------------------------------

I think this is doable, but it will definitely need to be restricted to subranges, since
maintaining a merkle tree for each row would add a significant amount of bloat to the heap.
It should probably also require a flag to enable it, since you really only want to do this
on CFs with wide rows.  That said, the good news is that a 'lean' tree of 2**16 leaves per
row is plenty good enough: a row with 10M columns would still only transfer about 153
columns (10M / 2**16) if one was damaged.
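
As a rough illustration of the arithmetic above, here is a minimal Python sketch; it is not
Cassandra code. It assumes columns are bucketed into 2**16 leaves by a hash of the column
name and that each leaf digest covers name, value, and timestamp. All helper names
(columns_per_leaf, leaf_index, leaf_digests, columns_to_transfer) are hypothetical, and it
compares leaf digests directly rather than walking a full tree.

```python
import hashlib
import math

LEAVES = 2 ** 16  # the 'lean' per-row tree size mentioned in the comment


def columns_per_leaf(total_columns, leaves=LEAVES):
    """Columns covered by one leaf, assuming columns spread evenly across leaves."""
    return math.ceil(total_columns / leaves)


def leaf_index(name, leaves=LEAVES):
    """Assumption: a column is assigned to a leaf by hashing its name."""
    digest = hashlib.sha256(name).digest()
    return int.from_bytes(digest[:4], "big") % leaves


def leaf_digests(row, leaves=LEAVES):
    """row maps column name (bytes) -> (value bytes, timestamp int)."""
    hashers = {}
    for name in sorted(row):
        value, timestamp = row[name]
        hasher = hashers.setdefault(leaf_index(name, leaves), hashlib.sha256())
        # Length-prefix each part so adjacent fields cannot be confused.
        for part in (name, value, str(timestamp).encode()):
            hasher.update(len(part).to_bytes(4, "big"))
            hasher.update(part)
    return {index: hasher.digest() for index, hasher in hashers.items()}


def columns_to_transfer(local, remote, leaves=LEAVES):
    """Local columns that fall in any leaf whose digest differs between replicas."""
    local_digests = leaf_digests(local, leaves)
    remote_digests = leaf_digests(remote, leaves)
    damaged = {
        index
        for index in local_digests.keys() | remote_digests.keys()
        if local_digests.get(index) != remote_digests.get(index)
    }
    return sorted(name for name in local if leaf_index(name, leaves) in damaged)


print(columns_per_leaf(10_000_000))  # 153
```

With one damaged column, only the columns sharing its leaf would be sent, instead of the
entire row.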
                
> Employ column differencing (as done for read repairs) during node repairs 
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5419
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5419
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.10
>         Environment: Production
>            Reporter: Ahmed Bashir
>              Labels: compaction, repair
>
> In particular for wide rows, the headroom required for node repairs can be substantial
> given that entire rows are streamed for any and all row hash discrepancies.
> This headroom must be sustained until compaction slowly compacts these newly streamed
> SSTables and reduces the overall load on each instance.
> The overall footprint of node repairs would be greatly reduced if we employed differencing
> at the column level and sent over row mutations, similar to what is done during read repair.
> This is a great alternative for deployments wherein sending over entire rows rather than
> the deltas is not an option.
> Since node repairs can now specify start and end tokens (i.e. subrange repairs), the
> additional computation can be broken down easily, and it's a welcome trade-off for significantly
> less streaming, compaction, and temporary headroom requirements.
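
To make the column-level differencing idea concrete, here is a minimal Python sketch; it is
not how Cassandra implements read repair or node repair. It assumes each column carries a
write timestamp and that the newer timestamp wins, and it ignores tombstones and
tie-breaking. The function name column_deltas is hypothetical.

```python
def column_deltas(source, target):
    """Return only the columns the target is missing or holds an older version of.

    Both arguments map column name -> (value, timestamp). The result is the
    per-column 'mutation' that would be sent instead of streaming the whole row.
    """
    return {
        name: (value, timestamp)
        for name, (value, timestamp) in source.items()
        if name not in target or target[name][1] < timestamp
    }


replica_a = {"c1": ("x", 1), "c2": ("y", 5), "c3": ("z", 2)}
replica_b = {"c1": ("x", 1), "c2": ("old", 3)}

# Only c2 (newer on A) and c3 (missing on B) need to travel from A to B.
print(column_deltas(replica_a, replica_b))
```

Here two columns are sent rather than the full three-column row; for a wide row the saving
would be proportionally larger.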

