cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2811) Repair doesn't stagger flushes
Date Wed, 22 Jun 2011 15:10:49 GMT


Sylvain Lebresne commented on CASSANDRA-2811:

The remaining question is whether we prefer adding a dedicated single-threaded executor for
validation compaction (could make sense) or simply introducing a validationCompactionLock.
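For illustration, the two options can be sketched roughly as follows. This is a standalone toy, not Cassandra code: the class name, validate(), and the pool size of 4 are assumptions made for the example, and only validationCompactionLock is a name taken from the comment above. Both variants should keep at most one validation running at a time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class ValidationSerializationSketch {
    // Counters used only to observe how many validations overlap.
    static final AtomicInteger running = new AtomicInteger();
    static final AtomicInteger maxConcurrent = new AtomicInteger();

    // Hypothetical stand-in for a validation compaction (flush, then build a merkle tree).
    static void validate() {
        int now = running.incrementAndGet();
        maxConcurrent.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        running.decrementAndGet();
    }

    static void shutdownAndWait(ExecutorService executor) {
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    // Option 1: a dedicated single-threaded executor; requests simply queue up.
    static int runWithDedicatedExecutor(int requests) {
        running.set(0);
        maxConcurrent.set(0);
        ExecutorService validationExecutor = Executors.newSingleThreadExecutor();
        for (int i = 0; i < requests; i++) {
            validationExecutor.execute(() -> validate());
        }
        shutdownAndWait(validationExecutor);
        return maxConcurrent.get();
    }

    // Option 2: keep the shared multi-threaded pool, serialize validations with a lock.
    static int runWithLock(int requests) {
        running.set(0);
        maxConcurrent.set(0);
        ReentrantLock validationCompactionLock = new ReentrantLock();
        ExecutorService compactionExecutor = Executors.newFixedThreadPool(4); // assumed size
        for (int i = 0; i < requests; i++) {
            compactionExecutor.execute(() -> {
                validationCompactionLock.lock();
                try {
                    validate();
                } finally {
                    validationCompactionLock.unlock();
                }
            });
        }
        shutdownAndWait(compactionExecutor);
        return maxConcurrent.get();
    }

    public static void main(String[] args) {
        System.out.println("dedicated executor, max concurrent: " + runWithDedicatedExecutor(8));
        System.out.println("shared pool + lock, max concurrent: " + runWithLock(8));
    }
}
```

One difference worth noting in this toy: with the lock, validations waiting their turn still occupy threads of the shared pool, whereas the dedicated executor queues them without tying up other threads.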

> Repair doesn't stagger flushes
> ------------------------------
>                 Key: CASSANDRA-2811
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.2
> When you do a nodetool repair (with no options), the following things occur:
> * For each keyspace, a call to SS.forceTableRepair is issued
> * In each of those calls: for each token range the node is responsible for, a repair
session is created and started
> * Each of these sessions will request one merkle tree per column family (from each node for
which it makes sense, which includes the node the repair is started on)
> All those merkle tree requests are issued at essentially the same time. Now that compaction
is multi-threaded, this means that usually more than one validation compaction will be started
at the same time. The problem is that a validation compaction starts with a flush. Given that
by default the flush_queue_size is 4 and the number of compaction threads is the number of
processors, and given that on any recent machine the number of cores will be >= 4, this
can easily end up blocking writes for some period of time.
> It turns out there is also a more subtle problem for repair itself. If two validation
compactions for the same column family (but different ranges) are started within a very short time
interval, the first validation will block on the flush, but the second one may not block at
all if the memtable is clean when it requests its own flush. In that case the second validation
will be executed on data older than it should be.
> I think the simplest fix is to make sure we only ever do one validation compaction at
a time. It's probably a better use of resources anyway.
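The more subtle problem in the description above can be illustrated with a small deterministic model. This is a toy under stated assumptions, not Cassandra's actual flush path: ToyColumnFamily, requestFlush(), completeFlush(), and the row names are all hypothetical, and the flush is modeled as two explicit steps so the interleaving is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleValidationSketch {
    // Toy model of one column family; none of these names come from Cassandra.
    static class ToyColumnFamily {
        final List<String> memtable = new ArrayList<>(); // in-memory writes
        final List<String> flushing = new ArrayList<>(); // switched out, not yet on disk
        final List<String> sstables = new ArrayList<>(); // on disk, visible to validation

        void write(String row) {
            memtable.add(row);
        }

        // Returns true if the caller has a flush to wait for, false if the memtable is clean.
        boolean requestFlush() {
            if (memtable.isEmpty()) {
                return false;
            }
            flushing.addAll(memtable);
            memtable.clear();
            return true;
        }

        void completeFlush() {
            sstables.addAll(flushing);
            flushing.clear();
        }

        // A validation only sees what is on disk at the time it runs.
        List<String> validate() {
            return new ArrayList<>(sstables);
        }
    }

    // Returns the view of the second validation followed by the view of the first.
    static List<List<String>> simulate() {
        ToyColumnFamily cf = new ToyColumnFamily();
        cf.write("row1");
        cf.requestFlush();
        cf.completeFlush();   // row1 is on disk before the repair starts
        cf.write("row2");     // row2 is still in the memtable

        boolean firstWaits = cf.requestFlush();  // true: first validation blocks on the flush
        boolean secondWaits = cf.requestFlush(); // false: memtable is already clean

        List<List<String>> views = new ArrayList<>();
        if (firstWaits && !secondWaits) {
            views.add(cf.validate()); // second validation runs immediately, misses row2
        }
        cf.completeFlush();           // the flush the first validation was waiting for
        views.add(cf.validate());     // first validation sees row1 and row2
        return views;
    }

    public static void main(String[] args) {
        List<List<String>> views = simulate();
        System.out.println("second validation saw: " + views.get(0)); // [row1]
        System.out.println("first validation saw:  " + views.get(1)); // [row1, row2]
    }
}
```

In this model, running validations one at a time removes the interleaving: the second validation would not request its flush until the first had finished.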

This message is automatically generated by JIRA.
For more information on JIRA, see:

