Thanks for the reply!
> Not really:
>
> - range scans do not perform read repair
OK, I obviously overlooked that RangeSliceResponseResolver does not repair rows on nodes that
never saw a write for a given key at all. But that's not a big problem for us, since we are
mainly interested in fixing missed deletions, and read repair for conflicting updates seems
to work fine too.
> - if you converted it to range scan + [multi]get, the RR messages are
> fair game to drop to cope with load ("active" repair messages are
> never dropped in 0.6.7+)
We actually have a little hack for that: in the special case of CL_ALL on range slice queries
we perform the read repairs as synchronous mutations (instead of sending read repair messages).
This way the read itself times out when a repair fails. In that case we restart at the given
token and continue from there once the cluster load is lower.
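
A minimal sketch of that restart-at-token loop, in Python. It is illustrative only and not the actual worker: `read_page`, `ReadTimeout`, the integer tokens and the backoff values are all hypothetical names and assumptions, and the real CL_ALL range slice call is hidden behind `read_page`.

```python
import time


class ReadTimeout(Exception):
    """Hypothetical error raised when a CL_ALL range read times out."""


def walk_token_range(read_page, start_token, end_token, page_size=100,
                     backoff_seconds=60.0, max_retries=5, sleep=time.sleep):
    """Walk [start_token, end_token] page by page, resuming after timeouts.

    read_page(token, end_token, page_size) is an assumed callable that returns
    a list of (token, row) tuples in token order, starting at `token`, and
    raises ReadTimeout when the synchronous repair does not complete in time.
    Tokens are assumed to be integers. Returns the number of rows read.
    """
    token = start_token
    rows_seen = 0
    retries = 0
    while True:
        try:
            page = read_page(token, end_token, page_size)
        except ReadTimeout:
            # The repair failed: wait for load to drop, then retry the
            # same token instead of starting the whole scan again.
            retries += 1
            if retries > max_retries:
                raise
            sleep(backoff_seconds)
            continue
        retries = 0
        if not page:
            return rows_seen
        rows_seen += len(page)
        last_token = page[-1][0]
        if len(page) < page_size or last_token >= end_token:
            return rows_seen
        # Resume just past the last token that was read successfully.
        token = last_token + 1
```

With several workers, each one would call `walk_token_range` on its own sub-range; persisting `token` after every page is what allows a restart at a given position after a crash.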
For the time being I guess that's good enough, and we hope that 0.7 works a little more
smoothly when doing repairs.
Cheers,
Daniel
On Mar 7, 2011, at 7:22 PM, Jonathan Ellis wrote:
> On Mon, Mar 7, 2011 at 11:18 AM, Daniel Doubleday
> <daniel.doubleday@gmx.net> wrote:
>> Since we already have a very simple hadoopish framework in place which allows us
>> to do token range walks with multiple workers and restart at a given position in case of failure
>> I created a simple worker that would read everything with CL_ALL. With only one worker and
>> almost no performance impact one scan took 7h.
>>
>> My understanding is that at that point due to read repair I got the same as I would
>> have achieved with repair runs.
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com