incubator-cassandra-dev mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: read repairs for get_columns_since
Date Wed, 22 Apr 2009 15:46:32 GMT
Here is another reason to have a repair flag:

Currently read repair (RR) is all or nothing.  (There is an undocumented
DoConsistencyChecksBoolean that turns it on or off globally; the
default is true.)  Assuming that no RR at all is unacceptable, having
it on all the time is inefficient when there are more reads than
writes, since RR sends a read command to every replica.

If you're doing weak reads (the only kind exposed in Thrift right
now) and writes are relatively rare, doing an RR only every M requests
would let each node serve more requests.  You would then be closer to
one read op per request instead of N, where N is the number of
replicas.
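To make the arithmetic concrete: if a read repair costs N replica reads and a plain weak read costs one, doing an RR once every M requests averages 1 + (N - 1)/M reads per request. The Python below is a minimal sketch of that policy, assuming a simple per-node request counter; `ReadRepairSampler` and `average_reads` are hypothetical names used only for illustration, not Cassandra code or configuration.

```python
from dataclasses import dataclass, field


@dataclass
class ReadRepairSampler:
    """Hypothetical per-node policy: read repair on every M-th weak read.

    Illustrates the "RR every M requests" idea; not Cassandra code.
    """

    replicas: int  # N: replicas holding the key
    every_m: int  # M: do one read repair per M requests
    _count: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        if self.replicas < 1 or self.every_m < 1:
            raise ValueError("replicas and every_m must be >= 1")

    def reads_for_next_request(self) -> int:
        """Return how many replica reads the next request costs."""
        self._count += 1
        if self._count % self.every_m == 0:
            # Read repair: send the read command to every replica.
            return self.replicas
        # Plain weak read: ask a single replica.
        return 1


def average_reads(replicas: int, every_m: int, requests: int) -> float:
    """Average replica reads per request over `requests` requests."""
    sampler = ReadRepairSampler(replicas, every_m)
    total = sum(sampler.reads_for_next_request() for _ in range(requests))
    return total / requests


print(average_reads(3, 10, 1000))  # 1.2
print(average_reads(3, 1, 1000))   # 3.0
```

With N = 3 and M = 10 this gives 1.2 reads per request, versus 3 when RR runs on every read (M = 1, today's always-on behavior). Sampling each request with probability 1/M would give the same average without a counter; the counter just makes the sketch deterministic.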

-Jonathan

On Tue, Apr 21, 2009 at 4:28 PM, Jun Rao <junrao@almaden.ibm.com> wrote:
>
> The difference is that the results of get_slice/get_columns_since are
> affected by every update. For get_column, the result only changes when the
> requested column is updated.
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA  95120-6099
>
> junrao@almaden.ibm.com
>
>
> Jonathan Ellis <jbellis@gmail.com> wrote on 04/20/2009 01:35:34 PM:
>
>>
>> It seems to me that you could come up with workloads that would cause
>> similar behavior for get_slice as well, or even get_column.  No?
>>
>> In my mind it would be reasonable to add a read_repair boolean flag to
>> all the read API calls.  (But I'm not volunteering to implement that,
>> since I don't think we're going to need it in the near future. :)
>>
>> -Jonathan
>>
>> On Mon, Apr 20, 2009 at 1:54 PM, Jun Rao <junrao@almaden.ibm.com> wrote:
>> >
>> > I am wondering if we should really be doing read repairs for
>> > get_columns_since. If there are continuous updates, it's very likely
>> > that two consecutive get_columns_since calls will never return the same
>> > result, even when there is no real data loss. It seems that read repair
>> > for this function could add a lot of unnecessary overhead.
>> >
>> > Jun
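
Jun's point about get_columns_since versus get_column, and the per-call read_repair flag Jonathan suggests, can be illustrated with a toy model. This Python sketch assumes each replica stores a row as {column: (value, timestamp)} and that the coordinator flags a repair whenever replica results differ; `Replica`, `resolve`, `get_columns_since` and its `read_repair` keyword are hypothetical stand-ins, not the actual Thrift API or Cassandra internals.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica holding one row as {column: (value, timestamp)}."""

    columns: dict = field(default_factory=dict)

    def apply(self, name: str, value: str, ts: int) -> None:
        self.columns[name] = (value, ts)

    def columns_since(self, since_ts: int) -> dict:
        # Any write with ts >= since_ts changes this result.
        return {n: vt for n, vt in self.columns.items() if vt[1] >= since_ts}

    def column(self, name: str):
        # Only a write to `name` itself changes this result.
        return self.columns.get(name)


def resolve(results: list) -> dict:
    """Merge replica results, keeping the newest timestamp per column."""
    merged = {}
    for res in results:
        for name, (value, ts) in res.items():
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return merged


def get_columns_since(replicas: list, since_ts: int, read_repair: bool = True):
    """Return (result, mismatch). `read_repair` is the hypothetical per-call flag."""
    if not read_repair:
        # Weak read: ask one replica and skip the comparison entirely.
        return replicas[0].columns_since(since_ts), False
    results = [r.columns_since(since_ts) for r in replicas]
    mismatch = any(res != results[0] for res in results[1:])
    return resolve(results), mismatch


# Both replicas have column "a"; a write of "b" has reached only r1 so far.
r1, r2 = Replica(), Replica()
for r in (r1, r2):
    r.apply("a", "x", 1)
r1.apply("b", "y", 5)

_, mismatch = get_columns_since([r1, r2], since_ts=0)
print(mismatch)                          # True: the slice sees the in-flight write
print(r1.column("a") == r2.column("a"))  # True: get_column("a") is unaffected
```

Once the write to "b" reaches r2 the mismatch goes away on its own, which is the case where a repair may be wasted work; with `read_repair=False` the coordinator skips both the extra replica reads and the comparison.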
