incubator-cassandra-user mailing list archives

From Riyad Kalla <rka...@gmail.com>
Subject Re: Counters and replication factor
Date Tue, 08 Nov 2011 16:24:12 GMT
Most welcome, hopefully the bug is easy to find and kill :)

On Tue, Nov 8, 2011 at 3:28 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Sylvain, here is my ticket, but I guess you already know it since you are
> the assignee :) -->https://issues.apache.org/jira/browse/CASSANDRA-3465
> Riyad, Thanks for your help.
>
> Alain
>
> 2011/11/7 Riyad Kalla <rkalla@gmail.com>
>
>> Alain thank you for all the clarification, I understand exactly what you
>> meant now... and as a result am just as confused as you are :)
>>
>> What version of Cassandra are you using? Can you share the important
>> parts of your config? (you double checked that your replication factor is
>> set on all 3 to "3"?)
>>
>> Also out of curiosity, if you keep querying for up to 5 mins (say every
>> 10 seconds) do counter1, 2 and 3 still show the same wrong values for
>> getValue or do the values eventually converge on the correct amounts?
>>
>> (I assume 5mins is a long enough window to test, maybe I'm wrong and
>> another Cassandra dev can correct me here).
>>
>> -R
>>
>>
>> On Mon, Nov 7, 2011 at 9:57 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>>
>>> I retried it after restarting all the servers.
>>>
>>> I still have wrong results (I simulated an event 5 times and it was
>>> counted 3 times by some counters, 4 or 5 times by others).
>>>
>>> What I meant by "but now every request returns me always the same count
>>> value..." will be easier to explain with an example :
>>>
>>> event 1:
>>>
>>> counter1.increment
>>> counter2.increment
>>> counter3.increment
>>>
>>> .
>>> .
>>> .
>>>
>>> event 5:
>>>
>>> counter1.increment
>>> counter2.increment
>>> counter3.increment
>>>
>>> Show results :
>>>
>>> counter1.getValue = returns 4
>>> counter2.getValue = returns 3
>>> counter3.getValue = returns 5
>>>
>>> counter1.getValue = returns 5
>>> counter2.getValue = returns 3
>>> counter3.getValue = returns 5
>>>
>>> counter1.getValue = returns 4
>>> counter2.getValue = returns 4
>>> counter3.getValue = returns 5
>>>
>>> ...
>>>
>>> So I've got wrong values, and not always the same ones. In my previous
>>> email I tried to tell you by saying "but now every request returns me
>>> always the same count value..." that I had all the time the same wrong
>>> values, let us say :
>>>
>>> counter1.getValue = returns 4
>>> counter2.getValue = returns 3
>>> counter3.getValue = returns 5
>>>
>>> counter1.getValue = returns 4
>>> counter2.getValue = returns 3
>>> counter3.getValue = returns 5
>>>
>>> counter1.getValue = returns 4
>>> counter2.getValue = returns 3
>>> counter3.getValue = returns 5
>>>
>>> But that is not true, I still have some "random" wrong values; maybe
>>> I didn't query the counter values often enough to see it last time.
>>>
>>> Sorry for not being clearer; this is not easy to explain, nor for me
>>> to understand.
>>>
>>> Thanks for help.
>>>
>>> Alain
>>>
>>>
>>> 2011/11/7 Riyad Kalla <rkalla@gmail.com>
>>>
>>>> Alain,
>>>>
>>>> When you tried CL.All was that only after you had made the change of
>>>> ReplicationFactor=3 and restarted all the servers?
>>>>
>>>> If you hadn't restarted the servers with the new RF, I am not sure that
>>>> CL.All would have the intended effect.
>>>>
>>>> Also, I wasn't sure what you meant by "but now every request returns
>>>> me always the same count value..." -- didn't you want the requests to
>>>> always return you the same values?
>>>>
>>>> Or maybe you are saying that it always returns the same *wrong* value?
>>>> Like you do:
>>>>
>>>> counter.increment (v=1)
>>>> counter.increment (v=2)
>>>> counter.increment (v=3)
>>>>
>>>> counter.getValue = returns 7
>>>> counter.getValue = returns 7
>>>> counter.getValue = returns 7
>>>>
>>>> or something inconsistent like that?
>>>>
>>>> On Mon, Nov 7, 2011 at 9:09 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>>>>
>>>>> I've tried with CL.All, but it doesn't work better. I still have
>>>>> strange values (between 4 and 10 events counted instead of 10) but now
>>>>> every request returns me always the same count value...
>>>>>
>>>>> It's very strange.
>>>>>
>>>>> Any other idea ?
>>>>>
>>>>> Alain
>>>>>
>>>>>
>>>>> 2011/11/7 Riyad Kalla <rkalla@gmail.com>
>>>>>
>>>>>> Alain,
>>>>>>
>>>>>> Try using a CL of 3 or "ALL" and see if the problem goes away.
>>>>>>
>>>>>> Your replication factor (as I just learned) dictates how many nodes
>>>>>> each piece of data is replicated to; by using a RF of 3 you are saying
>>>>>> "replicate all my data to all my nodes" (in this case counters).
>>>>>>
>>>>>> This doesn't happen immediately, but you can *force* it to happen on
>>>>>> write by specifying a CL of "ALL". If you specify "1" then your counter
>>>>>> value is written to one member of the ring, then your command returns.
>>>>>>
>>>>>> If you keep querying you will bounce around your ring, reading the
>>>>>> values from the different nodes until a future date at *which point* all
>>>>>> the values will likely agree.
>>>>>>
>>>>>> If you keep all your code you have now exactly the same, just change
>>>>>> the code at the end where you read the counter value back, to keep reading
>>>>>> the counter value back every second for 60 seconds and see if all the
>>>>>> values eventually match up -- they should (as the counter value is
>>>>>> replicated to all the nodes and their old values discarded).
>>>>>>
>>>>>> -R
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to switch from a RF = 1 to a RF = 3, but I get wrong values
>>>>>>> from counters when doing so...
>>>>>>>
>>>>>>> I have a CF that contains many counters of some events. When I'm at
>>>>>>> RF = 1 and simulate 10 events, they are counted correctly.
>>>>>>> However, when I switch to RF = 3, my counters show a wrong value
>>>>>>> that sometimes changes when requested twice (it can return 7, then 5,
>>>>>>> instead of 10 all the time).
>>>>>>>
>>>>>>> I first thought that it was a problem of CL because I seem to
>>>>>>> remember that I read once that I had to use CL.One for reads and writes
>>>>>>> with counters. So I tried with CL.One, without success...
>>>>>>>
>>>>>>> What am I doing wrong? Is there some precaution to take when
>>>>>>> replicating counters?
>>>>>>>
>>>>>>> Alain
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
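
The convergence check suggested in the thread (re-read each counter at a
fixed interval and see whether the values settle on the expected count)
could be sketched roughly as below. This is only an illustration, not code
from anyone in the thread: `poll_counters`, `converged`, and the
`read_counter` callable are hypothetical names, and `read_counter` stands in
for whatever client call actually reads a counter from Cassandra.

```python
import time
from typing import Callable, Dict, List


def poll_counters(
    read_counter: Callable[[str], int],  # hypothetical: reads one counter by name
    names: List[str],
    expected: int,
    attempts: int = 30,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, int]]:
    """Read every counter up to `attempts` times, stopping early once all
    of them report `expected`. Returns one snapshot dict per round."""
    history: List[Dict[str, int]] = []
    for attempt in range(attempts):
        snapshot = {name: read_counter(name) for name in names}
        history.append(snapshot)
        if all(value == expected for value in snapshot.values()):
            break  # every counter agrees with the expected count
        if attempt < attempts - 1:
            sleep(interval_s)  # wait before the next round of reads
    return history


def converged(history: List[Dict[str, int]], expected: int) -> bool:
    """True if the most recent snapshot shows `expected` for every counter."""
    return bool(history) and all(v == expected for v in history[-1].values())
```

With `attempts=30` and `interval_s=10.0` this covers roughly the 5 minute
window mentioned above. Keeping the full history makes it possible to see
whether values drift between reads or stay stuck on the same wrong number.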
