From: Alain RODRIGUEZ
Date: Tue, 8 Nov 2011 11:28:44 +0100
Subject: Re: Counters and replication factor
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=0015174c45ee09c13004b136a4a8

--0015174c45ee09c13004b136a4a8
Content-Type: text/plain; charset=ISO-8859-1

Sylvain, here is my ticket, but I guess you already know it since you are
the assignee :) --> https://issues.apache.org/jira/browse/CASSANDRA-3465

Riyad, thanks for your help.

Alain

2011/11/7 Riyad Kalla

> Alain, thank you for all the clarification. I understand exactly what you
> meant now... and as a result am just as confused as you are :)
>
> What version of Cassandra are you using? Can you share the important parts
> of your config? (You double-checked that your replication factor is set on
> all 3 to "3"?)
>
> Also, out of curiosity, if you keep querying for up to 5 mins (say every 10
> seconds), do counter1, 2 and 3 still show the same wrong values for
> getValue, or do the values eventually converge on the correct amounts?
>
> (I assume 5 mins is a long enough window to test; maybe I'm wrong and
> another Cassandra dev can correct me here.)
>
> -R
>
>
> On Mon, Nov 7, 2011 at 9:57 AM, Alain RODRIGUEZ wrote:
>
>> I retried it after restarting all the servers.
>>
>> I still have wrong results (I simulated an event 5 times and it was
>> counted 3 times by some counters, 4 or 5 times by others).
>>
>> What I meant by "but now every request returns me always the same count
>> value..." will be easier to explain with an example:
>>
>> event 1:
>>
>> counter1.increment
>> counter2.increment
>> counter3.increment
>>
>> .
>> .
>> .
>>
>> event 5:
>>
>> counter1.increment
>> counter2.increment
>> counter3.increment
>>
>> Show results:
>>
>> counter1.getValue = returns 4
>> counter2.getValue = returns 3
>> counter3.getValue = returns 5
>>
>> counter1.getValue = returns 5
>> counter2.getValue = returns 3
>> counter3.getValue = returns 5
>>
>> counter1.getValue = returns 4
>> counter2.getValue = returns 4
>> counter3.getValue = returns 5
>>
>> ...
>>
>> So I've got wrong values, and not always the same ones. In my previous
>> email, by saying "but now every request returns me always the same count
>> value...", I tried to tell you that I got the same wrong values every
>> time, let us say:
>>
>> counter1.getValue = returns 4
>> counter2.getValue = returns 3
>> counter3.getValue = returns 5
>>
>> counter1.getValue = returns 4
>> counter2.getValue = returns 3
>> counter3.getValue = returns 5
>>
>> counter1.getValue = returns 4
>> counter2.getValue = returns 3
>> counter3.getValue = returns 5
>>
>> But that is not true: I still get some "random" wrong values. Maybe I
>> didn't query the counter values often enough to see it last time.
>>
>> Sorry for not being clearer; this is not easy to explain, nor for me to
>> understand.
>>
>> Thanks for the help.
>>
>> Alain
>>
>>
>> 2011/11/7 Riyad Kalla
>>
>>> Alain,
>>>
>>> When you tried CL.All, was that only after you had made the change to
>>> ReplicationFactor=3 and restarted all the servers?
>>>
>>> If you hadn't restarted the servers with the new RF, I am not sure that
>>> CL.All would have the intended effect.
>>>
>>> Also, I wasn't sure what you meant by "but now every request returns me
>>> always the same count value..." -- didn't you want the requests to always
>>> return you the same values?
>>>
>>> Or maybe you are saying that it always returns the same *wrong* value?
>>> Like you do:
>>>
>>> counter.increment (v=1)
>>> counter.increment (v=2)
>>> counter.increment (v=3)
>>>
>>> counter.getValue = returns 7
>>> counter.getValue = returns 7
>>> counter.getValue = returns 7
>>>
>>> or something inconsistent like that?
>>>
>>> On Mon, Nov 7, 2011 at 9:09 AM, Alain RODRIGUEZ wrote:
>>>
>>>> I've tried with CL.All, but it doesn't work better. I still have
>>>> strange values (between 4 and 10 events counted instead of 10) but now
>>>> every request returns me always the same count value...
>>>>
>>>> It's very strange.
>>>>
>>>> Any other idea?
>>>>
>>>> Alain
>>>>
>>>>
>>>> 2011/11/7 Riyad Kalla
>>>>
>>>>> Alain,
>>>>>
>>>>> Try using a CL of 3 or "ALL" and see if the problem goes away.
>>>>>
>>>>> Your replication factor (as I just learned) dictates how many nodes
>>>>> each piece of data is replicated to; by using a RF of 3 you are saying
>>>>> "replicate all my data to all my nodes" (in this case counters).
>>>>>
>>>>> This doesn't happen immediately, but you can *force* it to happen on
>>>>> write by specifying a CL of "ALL". If you specify "1" then your counter
>>>>> value is written to one member of the ring, then your command returns.
>>>>>
>>>>> If you keep querying you will bounce around your ring, reading the
>>>>> values from the different nodes until a future date at *which point*
>>>>> all the values will likely agree.
>>>>>
>>>>> Keep all the code you have now exactly the same; just change the code
>>>>> at the end, where you read the counter value back, to keep reading the
>>>>> counter value back every second for 60 seconds and see if all the
>>>>> values eventually match up -- they should (as the counter value is
>>>>> replicated to all the nodes and their old values discarded).
>>>>>
>>>>> -R
>>>>>
>>>>>
>>>>> On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to switch from RF = 1 to RF = 3, but I get wrong values
>>>>>> from counters when doing so...
>>>>>>
>>>>>> I have a CF that contains many counters of some events. When I'm at
>>>>>> RF = 1 and simulate 10 events, they are counted correctly.
>>>>>> However, when I switch to RF = 3, my counter shows a wrong value
>>>>>> that sometimes changes when requested twice (it can return 7, then 5,
>>>>>> instead of 10 every time).
>>>>>>
>>>>>> I first thought that it was a CL problem because I seem to
>>>>>> remember reading once that I had to use CL.One for reads and writes
>>>>>> with counters. So I tried with CL.One, without success...
>>>>>>
>>>>>> What am I doing wrong? Is there some precaution to take when
>>>>>> replicating counters?
>>>>>>
>>>>>> Alain
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

--0015174c45ee09c13004b136a4a8--
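
The check Riyad suggests in the thread (read the counter back repeatedly and see whether the values converge) can be sketched roughly as below. This is only an illustration under stated assumptions: `read_value` is a hypothetical callable standing in for whatever client call returns the counter, and `poll_counter` and `settled_value` are made-up helper names. No Cassandra client API is used or implied.

```python
import time
from typing import Callable, List, Optional


def poll_counter(read_value: Callable[[], int],
                 attempts: int = 60,
                 interval_s: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Read a counter `attempts` times, pausing `interval_s` between reads.

    `read_value` is a hypothetical stand-in for the real client read.
    """
    readings: List[int] = []
    for i in range(attempts):
        readings.append(read_value())
        if i < attempts - 1:
            sleep(interval_s)  # no pause needed after the final read
    return readings


def settled_value(readings: List[int], window: int = 5) -> Optional[int]:
    """Return the value shared by the last `window` readings, else None."""
    tail = readings[-window:]
    if len(tail) == window and len(set(tail)) == 1:
        return tail[0]
    return None


if __name__ == "__main__":
    # Fake reads imitating values that disagree at first and then agree.
    fake_reads = iter([4, 3, 5, 5, 5, 5, 5, 5])
    seen = poll_counter(lambda: next(fake_reads), attempts=8,
                        sleep=lambda _seconds: None)
    print(seen, settled_value(seen))
```

For the 5-minute window mentioned above, one would call `poll_counter(read_value, attempts=30, interval_s=10.0)`. A `None` result from `settled_value` means the last readings still disagreed, which matches the "random" values described in the thread; a settled value that differs from the expected count would instead point at lost or duplicated increments rather than slow replication.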