From: Alain RODRIGUEZ
Date: Thu, 1 Nov 2012 12:53:07 +0100
Subject: Re: Multiple counters value after restart
To: user@cassandra.apache.org

"Can you try it though, or run a repair?"

Repairing didn't help.

"My first thought is to use QUORUM"

This fixed the problem. However, my data is probably still inconsistent,
even if I now always read the same value. The point is that I can't
handle a crash with CL.QUORUM; I can't even restart a node...

I will add a third server.

"But isn't Cassandra supposed to handle a server crash? When a server
crashes I guess it doesn't drain first..."

"I was asking to understand how you did the upgrade."

OK. On my side I am just concerned about whether counters can be used
with CL.ONE and still correctly handle a crash or a restart without a
drain.

Alain

2012/11/1 aaron morton

> "What CL are you using?"
>
> I think this may be what causes the issue. I'm writing and reading at
> CL ONE. I didn't drain before stopping Cassandra, and this may have
> caused a failure in the current counters (those which were being
> written when I stopped a server).
>
> My first thought is to use QUORUM. But with only two nodes it's hard to
> get strong consistency using QUORUM.
> Can you try it though, or run a repair?
>
> But isn't Cassandra supposed to handle a server crash? When a server
> crashes I guess it doesn't drain first...
>
> I was asking to understand how you did the upgrade.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1/11/2012, at 11:39 AM, Alain RODRIGUEZ wrote:
>
> "What version of cassandra are you using?"
>
> 1.1.2
>
> "Can you explain this further?"
>
> I had an unexplained amount of reads (up to 1800 r/s and 90 MB/s) on
> one server; the other was doing about 200 r/s and 5 MB/s max. I fixed
> it by rebooting the server. This server is dedicated to Cassandra. I
> can't tell you more about it because I don't understand it... But a
> simple Cassandra restart wasn't enough.
>
> "Was something writing to the cluster?"
>
> Yes, we have some activity and perform about 600 w/s.
>
> "Did you drain for the upgrade?"
>
> We upgraded a long time ago, to 1.1.2. This warning is about version
> 1.1.6.
>
> "What changes did you make?"
>
> In cassandra.yaml I just changed the "compaction_throughput_mb_per_sec"
> property to slow down my compaction a bit. I don't think the problem
> comes from there.
>
> "Are you saying that a particular counter column is giving different
> values for different reads?"
>
> Yes, this is exactly what I was saying. Sorry if something is wrong
> with my English, it's not my mother tongue.
>
> "What CL are you using?"
>
> I think this may be what causes the issue. I'm writing and reading at
> CL ONE. I didn't drain before stopping Cassandra, and this may have
> caused a failure in the current counters (those which were being
> written when I stopped a server).
>
> But isn't Cassandra supposed to handle a server crash? When a server
> crashes I guess it doesn't drain first...
>
> Thank you for your time Aaron, once again.
>
> Alain
>
>
> 2012/10/31 aaron morton
>
>> What version of cassandra are you using?
>>
>>> I finally restarted Cassandra. It didn't solve the problem, so I
>>> stopped Cassandra again on that node and restarted my EC2 server.
>>> This solved the issue (1800 r/s to 100 r/s).
>>
>> Can you explain this further?
>> Was something writing to the cluster?
>> Did you drain for the upgrade?
>> https://github.com/apache/cassandra/blob/cassandra-1.1/NEWS.txt#L17
>>
>>> Today I changed my cassandra.yaml and restarted this same server to
>>> apply my conf.
>>
>> What changes did you make?
>>
>>> I just noticed that my homepage (which uses a Cassandra counter and
>>> refreshes every second) shows me 4 different values: 2 of them
>>> repeatedly (5000 and 4000) and the other 2 only rarely (5500 and
>>> 3800).
>>
>> Are you saying that a particular counter column is giving different
>> values for different reads?
>> What CL are you using?
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 31/10/2012, at 3:39 AM, Jason Wee wrote:
>>
>> Maybe enable debug in log4j-server.properties and go through the
>> log to see what actually happens?
>>
>> On Tue, Oct 30, 2012 at 7:31 PM, Alain RODRIGUEZ wrote:
>>
>>> Hi,
>>>
>>> I have an issue with counters. Yesterday I had a lot of unexplained
>>> reads/sec on one server. I finally restarted Cassandra. It didn't
>>> solve the problem, so I stopped Cassandra again on that node and
>>> restarted my EC2 server. This solved the issue (1800 r/s to 100 r/s).
>>>
>>> Today I changed my cassandra.yaml and restarted this same server to
>>> apply my conf.
>>>
>>> I just noticed that my homepage (which uses a Cassandra counter and
>>> refreshes every second) shows me 4 different values: 2 of them
>>> repeatedly (5000 and 4000) and the other 2 only rarely (5500 and
>>> 3800).
>>>
>>> Only the counters made today and yesterday are affected.
>>>
>>> I performed a repair without success. This data is the heart of our
>>> business, so if anyone has any clue about it, I would be really
>>> grateful...
>>>
>>> The sooner the better, I am in production with these random counters.
>>>
>>> Alain
>>>
>>> INFO:
>>>
>>> My environment is 2 nodes (EC2 large), RF 2, CL.ONE (R & W), Random
>>> Partitioner.
>>>
>>> xxx.xxx.xxx.241  eu-west  1b  Up  Normal  151.95 GB  50.00%  0
>>> xxx.xxx.xxx.109  eu-west  1b  Up  Normal  117.71 GB  50.00%  85070591730234615865843651857942052864
>>>
>>> Here is my conf: http://pastebin.com/5cMuBKDt

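On the QUORUM point raised in this thread (two nodes, RF 2, and not being able to restart a node at CL.QUORUM), the arithmetic can be sketched as follows. This is only an illustration, assuming the usual definition QUORUM = floor(RF / 2) + 1 and the R + W > RF overlap rule for reads seeing the latest write. The function names are hypothetical, nothing here talks to a cluster, and it does not model anything specific to counters.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must answer at CL.QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be down while QUORUM requests can still succeed."""
    return replication_factor - quorum(replication_factor)


def reads_overlap_writes(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


if __name__ == "__main__":
    # RF 2 is the setup described in the thread; RF 3 is the planned third server,
    # assuming the replication factor is raised along with the node count.
    for rf in (2, 3):
        q = quorum(rf)
        print(
            f"RF={rf}: QUORUM={q}, "
            f"nodes that may be down={tolerated_failures(rf)}, "
            f"QUORUM reads+writes overlap={reads_overlap_writes(rf, q, q)}, "
            f"ONE reads+writes overlap={reads_overlap_writes(rf, 1, 1)}"
        )
```

Under these assumptions, RF 2 gives a quorum of 2, so QUORUM leaves no replica free to be down, while CL.ONE for both reads and writes gives no overlap guarantee. With RF 3 the quorum is still 2, which leaves room for one replica to be restarted.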