From: Alain RODRIGUEZ
Date: Fri, 13 Apr 2012 14:24:40 +0200
Subject: Re: Trouble with wrong data
To: user@cassandra.apache.org
In-Reply-To: <71E6575E-2555-48BA-B0D8-8AEE5E055ECB@thelastpickle.com>

commitlog_total_space_in_mb was not set. I have now set it to avoid
hitting the same problem in the future.

I am aware of the over-counting problem that counters can introduce. The
point is that I use them to build per-hour statistics. I can understand
having some wrong counts in the column corresponding to the crash time,
but how can it be that all my counts since the start (months ago) became
wrong after the crash?

After the crash I tried to repair my entire keyspace from one of the 2
nodes, and this made my server crash again. I have no idea why. Could this
failed repair be the origin of the corrupted data?

I'm still replaying all my counts for the past months, and I'm afraid this
kind of bug could happen again...

I had been using Cassandra for months without any issue.

Alain

2012/4/11 aaron morton <aaron@thelastpickle.com>

>> However after recovering from this issue (freeing some space and fixing
>> the value of "commitlog_total_space_in_mb" in cassandra.yaml)
>
> Did the commit log grow larger than commitlog_total_space_in_mb?
>
>> I realized that all my statistics were destroyed. I have bad values on
>> every single counter since I started using them (September)!
>
> Counter operations are not idempotent. If your client retries a counter
> operation, it may result in the increment being applied twice. Could this
> have been your issue?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/04/2012, at 2:35 AM, Alain RODRIGUEZ wrote:
>
> By the way, I am using Cassandra 1.0.7, CL = ONE (R/W), RF = 2, on a
> cluster of 2 EC2 c1.medium nodes.
>
> Alain
>
> 2012/4/10 Alain RODRIGUEZ
>
>> Hi, I'm experiencing a strange and very annoying phenomenon.
>>
>> I had a problem with the commit log, which grew too large and filled one
>> of the hard disks on all my nodes at almost the same time (2 nodes only,
>> RF=2, so the 2 nodes behave in exactly the same way).
>>
>> My data is on another partition, which was not full. However, after
>> recovering from this issue (freeing some space and fixing the value of
>> "commitlog_total_space_in_mb" in cassandra.yaml) I realized that all my
>> statistics were destroyed. I have bad values on every single counter
>> since I started using them (September)!
>>
>> Has anyone experienced something similar, or does anyone have a clue
>> about this?
>>
>> Do you need more information?
>>
>> Alain
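
Editorial illustration of the "counter operations are not idempotent" point raised in the quoted reply: a minimal sketch in plain Python, not Cassandra code. The class `ToyCounter` and the helpers `increment_with_retry` and `run` are hypothetical names invented for this example. It assumes a client that blindly retries an increment whenever the acknowledgement is lost, even though the server had already applied the write.

```python
class ToyCounter:
    """Toy stand-in for a server-side counter (hypothetical, not Cassandra's API)."""

    def __init__(self):
        self.value = 0

    def increment(self, delta, drop_ack=False):
        # The write is applied first...
        self.value += delta
        # ...and only then is the acknowledgement lost, so the client
        # cannot tell whether the increment happened.
        if drop_ack:
            raise TimeoutError("ack lost after the write was applied")


def increment_with_retry(counter, delta, lose_first_ack):
    """Send an increment and blindly retry once if it times out."""
    try:
        counter.increment(delta, drop_ack=lose_first_ack)
    except TimeoutError:
        # The retry re-applies a delta that was already counted.
        counter.increment(delta)


def run(deltas, lost_ack_indexes):
    """Apply every delta, losing the ack for the given positions."""
    counter = ToyCounter()
    for i, delta in enumerate(deltas):
        increment_with_retry(counter, delta, i in lost_ack_indexes)
    return counter.value


exact = run([1] * 10, set())    # no lost acks: 10
inflated = run([1] * 10, {3, 7})  # two lost acks, two retries: 12
print(exact, inflated)
```

In this sketch only the increments that were retried are counted twice; increments sent without a lost acknowledgement stay exact.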