From: Yan Chunlu <springrider@gmail.com>
Date: Thu, 21 Jul 2011 11:14:03 +0800
Subject: Re: with proof Re: cassandra goes infinite loop and data lost.....
To: user@cassandra.apache.org

Sorry for the misunderstanding. I saw many "N of 2147483647" lines where N=0 and thought it was not doing anything.

My nodes were very unbalanced, and I intended to rebalance them with "nodetool move" after a "nodetool repair". Could that be what makes the slices so large?

Address      Status  State   Load      Owns    Token
                                               84944475733633104818662955375549269696
10.28.53.2   Down    Normal  71.41 GB  81.09%  52773518586096316348543097376923124102
10.28.53.3   Up      Normal  14.72 GB  10.48%  70597222385644499881390884416714081360
10.28.53.4   Up      Normal  13.5 GB   8.43%   84944475733633104818662955375549269696
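
The "Owns" column in the listing above can be reproduced from the tokens alone. A minimal sketch, assuming RandomPartitioner (token space 0 to 2**127) and that each node owns the range from the previous node's token up to its own, wrapping around the ring. The function name `ownership` is made up for illustration.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumption: default partitioner)

# Tokens copied from the ring listing above.
tokens = {
    "10.28.53.2": 52773518586096316348543097376923124102,
    "10.28.53.3": 70597222385644499881390884416714081360,
    "10.28.53.4": 84944475733633104818662955375549269696,
}

def ownership(tokens):
    """Return {address: fraction of the ring owned}, i.e. (previous token, own token]."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    shares = {}
    for i, (address, token) in enumerate(ordered):
        previous = ordered[i - 1][1]  # for i == 0 this wraps to the last node
        shares[address] = Fraction((token - previous) % RING_SIZE, RING_SIZE)
    return shares

for address, share in ownership(tokens).items():
    print(f"{address}  {float(share) * 100:.2f}%")
```

The first node's token sits far from the last node's token once the ring wraps around, which is why it carries most of the load.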


Should I run "nodetool move" according to http://wiki.apache.org/cassandra/Operations#Load_balancing before doing the repair?
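
For reference, the Load_balancing section of that wiki page spaces tokens evenly for RandomPartitioner, as i * (2**127 / N) for i = 0 .. N-1. A small sketch of that calculation for a 3-node ring follows. The function name `balanced_tokens` is illustrative, and the sketch does not decide which node should move to which token or whether to move before or after repair.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumption: default partitioner)

def balanced_tokens(node_count):
    """Evenly spaced tokens: i * RING_SIZE / node_count for i in 0..node_count-1."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

for i, token in enumerate(balanced_tokens(3)):
    print(f"node {i}: {token}")
```

Each printed value is a candidate argument for "nodetool move" on one node.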

Thank you for your help!

On Thu, Jul 21, 2011 at 10:47 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
This is not an infinite loop, you can see the column objects being
iterated over are different.

Like I said last time, "I do see that it's saying "N of 2147483647"
which looks like you're doing slices with a much larger limit than is
advisable."
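
To make the point above concrete, here is a short sketch that parses a few of the DEBUG lines quoted in this thread. It checks that the limit is the largest 32-bit signed integer (2**31 - 1) and that each line refers to a different column. It assumes the text after "collecting N of LIMIT:" has the form name:deleted-flag:value-length@timestamp; that layout is an assumption about the log format, not something stated in the thread, and the regular expression and variable names are illustrative.

```python
import re

# Sample lines copied from the DEBUG output quoted in this thread
# (each log entry joined onto one line).
LOG_LINES = [
    "DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123) "
    "collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243",
    "DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123) "
    "collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857",
    "DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123) "
    "collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545",
]

# Assumed layout: collecting <n> of <limit>: <name>:<deleted>:<value length>@<timestamp>
PATTERN = re.compile(
    r"collecting (?P<n>\d+) of (?P<limit>\d+): "
    r"(?P<name>[^:]+):(?P<deleted>true|false):(?P<length>\d+)@(?P<timestamp>\d+)"
)

entries = [PATTERN.search(line).groupdict() for line in LOG_LINES]

# The slice limit is the largest 32-bit signed integer, i.e. effectively "no limit".
assert all(int(e["limit"]) == 2 ** 31 - 1 for e in entries)

# The column name is the same hex string in every line; it decodes to ASCII text.
print(bytes.fromhex(entries[0]["name"]).decode("ascii"))

# The (length, timestamp) pairs differ, so these are different column objects.
distinct = {(e["length"], e["timestamp"]) for e in entries}
print(len(distinct), "distinct columns in", len(entries), "lines")
```

If the same length and timestamp pair repeated indefinitely, that would point to a loop; here every line carries a different pair.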

On Wed, Jul 20, 2011 at 9:00 PM, Yan Chunlu <springrider@gmail.com> wrote:
> This time it is another node. The node went down during repair and came
> back, but never came up. I changed the log level to "DEBUG" and found
> that it prints the following message endlessly:
> DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
> DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
> DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
> DEBUG [main] 2011-07-20 20:58:16,611 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:85@1311154048866767
> DEBUG [main] 2011-07-20 20:58:16,754 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:366@1311207176880564
> DEBUG [main] 2011-07-20 20:58:16,770 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:80@1310443605930900
> DEBUG [main] 2011-07-20 20:58:16,816 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:486@1311173929610402
> DEBUG [main] 2011-07-20 20:58:16,870 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:101@1310818289021118
> DEBUG [main] 2011-07-20 20:58:17,041 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:677@1311202595772170
> DEBUG [main] 2011-07-20 20:58:17,047 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:374@1311147641237918
>
>
>
> On Thu, Jul 14, 2011 at 1:36 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> That says "I'm collecting data to answer requests."
>>
>> I don't see anything here that indicates an infinite loop.
>>
>> I do see that it's saying "N of 2147483647" which looks like you're
>> doing slices with a much larger limit than is advisable (good way to
>> OOM the way you already did).
>>
>> On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu <springrider@gmail.com> wrote:
>> > I gave Cassandra an 8 GB heap and somehow it ran out of memory and
>> > crashed.
>> > After I restarted it, it ran into the following infinite loop; the
>> > last line:
>> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
>> > goes on forever.
>> > I have 3 nodes and RF=2, so I am losing data. Does that mean I am screwed
>> > and can't get it back?
>> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
>> > collecting 20 of 2147483647: q74k:false:14@1308886095008943
>> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 10fbu:false:1@1310223075340297
>> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: apbg:false:13@1305641597957086
>> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>> > collecting 1 of 2147483647: auje:false:13@1305641597957075
>> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>> > collecting 2 of 2147483647: ayj8:false:13@1305641597957060
>> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>> > collecting 3 of 2147483647: b4fz:false:13@1305641597957096
>> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
>> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>> > collecting 1 of 2147483647: 1017f:false:14@1310168680375612
>> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>> > collecting 2 of 2147483647: 1018e:false:14@1310168759614715
>> > DEBUG [main] 2011-07-13 22:19:00,587 SliceQueryFilter.java (line 123)
>> > collecting 3 of 2147483647: 101dd:false:14@1310169260225339
>> >
>> > On Thu, Jul 14, 2011 at 11:27 AM, Yan Chunlu <springrider@gmail.com>
>> > wrote:
>> >>
>> >> DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>> >> collecting 0 of 2147483647: 100zs:false:14@1310168625866434
>> >
>> >
>> > --
>> > 闫春路
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>
>
> --
> 闫春路
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com



--
闫春路