incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: with proof Re: cassandra goes infinite loop and data lost.....
Date Thu, 21 Jul 2011 14:41:09 GMT
You should be able to tell from earlier in the log whether this comes from a
request, from hinted handoff replay, or from something else.
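
A minimal sketch for checking the DEBUG output quoted below, assuming each
SliceQueryFilter entry has the form
"collecting N of LIMIT: name:deleted:size@timestamp" (the field meanings are an
assumption, and parse_collecting is a hypothetical helper, not part of
Cassandra). It counts the distinct columns seen and checks whether the slice
limit is 2147483647, which is Java's Integer.MAX_VALUE:

```python
import re

# Matches the "collecting N of LIMIT: name:deleted:size@timestamp" part of a
# SliceQueryFilter DEBUG entry. The field meanings are an assumption.
COLLECTING = re.compile(
    r"collecting (?P<n>\d+) of (?P<limit>\d+): "
    r"(?P<name>[^:]+):(?P<deleted>true|false):(?P<size>\d+)@(?P<ts>\d+)"
)

INT_MAX = 2**31 - 1  # 2147483647, Java's Integer.MAX_VALUE


def parse_collecting(text):
    """Return one dict per 'collecting' entry found in text."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(text.split())
    return [
        {
            "n": int(m.group("n")),
            "limit": int(m.group("limit")),
            "name": m.group("name"),
            "timestamp": int(m.group("ts")),
        }
        for m in COLLECTING.finditer(flat)
    ]


# Two entries copied from the log quoted in this thread.
sample = """
DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
"""

entries = parse_collecting(sample)
# Same column name, different timestamps: these are different column objects.
distinct = {(e["name"], e["timestamp"]) for e in entries}
unbounded = all(e["limit"] == INT_MAX for e in entries)
print(len(entries), len(distinct), unbounded)
```

If the number of distinct (name, timestamp) pairs keeps growing as more of the
log is fed in, the node is iterating over different columns rather than
repeating the same one.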

On Wed, Jul 20, 2011 at 10:42 PM, Yan Chunlu <springrider@gmail.com> wrote:
> Thanks for the reply.
> Now the problem is how to get rid of the "N of 2147483647" messages. They
> never seem to end, and the node never goes UP.
> The last time this happened I ran "nodetool cleanup", and some data turned
> out to be lost (not sure if that was caused by cleanup).
>
> On Thu, Jul 21, 2011 at 11:37 AM, aaron morton <aaron@thelastpickle.com>
> wrote:
>>
>> Personally I would do a repair first if you need to do one, just so you
>> are confident everything is where it should be.
>> Then do the move as described in the wiki.
>> Cheers
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 21 Jul 2011, at 15:14, Yan Chunlu wrote:
>>
>> Sorry for the misunderstanding. I saw many "N of 2147483647" lines where N=0
>> and thought it was not doing anything.
>> My node was very unbalanced and I intended to rebalance it with "nodetool
>> move" after a "nodetool repair". Does that make the slices much larger?
>> Address         Status State   Load            Owns    Token
>>                                                         84944475733633104818662955375549269696
>> 10.28.53.2      Down   Normal  71.41 GB        81.09%  52773518586096316348543097376923124102
>> 10.28.53.3      Up     Normal  14.72 GB        10.48%  70597222385644499881390884416714081360
>> 10.28.53.4      Up     Normal  13.5 GB         8.43%   84944475733633104818662955375549269696
>>
>> Should I do "nodetool move" according to
>> http://wiki.apache.org/cassandra/Operations#Load_balancing before doing
>> the repair?
>> Thank you for your help!
>>
>>
>> On Thu, Jul 21, 2011 at 10:47 AM, Jonathan Ellis <jbellis@gmail.com>
>> wrote:
>>>
>>> This is not an infinite loop, you can see the column objects being
>>> iterated over are different.
>>>
>>> Like I said last time, "I do see that it's saying "N of 2147483647"
>>> which looks like you're doing slices with a much larger limit than is
>>> advisable."
>>>
>>> On Wed, Jul 20, 2011 at 9:00 PM, Yan Chunlu <springrider@gmail.com>
>>> wrote:
>>> > This time it is another node. The node went down during repair and came
>>> > back, but never reached UP. I changed the log level to "DEBUG" and found
>>> > that it prints the following messages endlessly:
>>> > DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123)
>>> > collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
>>> > DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123)
>>> > collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
>>> > DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123)
>>> > collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
>>> > DEBUG [main] 2011-07-20 20:58:16,611 SliceQueryFilter.java (line 123)
>>> > collecting 0 of 2147483647: 76616c7565:false:85@1311154048866767
>>> > DEBUG [main] 2011-07-20 20:58:16,754 SliceQueryFilter.java (line 123)
>>> > collecting 0 of 2147483647: 76616c7565:false:366@1311207176880564
>>> > DEBUG [main] 2011-07-20 20:58:16,770 SliceQueryFilter.java (line 123)
>>> > collecting 0 of 2147483647: 76616c7565:false:80@1310443605930900
>>> > DEBUG [main] 2011-07-20 20:58:16,816 SliceQueryFilter.java (line 123)
>>> > collecting 0 of 2147483647: 76616c7565:false:486@1311173929610402
>>> > DEBUG [main] 2011-07-20 20:58:16,870 SliceQueryFilter.java (line 123)
>>> > collecting 0 of 2147483647: 76616c7565:false:101@1310818289021118
>>> > DEBUG [main] 2011-07-20 20:58:17,041 SliceQueryFilter.java (line 123)
>>> > collecting 0 of 2147483647: 76616c7565:false:677@1311202595772170
>>> > DEBUG [main] 2011-07-20 20:58:17,047 SliceQueryFilter.java (line 123)
>>> > collecting 0 of 2147483647: 76616c7565:false:374@1311147641237918
>>> >
>>> >
>>> >
>>> > On Thu, Jul 14, 2011 at 1:36 PM, Jonathan Ellis <jbellis@gmail.com>
>>> > wrote:
>>> >>
>>> >> That says "I'm collecting data to answer requests."
>>> >>
>>> >> I don't see anything here that indicates an infinite loop.
>>> >>
>>> >> I do see that it's saying "N of 2147483647" which looks like you're
>>> >> doing slices with a much larger limit than is advisable (good way to
>>> >> OOM the way you already did).
>>> >>
>>> >> On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu <springrider@gmail.com>
>>> >> wrote:
>>> >> > I gave Cassandra an 8GB heap and somehow it ran out of memory and
>>> >> > crashed.
>>> >> > After I started it again, it just runs into the following infinite
>>> >> > loop. The last line:
>>> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>>> >> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
>>> >> > goes on forever.
>>> >> > I have 3 nodes and RF=2, so I am losing data. Does that mean I am
>>> >> > screwed and can't get it back?
>>> >> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
>>> >> > collecting 20 of 2147483647: q74k:false:14@1308886095008943
>>> >> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
>>> >> > collecting 0 of 2147483647: 10fbu:false:1@1310223075340297
>>> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>>> >> > collecting 0 of 2147483647: apbg:false:13@1305641597957086
>>> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>>> >> > collecting 1 of 2147483647: auje:false:13@1305641597957075
>>> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>>> >> > collecting 2 of 2147483647: ayj8:false:13@1305641597957060
>>> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>>> >> > collecting 3 of 2147483647: b4fz:false:13@1305641597957096
>>> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>>> >> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
>>> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>>> >> > collecting 1 of 2147483647: 1017f:false:14@1310168680375612
>>> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>>> >> > collecting 2 of 2147483647: 1018e:false:14@1310168759614715
>>> >> > DEBUG [main] 2011-07-13 22:19:00,587 SliceQueryFilter.java (line 123)
>>> >> > collecting 3 of 2147483647: 101dd:false:14@1310169260225339
>>> >> >
>>> >> > On Thu, Jul 14, 2011 at 11:27 AM, Yan Chunlu <springrider@gmail.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
>>> >> >> collecting 0 of 2147483647: 100zs:false:14@1310168625866434
>>> >> >
>>> >> >
>>> >> > --
>>> >> > 闫春路
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Jonathan Ellis
>>> >> Project Chair, Apache Cassandra
>>> >> co-founder of DataStax, the source for professional Cassandra support
>>> >> http://www.datastax.com
>>> >
>>> >
>>> >
>>> > --
>>> > 闫春路
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>
>>
>>
>> --
>> 闫春路
>>
>
>
>
> --
> 闫春路
>
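
On the "nodetool move" question in the quoted thread: a rough sketch, assuming
RandomPartitioner (token space 0 to 2**127), that recomputes the Owns column
from the three tokens in the ring output and lists evenly spaced candidate
tokens. The names ownership and balanced_tokens are hypothetical helpers, not
part of any Cassandra tool.

```python
RING = 2**127  # size of the token space, assuming RandomPartitioner

# Tokens copied from the ring output quoted in this thread.
ring = {
    "10.28.53.2": 52773518586096316348543097376923124102,
    "10.28.53.3": 70597222385644499881390884416714081360,
    "10.28.53.4": 84944475733633104818662955375549269696,
}


def ownership(tokens):
    """Percent of the ring owned by each node: the range from the previous
    token up to the node's own token, wrapping around the ring."""
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    owns = {}
    for i, (node, token) in enumerate(ordered):
        prev = ordered[i - 1][1]  # for i == 0 this wraps to the last token
        owns[node] = ((token - prev) % RING) / RING * 100
    return owns


def balanced_tokens(n):
    """Evenly spaced tokens for an n-node ring."""
    return [i * RING // n for i in range(n)]


owns = ownership(ring)
for node in ring:
    print(node, round(owns[node], 2))
for token in balanced_tokens(len(ring)):
    print(token)
```

The computed percentages can be compared with the Owns column above. Each
candidate token would then be given to one node with "nodetool move", following
the wiki page linked in the thread.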



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
