Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of omidaladini@gmail.com
 designates 209.85.160.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAH-t0tYLZ7+=2Ba8_--Y_zPw6QPG8suipvzL4TXfNneYMkA+sg@mail.gmail.com>
References: 
 <CAH-t0tYLZ7+=2Ba8_--Y_zPw6QPG8suipvzL4TXfNneYMkA+sg@mail.gmail.com>
From: Omid Aladini <omidaladini@gmail.com>
Date: Fri, 8 Jun 2012 12:06:25 +0200
Message-ID: 
 <CAH-t0tYR91gPP_FACubcbc8ber4x4n3MyqWxCgme+eGASPygYw@mail.gmail.com>
Subject: Re: Cassandra 1.1.1 stack overflow on an infinite loop building
 IntervalTree
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=047d7b15af1d9ad95904c1f32963

--047d7b15af1d9ad95904c1f32963
Content-Type: text/plain; charset=UTF-8

It also looks similar to this ticket:

https://issues.apache.org/jira/browse/CASSANDRA-4078


On Thu, Jun 7, 2012 at 6:48 PM, Omid Aladini <omidaladini@gmail.com> wrote:

> Hi,
>
> One of my 1.1.1 nodes fails to restart because of a stack overflow while
> building the interval tree. Increasing the stack size doesn't help. Here's
> the stack trace:
>
> https://gist.github.com/2889611
>
> It looks more like an infinite loop in the IntervalNode constructor's
> logic than a deep tree, since the DEBUG log shows it looping over the same
> intervals:
>
> https://gist.github.com/2889862
>
> Running it with assertions enabled shows a number of sstables whose first
> key > last key, for example:
>
> 2012-06-07_16:12:18.18781 java.lang.AssertionError: SSTable first key
> DecoratedKey(22540092521493542684444486114339861094,
> 3730343137317c3438333632333932) > last key
> DecoratedKey(22166106697727078019854024428005234814,
> 313138323637397c3432373931353435)
>
> and lets the node come up without hitting the IntervalNode constructor. I
> wonder how invalid sstables get created in the first place. Is there a way
> to verify whether other nodes in the cluster are affected as well?
>
> As for getting the node back up without wiping its data and letting it
> bootstrap again: if I remove the affected sstables, restart the node and
> then run a repair, will the node end up in a consistent state?
>
> The sstables contain counter columns, and leveled compaction is used.
>
> Thanks,
> Omid
>
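
A simplified model may help explain why an sstable whose first key sorts after its last key could cause an endless loop rather than just a deep tree. The sketch below is a hypothetical centered interval tree in Python, not Cassandra's IntervalNode code; Interval, build_depth and the lower-median choice of center are assumptions made for illustration. For well-formed intervals (lo <= hi) every child set is strictly smaller than its parent, so the recursion terminates; an inverted interval can be passed down unchanged, which would recurse until the stack overflows, whatever the stack size.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    # Hypothetical stand-in for one sstable's [first key, last key] range,
    # with tokens modelled as plain ints.
    lo: int
    hi: int


def build_depth(intervals: list[Interval]) -> int:
    """Build a simplified centered interval tree and return its depth.

    Illustrative model only, not Cassandra's IntervalNode. Raises ValueError
    when a partition step makes no progress, the case that would otherwise
    recurse until the stack overflows.
    """
    if not intervals:
        return 0
    # Center is the lower median of all endpoints.
    points = sorted(p for iv in intervals for p in (iv.lo, iv.hi))
    center = points[(len(points) - 1) // 2]
    left, right = [], []
    for iv in intervals:
        if iv.hi < center:
            left.append(iv)   # entirely left of center
        elif iv.lo > center:
            right.append(iv)  # entirely right of center
        # otherwise the interval contains center and stays at this node
    for child in (left, right):
        # With lo <= hi for every interval, a child is always strictly smaller
        # than its parent. An inverted interval can be passed down unchanged.
        if len(child) == len(intervals):
            raise ValueError(f"no progress at center {center}: {child}")
    return 1 + max(build_depth(left), build_depth(right))


print(build_depth([Interval(1, 3), Interval(2, 5), Interval(6, 9)]))  # 2
```

Calling build_depth([Interval(10, 5)]) raises ValueError: the center is 5, the interval is sent right because its lo (10) is greater than 5, and the right child is the same one-element set, so without the guard the recursion would never end.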

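One way to check other nodes is to look for the same assertion message in their logs, which only helps on nodes started with assertions enabled. The minimal sketch below parses the DecoratedKey(token, hex key) pairs from such a message and confirms the inversion. It assumes decorated keys order by token and then by raw key bytes; that is an assumption about Cassandra's comparison, not something confirmed here. decorated_keys and is_inverted are hypothetical helpers, not part of any Cassandra tool.

```python
from __future__ import annotations

import re

# Matches the DecoratedKey(<token>, <hex key>) form seen in the assertion message.
# Whitespace after the comma may include a line break if the log line was wrapped.
DECORATED_KEY = re.compile(r"DecoratedKey\((\d+),\s*([0-9a-fA-F]*)\)")


def decorated_keys(text: str) -> list[tuple[int, bytes]]:
    """Extract (token, raw key bytes) pairs from a log message."""
    return [(int(tok), bytes.fromhex(key)) for tok, key in DECORATED_KEY.findall(text)]


def is_inverted(first: tuple[int, bytes], last: tuple[int, bytes]) -> bool:
    """True if an sstable's first key sorts after its last key.

    Assumes keys order by token and then by raw key bytes, which is
    exactly how Python compares these tuples.
    """
    return first > last


message = """SSTable first key
DecoratedKey(22540092521493542684444486114339861094,
3730343137317c3438333632333932) > last key
DecoratedKey(22166106697727078019854024428005234814,
313138323637397c3432373931353435)"""

first, last = decorated_keys(message)
print(is_inverted(first, last))  # True
```

Applied to the message quoted above, the first token (225...) is larger than the last token (221...), so the sstable is reported as inverted regardless of the key bytes.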

--047d7b15af1d9ad95904c1f32963--