From: Omid Aladini
Date: Fri, 8 Jun 2012 12:06:25 +0200
Subject: Re: Cassandra 1.1.1 stack overflow on an infinite loop building IntervalTree
To: user@cassandra.apache.org

Also looks similar to this ticket:

https://issues.apache.org/jira/browse/CASSANDRA-4078

On Thu, Jun 7, 2012 at 6:48 PM, Omid Aladini <omidaladini@gmail.com> wrote:

> Hi,
>
> One of my 1.1.1 nodes doesn't restart due to a stack overflow while building
> the interval tree. Bumping the stack size doesn't help. Here's the stack
> trace:
>
> https://gist.github.com/2889611
>
> It looks more like an infinite loop in the IntervalNode constructor's logic
> than a deep tree, since the DEBUG log shows looping over the same intervals:
>
> https://gist.github.com/2889862
>
> Running it with assertions enabled shows a number of SSTables whose
> first key > last key, for example:
>
> 2012-06-07_16:12:18.18781 java.lang.AssertionError: SSTable first key
> DecoratedKey(22540092521493542684444486114339861094,
> 3730343137317c3438333632333932) > last key
> DecoratedKey(22166106697727078019854024428005234814,
> 313138323637397c3432373931353435)
>
> and lets the node come up without hitting the IntervalNode constructor. I
> wonder how invalid SSTables get created in the first place? Is there a way
> to verify whether other nodes in the cluster are affected as well?
>
> As for getting the node back up without wiping the data and letting it
> bootstrap again: if I remove the affected SSTables and restart the node,
> followed by a repair, will the node end up in a consistent state?
>
> The SSTables contain counter columns, and leveled compaction is used.
>
> Thanks,
> Omid
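For illustration only, here is a minimal Python sketch of why an interval whose first key sorts after its last key could make an interval-tree build recurse without end. It is a simplified textbook centered interval tree, not Cassandra's IntervalNode source. The names `IntervalNode` (as modelled here) and `build_terminates` are hypothetical, and keys are plain integers rather than DecoratedKeys.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (first_key, last_key), modelled as plain integers


class IntervalNode:
    """Simplified centered interval tree node (hypothetical model, not Cassandra code)."""

    def __init__(self, intervals: List[Interval]) -> None:
        # Pick the median of all endpoints as the center point.
        points = sorted(p for iv in intervals for p in iv)
        self.center = points[len(points) // 2]
        # Intervals ending before the center go left, those starting after it go right.
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        # With valid intervals, the one that owns the center endpoint stays here,
        # so both subtrees are strictly smaller and the recursion terminates.
        self.overlapping = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        self.left: Optional["IntervalNode"] = IntervalNode(left) if left else None
        self.right: Optional["IntervalNode"] = IntervalNode(right) if right else None


def build_terminates(intervals: List[Interval]) -> bool:
    """Return True if the tree builds, False if construction overflows the stack."""
    try:
        IntervalNode(intervals)
        return True
    except RecursionError:
        return False


valid = [(1, 3), (2, 6), (8, 9)]
inverted = [(1, 3), (10, 5)]  # second interval has first key > last key
```

In this model, `build_terminates(valid)` succeeds, while `build_terminates(inverted)` fails: the inverted interval (10, 5) is handed unchanged to a child node on every step, so raising the stack size only delays the overflow.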
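On the question of checking other nodes: one rough approach is to scan their logs for the same assertion message. The sketch below assumes the node was started with assertions enabled and that the message has exactly the format quoted in this thread. The function name `parse_inverted_sstable` is hypothetical. It also assumes numeric tokens (as in the quoted line) and ASCII row keys.

```python
import re
from typing import Dict, Optional, Union

# Matches the assertion text as quoted in this thread (assumed format).
PATTERN = re.compile(
    r"SSTable first key\s+DecoratedKey\((\d+),\s*([0-9a-fA-F]+)\)\s*>\s*"
    r"last key\s+DecoratedKey\((\d+),\s*([0-9a-fA-F]+)\)"
)


def parse_inverted_sstable(line: str) -> Optional[Dict[str, Union[int, str]]]:
    """Extract tokens and decoded row keys from one log line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "first_token": int(match.group(1)),
        # Row keys are logged as hex; decode them for readability.
        "first_key": bytes.fromhex(match.group(2)).decode("ascii", errors="replace"),
        "last_token": int(match.group(3)),
        "last_key": bytes.fromhex(match.group(4)).decode("ascii", errors="replace"),
    }


SAMPLE = (
    "2012-06-07_16:12:18.18781 java.lang.AssertionError: SSTable first key "
    "DecoratedKey(22540092521493542684444486114339861094, "
    "3730343137317c3438333632333932) > last key "
    "DecoratedKey(22166106697727078019854024428005234814, "
    "313138323637397c3432373931353435)"
)

info = parse_inverted_sstable(SAMPLE)
```

For the line quoted above, the first token is numerically larger than the last token, and the hex keys decode to `704171|48362392` and `1182679|42791545`. A line that does not match yields `None`, which says nothing about SSTables that never triggered the assertion.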