From: Patricio Echagüe
Date: Sat, 9 Apr 2011 21:57:03 -0700
Subject: Re: Site Not Surviving a Single Cassandra Node Crash
To: user@cassandra.apache.org
Cc: aaron morton

What consistency level are you using?

And as Ed said, if you can provide the stack trace, that would help too.

On Sat, Apr 9, 2011 at 7:02 PM, aaron morton wrote:

> btw, the nodes are a tad out of balance. Was that deliberate?
>
> http://wiki.apache.org/cassandra/Operations#Token_selection
> http://wiki.apache.org/cassandra/Operations#Load_balancing
>
> Aaron
>
> On 10 Apr 2011, at 08:44, Ed Anuff wrote:
>
> > Sounds like the problem might be on the Hector side. Lots of Hector
> > users on this list, but usually not a bad idea to ask on
> > hector-users@googlegroups.com (cc'd).
> >
> > "The Jetty servers stop responding" is a bit vague; somewhere in
> > your logs is an error message that should shed some light on where
> > things are going awry. If you can find the exception that's being
> > thrown in Hector and post that, it'd make it much easier to help you
> > out.
> >
> > Ed
> >
> > On Sat, Apr 9, 2011 at 12:11 PM, Vram Kouramajian wrote:
> >
> > > The Hector clients are used as part of our Jetty servers. And the
> > > Jetty servers stop responding when one of the Cassandra nodes goes
> > > down.
> > >
> > > Vram
> > >
> > > On Sat, Apr 9, 2011 at 11:54 AM, Joe Stump wrote:
> > >
> > > > Did the Cassandra cluster go down, or did you start getting
> > > > failures from the client when it routed queries to the downed
> > > > node? The key in the client is to keep working around the ring
> > > > if the initial node is down.
> > > >
> > > > --Joe
> > > >
> > > > On Apr 9, 2011, at 12:52 PM, Vram Kouramajian wrote:
> > > >
> > > > > We have 5 Cassandra nodes with the following configuration:
> > > > >
> > > > > Cassandra Version: 0.6.11
> > > > > Number of Nodes: 5
> > > > > Replication Factor: 3
> > > > > Client: Hector 0.6.0-14
> > > > > Write Consistency Level: Quorum
> > > > > Read Consistency Level: Quorum
> > > > > Ring Topology:
> > > > >
> > > > > Address        Status  Load     Owns    Range                                    Ring
> > > > >                                         132756707369141912386052673276321963528
> > > > > 192.168.89.153 Up      4.15 GB  33.87%  20237398133070283622632741498697119875   |<--|
> > > > > 192.168.89.155 Up      5.17 GB  18.29%  51358066040236348437506517944084891398   |   ^
> > > > > 192.168.89.154 Up      7.41 GB  33.97%  109158969152851862753910401160326064203  v   |
> > > > > 192.168.89.152 Up      5.07 GB  6.34%   119944993359936402983569623214763193674  |   ^
> > > > > 192.168.89.151 Up      4.22 GB  7.53%   132756707369141912386052673276321963528  |-->|
> > > > >
> > > > > We believe that our setup should survive the crash of one of
> > > > > the Cassandra nodes. But we had a few crashes, and the system
> > > > > stopped functioning until we brought back the Cassandra nodes.
> > > > >
> > > > > Any clues?
> > > > >
> > > > > Vram
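
Two points raised in this thread can be checked with simple arithmetic: whether QUORUM reads and writes with a replication factor of 3 should tolerate one node being down, and what an evenly balanced 5-node ring would look like. The Python sketch below is illustrative only. The function names are hypothetical and are not part of Cassandra or Hector, and the token calculation assumes RandomPartitioner with a token space of 0 to 2**127, as described on the Operations wiki page linked above.

```python
# Illustrative sketch; function names are hypothetical helpers,
# not part of Cassandra or Hector.


def quorum(replication_factor):
    """Number of replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor):
    """Replicas that can be down while QUORUM can still be satisfied."""
    return replication_factor - quorum(replication_factor)


def balanced_tokens(node_count, ring_size=2**127):
    """Evenly spaced initial tokens.

    Assumes RandomPartitioner, whose token space is taken here to be
    0 .. 2**127; node i gets i * ring_size / node_count.
    """
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    rf = 3
    print("quorum for RF=3:", quorum(rf))                          # 2
    print("replicas that may be down:", tolerated_replica_failures(rf))  # 1
    print("balanced tokens for 5 nodes:")
    for token in balanced_tokens(5):
        print(" ", token)
```

Under these assumptions a single node failure still leaves two of the three replicas for every key range, so QUORUM requests should continue to succeed on the Cassandra side. That is consistent with the suggestions above to look at the client's failover behaviour and at the exception it throws. Evenly spaced tokens would give each of the 5 nodes roughly 20% of the ring, compared with the 6.34% to 33.97% shown in the posted ring output.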