From: 史英杰 <shiyingjie1983@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 21 May 2010 09:50:51 +0800
Subject: Re: What happened if one server involved in the process of data reading fail?

What inner mechanism does Cassandra adopt to get this kind of fault tolerance?

2010/5/20 Simon Smith <simongsmith@gmail.com>
On Thu, May 20, 2010 at 8:08 AM, 史英杰 <shiyingjie1983@gmail.com> wrote:
> Hi, All,
>     I am now learning the mechanism Cassandra adopts to get high
> availability and fault tolerance. As far as I know, we should connect to one
> Cassandra server first, then we can read or write data through it, so if
> the server we are connected to goes down, what will happen? Do we have to
> reconnect to another server, or will Cassandra handle this situation?

The approach we're taking is to put the software load-balancer haproxy
in front of our cassandra cluster. Use "mode tcp" within haproxy's
config. I notice that Tragedy (http://github.com/enki/tragedy/) also
lets you put a list of servers into the connection call (we're going
to put the list of haproxy load balancers here).
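
The idea of handing the client a list of servers can be sketched with nothing but the Python standard library. This is only an illustration, not Tragedy's or Thrift's API: `connect_to_first_available` is a hypothetical helper, the host names in the note below are placeholders, and port 9160 is assumed only because it was Cassandra's default Thrift port at the time.

```python
import socket


def connect_to_first_available(servers, timeout=2.0):
    """Try each (host, port) pair in order and return the first open socket.

    Raises ConnectionError if none of the servers accepts a connection.
    """
    errors = []
    for host, port in servers:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            # Node down or unreachable: remember why and try the next one.
            errors.append(f"{host}:{port} ({exc})")
    raise ConnectionError("no server reachable: " + "; ".join(errors))
```

A caller might pass something like `[("lb1.example.com", 9160), ("lb2.example.com", 9160)]` and wrap the returned socket in whatever transport its client library expects. Trying the entries in a fixed order keeps the sketch short; shuffling the list first would spread new connections more evenly.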



> Another situation: if a server that is involved in the process of reading data
> fails, what will Cassandra do?


If you're using Thrift to connect, catch the exceptions that library
throws if unable to connect and then try to connect again. This is
going to happen - if/when a node goes down it causes the entire
cluster to hiccup a little, so if it is critical that any particular
read transaction succeeds, you may need to keep retrying for as much as
5 seconds (this is just my experience).
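
The retry advice above could look roughly like this in a client. The sketch is deliberately library-agnostic: `call_with_retry` and its parameters are hypothetical names, and the caller supplies whichever exception types its Thrift client actually raises on a failed connection. The 5-second budget mirrors the figure quoted above; it is an anecdotal value, not something Cassandra guarantees.

```python
import time


def call_with_retry(operation, retry_on, max_wait=5.0, delay=0.5, sleep=time.sleep):
    """Call operation(); on a retry_on exception, sleep and try again.

    Gives up and re-raises once the total time slept would exceed max_wait.
    """
    waited = 0.0
    while True:
        try:
            return operation()
        except retry_on:
            if waited + delay > max_wait:
                raise  # out of budget: let the caller see the failure
            sleep(delay)
            waited += delay
```

Here `operation` would typically reconnect and then issue the read, so that each attempt starts from a fresh connection. The `sleep` parameter exists only so the waiting can be replaced in tests.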


>     Thanks a lot!
>
> Yingjie
