From: "Dave Brosius"
Reply-To: dbrosius@mebigfatguy.com
To: user@cassandra.apache.org
Date: Fri, 27 Jul 2012 13:43:07 -0500 (CDT)
Subject: Re: increased RF and repair, not working?

You have RF=2, CL=QUORUM, but 3 nodes.

So each row is represented on 2 of the 3 nodes.

If you take a node down, one of two things can happen when you attempt to read a row.

The row lives on the two nodes that are still up. In this case you will successfully read the data.

The row lives on one node that is up, and one node that is down. In this case the read will fail because you haven't fulfilled the quorum (2 nodes in agreement) requirement.
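
The two cases above can be enumerated with a small sketch. It assumes ring-order placement (each range is stored on its owning node plus the next RF-1 nodes on the ring), which is how SimpleStrategy is generally understood to place replicas. The node names A, B, C and both function names are hypothetical, chosen only for illustration.

```python
def replica_sets(nodes, rf):
    """One replica set per range: the owning node plus the next rf-1 nodes on the ring."""
    n = len(nodes)
    return [[nodes[(i + k) % n] for k in range(rf)] for i in range(n)]


def readable_ranges(nodes, rf, down):
    """Map each replica set to True if enough replicas are alive for a QUORUM read."""
    needed = rf // 2 + 1  # quorum size, integer division
    status = {}
    for replicas in replica_sets(nodes, rf):
        alive = [r for r in replicas if r not in down]
        status[tuple(replicas)] = len(alive) >= needed
    return status


# Hypothetical 3-node ring with RF=2 and node C taken down.
for replicas, ok in readable_ranges(["A", "B", "C"], rf=2, down={"C"}).items():
    print(replicas, "readable" if ok else "unavailable")
```

Under these assumptions only the range replicated on A and B stays readable at QUORUM; the two ranges that include C do not. Calling the same function with rf=3 reports every range as readable with one node down.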

----- Original Message -----
From: "Riyad Kalla" <rkalla@gmail.com>
Sent: Fri, July 27, 2012 8:08
Subject: Re: increased RF and repair, not working?

Dave, per my understanding of Yan's description he has 3 nodes and took one down manually to test; that should have worked, no?

On Thu, Jul 26, 2012 at 11:00 PM, Dave Brosius <dbrosius@mebigfatguy.com> wrote:

Quorum is defined as
(replication_factor / 2) + 1

using integer division, so quorum when RF=2 is 2. In your case, both replica nodes for a row must be up. Really, QUORUM only starts making sense as a 'quorum' when RF=3.
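
As a quick check of the formula, here is a minimal sketch that tabulates the quorum size and how many replicas can be down for a few replication factors. The function name `quorum` is just an illustrative helper, not part of any Cassandra or pycassa API.

```python
def quorum(replication_factor):
    """Quorum size: (replication_factor / 2) + 1, using integer division."""
    return replication_factor // 2 + 1


for rf in range(1, 6):
    q = quorum(rf)
    print(f"RF={rf}: quorum={q}, replicas that may be down={rf - q}")
```

For RF=1 and RF=2 the number of replicas that may be down is zero, which is why a QUORUM read cannot survive the loss of a replica until RF reaches 3.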
On 07/26/2012 10:38 PM, Yan Chunlu wrote:
I am using Cassandra 1.0.2 and have a 3-node cluster. The consistency level for both reads and writes is QUORUM.

At first the RF was 1, and I figured that one node going down would make the cluster unusable, so I changed the RF to 2 and ran nodetool repair on every node (actually I did it twice).

After the operation I think my data should be on at least two nodes, and it should be okay if one of them is down.

But when I tried to simulate a failure by running disablegossip on one node (so that the cluster knows this node is down) and then accessed data from the cluster, it returned "MaximumRetryException" (pycassa). In my experience this is caused by "UnavailableException", which means the data being requested is on a node that is down.

So I wonder whether my data has not been replicated correctly. What should I do? Thanks for the help!

here is the keyspace info:

Keyspace: comments:

Replication Strategy: org.apache.cassandra.locator.SimpleStrategy

Durable Writes: true

Options: [replication_factor:2]

The schema version is okay:

[default@unknown] describe cluster;

Cluster Information:

Snitch: org.apache.cassandra.locator.SimpleSnitch

Partitioner: org.apache.cassandra.dht.RandomPartitioner

Schema versions:

f67d0d50-b923-11e1-0000-4f7cf9240aef: [192.168.1.129, 192.168.1.40, 192.168.1.50]

the loads are as below:

nodetool -h localhost ring

Address        DC           Rack   Status  State   Load      Owns    Token
                                                                     113427455640312821154458202477256070484
192.168.1.50   datacenter1  rack1  Up      Normal  28.77 GB  33.33%  0
192.168.1.40   datacenter1  rack1  Up      Normal  26.67 GB  33.33%  56713727820156410577229101238628035242
192.168.1.129  datacenter1  rack1  Up      Normal  33.25 GB  33.33%  113427455640312821154458202477256070484
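
To relate the ring above to replica placement, the following sketch looks up which nodes would hold a given token with RF=2. It assumes the usual ring rule (a token belongs to the first node whose token is greater than or equal to it, wrapping around, and further replicas go to the next nodes clockwise). The function name `replicas_for_token` is hypothetical; the tokens and addresses are copied from the output above.

```python
from bisect import bisect_left

# (token, address) pairs from the nodetool ring output, sorted by token.
RING = [
    (0, "192.168.1.50"),
    (56713727820156410577229101238628035242, "192.168.1.40"),
    (113427455640312821154458202477256070484, "192.168.1.129"),
]


def replicas_for_token(token, ring=RING, rf=2):
    """Return the rf nodes assumed to store `token`: the owner, then its clockwise neighbours."""
    tokens = [t for t, _ in ring]
    # First node whose token is >= the given token; wrap to the start past the last token.
    start = bisect_left(tokens, token) % len(ring)
    return [ring[(start + k) % len(ring)][1] for k in range(rf)]


# A token just above 0 falls in the range owned by 192.168.1.40.
print(replicas_for_token(1))
```

Under this assumption every row lives on exactly two of the three nodes, so taking any one node down leaves two of the three ranges with only a single live replica, which is below the quorum of 2.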
