From: Edmond Lau <edmond@ooyala.com>
Date: Thu, 29 Oct 2009 12:18:26 -0700
To: cassandra-user@incubator.apache.org
Subject: can't write with consistency level of one after some nodes fail

I have a freshly started 3-node cluster with a replication factor of 2. If I take down two nodes, I can no longer do any writes, even with a consistency level of one. I tried a variety of keys to ensure that I'd get at least one where the live node was responsible for one of the replicas. I have not yet tried on trunk. On Cassandra 0.4.1, I get an UnavailableException:

DEBUG [pool-1-thread-1] 2009-10-29 18:53:24,371 CassandraServer.java (line 408) insert
 WARN [pool-1-thread-1] 2009-10-29 18:53:24,388 AbstractReplicationStrategy.java (line 135) Unable to find a live Endpoint we might be out of live nodes , This is dangerous !!!!
ERROR [pool-1-thread-1] 2009-10-29 18:53:24,390 StorageProxy.java (line 179) error writing key 1
UnavailableException()
        at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:156)
        at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:468)
        at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:421)
        at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:824)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
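For concreteness, here's roughly the write I'm issuing -- a minimal sketch against the 0.4 Thrift-generated Java classes. The keyspace, column family, key, and column name come from the server log above; I'm assuming the org.apache.cassandra.service package shown in the stack trace, the default Thrift port 9160 on localhost, and an insert(table, key, column_path, value, timestamp, consistency_level) signature, so the exact generated signature may differ slightly:

import org.apache.cassandra.service.Cassandra;
import org.apache.cassandra.service.ColumnPath;
import org.apache.cassandra.service.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class WriteAtOne {
    public static void main(String[] args) throws Exception {
        // Connect to the one node that is still up.
        TSocket socket = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        // The same write that appears in the server log:
        // key "1", column family Standard1, column name "value".
        ColumnPath path = new ColumnPath("Standard1", null, "value".getBytes("UTF-8"));
        client.insert("Keyspace1", "1", path, "v".getBytes("UTF-8"),
                      System.currentTimeMillis(), ConsistencyLevel.ONE);

        // With two of the three nodes down, this throws UnavailableException
        // even when the live node is a replica for the key.
        socket.close();
    }
}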
Moreover, if I bring another of the nodes back up, do some writes, take it back down again, and try to do writes with a consistency level of one, I get an ApplicationException with the error text "unknown result". There's nothing in the debug logs about this new error:

DEBUG [pool-1-thread-7] 2009-10-29 19:08:26,411 CassandraServer.java (line 258) get
DEBUG [pool-1-thread-7] 2009-10-29 19:08:26,411 CassandraServer.java (line 307) multiget
DEBUG [pool-1-thread-7] 2009-10-29 19:08:26,413 StorageProxy.java (line 239) weakreadlocal reading SliceByNamesReadCommand(table='Keyspace1', key='3', columnParent='QueryPath(columnFamilyName='Standard1', superColumnName='null', columnName='null')', columns=[[118, 97, 108, 117, 101],])

I would've instead expected the node to accept the write and then have the key repaired on subsequent reads once the other nodes come back up; a sketch of the behavior I expected is below.

Along the same lines, how does Cassandra handle a network partition where two writes for the same key hit two different partitions, neither of which is able to form a quorum? Dynamo maintained version vectors and put the burden on the client to resolve conflicts, but there's no similar interface in the Thrift API.
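To spell out what I expected, here it is as hypothetical coordinator pseudocode in Java -- this is my mental model, not Cassandra's actual implementation, and every helper below is made up:

import java.net.InetAddress;
import java.util.List;

// Hypothetical write path for ConsistencyLevel.ONE: succeed as long as
// any replica is alive, remember hints for the dead ones, and let later
// reads repair the key. None of these methods exist in Cassandra.
abstract class ExpectedWritePath {
    abstract List<InetAddress> replicasFor(String key);  // ring lookup; RF=2 -> 2 nodes
    abstract boolean isLive(InetAddress node);           // failure detector
    abstract void send(InetAddress node, String key, byte[] value);
    abstract void storeHint(InetAddress node, String key, byte[] value);

    void insertAtOne(String key, byte[] value) throws Exception {
        int sent = 0;
        for (InetAddress node : replicasFor(key)) {
            if (isLive(node)) {
                send(node, key, value);                  // at least one ack is possible
                sent++;
            } else {
                storeHint(node, key, value);             // replay when the node returns
            }
        }
        if (sent == 0) {
            throw new Exception("unavailable");          // fail only with zero live replicas
        }
    }
}

The point being that a write at ONE should only surface UnavailableException when zero replicas are reachable, which isn't the case for the keys I tried.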
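As for the partition question, the kind of client-visible versioning I have in mind is along these lines -- an illustration of Dynamo-style version vectors, not anything the Thrift API exposes:

import java.util.HashMap;
import java.util.Map;

// One counter per replica that has coordinated a write to a key.
// Two versions of the key conflict when neither vector dominates the
// other, which is exactly the two-sides-of-a-partition case above.
class VersionVector {
    private final Map<String, Long> counters = new HashMap<String, Long>();

    void incrementFor(String replica) {
        Long c = counters.get(replica);
        counters.put(replica, c == null ? 1L : c + 1);
    }

    // True if every counter here is >= the corresponding counter in other.
    boolean dominates(VersionVector other) {
        for (Map.Entry<String, Long> e : other.counters.entrySet()) {
            Long mine = counters.get(e.getKey());
            if (mine == null || mine < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    // Concurrent updates: neither side dominates, so the client must merge.
    boolean conflictsWith(VersionVector other) {
        return !dominates(other) && !other.dominates(this);
    }
}

Dynamo returns both conflicting versions to the reader and leaves the merge to the application; I don't see a way to express that through the current Thrift interface.

Edmond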