Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8DDE34066 for ; Sat, 2 Jul 2011 00:46:33 +0000 (UTC) Received: (qmail 36333 invoked by uid 500); 2 Jul 2011 00:46:31 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 36300 invoked by uid 500); 2 Jul 2011 00:46:30 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 36292 invoked by uid 99); 2 Jul 2011 00:46:30 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 02 Jul 2011 00:46:30 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW X-Spam-Check-By: apache.org Received-SPF: unknown (athena.apache.org: error in processing during lookup of strib@nicira.com) Received: from [209.85.210.44] (HELO mail-pz0-f44.google.com) (209.85.210.44) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 02 Jul 2011 00:46:22 +0000 Received: by pzk5 with SMTP id 5so4458190pzk.31 for ; Fri, 01 Jul 2011 17:46:02 -0700 (PDT) Received: by 10.143.63.3 with SMTP id q3mr1700194wfk.25.1309567561904; Fri, 01 Jul 2011 17:46:01 -0700 (PDT) Received: from [172.16.0.50] (173-167-111-49-sfba.hfc.comcastbusiness.net [173.167.111.49]) by mx.google.com with ESMTPS id e6sm2273720pbm.55.2011.07.01.17.46.00 (version=SSLv3 cipher=OTHER); Fri, 01 Jul 2011 17:46:01 -0700 (PDT) Message-ID: <4E0E6A48.3090501@nicira.com> Date: Fri, 01 Jul 2011 17:46:00 -0700 From: Jeremy Stribling User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.10) Gecko/20100619 Icedove/3.0.5 MIME-Version: 1.0 To: user@cassandra.apache.org Subject: incomplete schema sync for new node Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi all, I'm running into a problem with Cassandra, where a new node coming up seems to only get an incomplete set of schema mutations when bootstrapping, and as a result hits an "IllegalStateException: replication factor (3) exceeds number of endpoints (2)" error. I will describe the sequence of events below as I see them, but first I need to warn you that I run Cassandra in a very non-standard way. I embed it in a JVM, along with Zookeeper, and other classes for a product we are working on. We need to bring nodes up and down dynamically in our product, including going from one node to three nodes, and back down to one, at any time. If we ever drop below three nodes, we have code that sets the replication factor of our keyspaces to 1; similarly, whenever we have three or more nodes, we change the replication factor to 3. I know this is frowned upon by the community, but we're stuck with doing it this way for now. Ok, here's the scenario: 1) Node 50.0.0.4 bootstraps into a cluster consisting of nodes 50.0.0.2 and 50.0.0.3. 2) Once 50.0.0.4 is fully bootstrapped, we change the replication factor for our two keyspaces to 3. 3) Then node 50.0.0.2 is taken down permanently, and we change the replication factor back down to 1. 4) We then remove node 50.0.0.2's tokens using the removeToken call on node 50.0.0.3. 5) Then we start node 50.0.0.5, and have it join the cluster using 50.0.0.3 and 50.0.0.4 as seeds. 6) 50.0.0.5 starts receiving schema mutations to get it up to speed; the last one it receives (7d51e757-a40b-11e0-a98d-65ed1eced995) has the replication factor at 3. However, there should be more schema updates after this that never arrive (you can see them arrive at 50.0.0.4 while it is bootstrapping). 7) Minutes after receiving this last mutation, node 50.0.0.5 hits the IllegalStateException I've listed above, and I think for that reason never successfully joins the cluster. My question is why doesn't node 50.0.0.5 receive the schema updates that follow 7d51e757-a40b-11e0-a98d-65ed1eced995? (For example, 8fc8820d-a40c-11e0-9eaf-6720e49624c2 is present in 50.0.0.4's log and sets the replication factor back down to 1.) I've put logs for nodes 50.0.0.3/4/5 at http://pdos.csail.mit.edu/~strib/cassandra_logs.tgz . The logs are pretty messy because they includes log messages from both Zookeeper and our product code -- sorry about that. Also, I think the clock on node 50.0.0.4 is a few minutes ahead of the other nodes' clocks. I also noticed in 50.0.0.4's log the following exceptions: 2011-07-01 18:00:49,832 76315 [HintedHandoff:1] ERROR org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor - Error in ThreadPoolExecutor java.lang.RuntimeException: java.lang.RuntimeException: Could not reach schema agreement with /50.0.0.3 in 60000ms at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) I don't know if that's related or not. Thanks in advance, Jeremy