From: Jon Haddad
To: user@cassandra.apache.org
Subject: Re: Upgrade from 1.0.9 to 1.2.8
Date: Fri, 30 Aug 2013 09:00:31 -0700
Message-Id: <17715C2E-7368-4785-AF6D-B8857CF2FEAD@jonhaddad.com>
In-Reply-To: <5220C0D5.3030302@liquidweb.com>

Does your previous snapshot include the system keyspace? I haven't tried upgrading from 1.0.x and then rolling back, but it's possible there are some backwards-incompatible changes. Other than that, did you also roll back your config files?

On Aug 30, 2013, at 8:57 AM, Mike Neir wrote:

> Greetings folks,
>
> I'm faced with the need to update a 36-node cluster with roughly 25T of data on disk to a version of Cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd still like to have a roll-back plan in case the rolling upgrade goes sideways.
>
> I've tried to upgrade a single node in my dev cluster, then roll back using a snapshot taken previously, but things don't appear to be going smoothly.
> The node will rejoin the ring eventually, but only after spending some time in the "Joining" state as shown by "nodetool ring", and spewing a ton of error messages similar to the following:
>
> ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java (line 61) Error in row mutation
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178
>
> My test procedure is as follows:
> 1) nodetool -h localhost snapshot
> 2) nodetool -h localhost drain
> 3) service cassandra stop
> 4) back up cassandra configs
> 5) remove cassandra 1.0.9
> 6) install cassandra 1.2.8
> 7) restore cassandra configs, alter them to remove configuration entries no longer used
> 8) start cassandra 1.2.8, let it run for a bit, then drain/stop it
> 9) remove cassandra 1.2.8
> 10) reinstall cassandra 1.0.9
> 11) restore original cassandra configs
> 12) remove any commit logs present
> 13) remove folders for system_auth and system_traces keyspaces (since they don't seem to be present in 1.0.9)
> 14) move snapshots back to where they should be for 1.0.9 and remove cass 1.2.8 data
>     # cd /var/lib/cassandra/data/$KEYSPACE/
>     # mv */snapshots/$TIMESTAMP/* .
>     # find . -mindepth 1 -type d -exec rm -rf {} \;
>     # cd /var/lib/cassandra/data/system
>     # mv */snapshots/$TIMESTAMP/* .
>     # find . -mindepth 1 -type d -exec rm -rf {} \;
> 15) start cassandra 1.0.9
> 16) observe cassandra system.log
>
> Does anyone have any insight on things I may be doing wrong, or whether this is just an unavoidable pain point caused by rolling back? It seems that since there are no schema changes going on, the node should be able to just hop back into the cluster without error and without transitioning through the "Joining" state.
>
> --
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator
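
On the system keyspace question, one way to check before step 14 is a small shell helper like the one below. This is only a sketch under assumptions: the function name snapshot_has_keyspace is made up for illustration, and it assumes a snapshot lives either directly under the keyspace directory (data/KEYSPACE/snapshots/TAG) or under a per-column-family subdirectory (data/KEYSPACE/*/snapshots/TAG, the shape the mv commands in step 14 imply). Adjust the paths if your data directory is laid out differently.

```shell
#!/bin/sh
# Sketch only: the function name and both snapshot layouts are assumptions,
# not something taken from Cassandra's own tooling.

# snapshot_has_keyspace DATA_DIR KEYSPACE TAG
# Succeeds if a non-empty snapshot directory named TAG exists for KEYSPACE,
# either directly under the keyspace directory or under any of its
# per-column-family subdirectories.
snapshot_has_keyspace() {
    data_dir=$1
    keyspace=$2
    tag=$3
    for dir in "$data_dir/$keyspace/snapshots/$tag" \
               "$data_dir/$keyspace"/*/snapshots/"$tag"; do
        # An unmatched glob stays literal and simply fails the -d test.
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, running snapshot_has_keyspace /var/lib/cassandra/data system "$TIMESTAMP" || echo "no system snapshot for $TIMESTAMP" before moving anything would flag a snapshot that is missing the system keyspace. It only checks that files exist; it says nothing about whether 1.0.9 can read them.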