From: Ananth Gundabattula
To: "user@cassandra.apache.org"
Date: Thu, 27 Jun 2013 23:48:29 -0500
Subject: Errors while upgrading from 1.1.10 version to 1.2.4 version

Hello Everybody,

We were upgrading our cluster from version 1.1.10 to 1.2.4. We tested the upgrade process in a QA environment and found no issues. However, on the production node we hit a large number of errors and had to abort the upgrade.

I am wondering how we ran into this situation. The main difference between the QA and production environments is the replication factor: in QA, RF=1, and in production, RF=3.

Example stack traces, as seen on the other nodes, are at: http://pastebin.com/fSnMAd8q

The other observation is that the node being upgraded is a seed node in the 1.1.10 cluster. We aborted right after the first node gave the above issues. Does this mean that application downtime will be required if we go for a rolling upgrade on a live cluster from 1.1.10 to 1.2.4?

Regards,
Ananth