From: Joe Stein
Subject: Re: Discussion: release quality
Date: Tue, 29 Nov 2011 22:14:52 -0500
To: "dev@cassandra.apache.org"

I need at least a week, maybe two, to promote anything to staging, mainly because we do weekly releases. I could introduce a 2-day turnaround, but only with a more fixed schedule. I am running 0.8.6 in production and REALLY want to upgrade for nothing more than getting compression (the cost of petabytes of uncompressed data is just stupid). So however I can help, whether by changing my process or by better understanding the PMC here, I am game.

One thing I use C* for is holding days' worth of data and re-running those days for regression on our software... simulating production... It might not take much to reverse it.

/*
Joe Stein
http://www.medialets.com
Twitter: @allthingshadoop
*/

On Nov 29, 2011, at 10:04 PM, Edward Capriolo wrote:

> On Tue, Nov 29, 2011 at 6:16 PM, Jeremy Hanna wrote:
>
>> I'd like to start a discussion about ideas to improve release quality for
>> Cassandra. Specifically I wonder if the community can do more to help the
>> project as a whole become more solid. Cassandra has an active and vibrant
>> community using Cassandra for a variety of things. If we all pitch in a
>> little bit, it seems like we can make a difference here.
>>
>> Release quality is difficult, especially for a distributed system like
>> Cassandra. The core devs have done an amazing job with this considering
>> how complicated it is.
>> Currently, there are several things in place to
>> make sure that a release is generally usable:
>> - review-then-commit
>> - 72 hour voting period
>> - at least 3 binding +1 votes
>> - unit tests
>> - integration tests
>> Then there is the personal responsibility aspect - testing a release in a
>> staging environment before pushing it to production.
>>
>> I wonder if more could be done here to give more confidence in releases.
>> I wanted to see if there might be ways that the community could help out
>> without being too burdensome on either the core devs or the community.
>>
>> Some ideas:
>> More automation: run YCSB and stress with various setups. Maybe people
>> can rotate donating cloud instances (or simply money for them) but have a
>> common set of scripts to do this in the source.
>>
>> Dedicated distributed test suite: I know there has been work done on
>> various distributed test suites (which is great!) but none have really
>> caught on so far.
>>
>> I know what the apache guidelines say, but what if the community could
>> help out with the testing effort in a more formal way. For example, for
>> each release to be finalized, what if there needed to be 3 community
>> members that needed to try it out in their own environment?
>>
>> What if there was a post release +1 vote for the community to sign off on
>> - sort of a "works for me" kind of thing to reassure others that it's safe
>> to try. So when the release email gets posted to the user list, start a
>> tradition of people saying +1 in reply if they've tested it out and it
>> works for them. That's happening informally now when there are problems,
>> but it might be nice to see a vote of confidence. Just another idea.
>>
>> Any other ideas or variations?
>
>
> I am no software engineering guru, but whenever I +1 a hive release I
> actually do checkout the code and run a couple queries.
> Mostly I do that because there are just so many things not unit testable,
> like those gosh darn bash scripts that launch Java applications. There have
> been times when even after multiple patch revisions and passing unit tests
> something just does not work in the real world. So I never +1 a binary
> release I don't spend an hour with, and if possible I try twisting the knobs
> on any new feature or at least just trying the basics. Hive is aiming for
> something like quarterly releases.
>
> So possibly better to have Cassandra do time based releases. It does not
> have to be quarterly but if people want bleeding edge features (something
> committed 2 days ago) really they should go out and build something from
> trunk.
>
> It seems like Cassandra devs have the voting and releasing down to a
> science but from my world the types of bugs I worry about are data file
> corruption, and any weird bug that would result in data faults like
> read_repair not working or writes not going to the right nodes, or bloom
> filters giving a faulty result. New features are great and I love seeing
> them but I can wait for those.
>
> Updates now, even trivial ones, get political; you just never want to be the
> guy that champions an update and then not have it go well :)
>
> Most users of Cassandra are going to have large clusters and really the
> project should not outstrip the common user's ability to stay up to date.
> You have to figure that for a large cluster, like 20 nodes with maybe 200GB
> data/node, doing a rolling restart without degrading performance is going
> to take some time. This is more than 'yum update cassandra' and
> '/etc/init.d/cassandra restart', and with the risk of something going wrong
> people need time to QA and time for ops. This type of person does not like
> to fall many releases behind and likewise cannot be updating too often
> either.
>
> I have never had to roll back a release, but I usually wait a month
> before running one to make sure there is not another one following soon.