From: Edward Capriolo
Date: Thu, 15 Sep 2016 11:18:47 -0400
Subject: Re: Proposal - 3.5.1
To: dev@cassandra.apache.org

Where did we come from?

We came from a place where we would say, "You probably do not want to run
2.0.X until it reaches 2.0.6."

One thing about Cassandra is that we get into situations where we can only
go forward. For example, when you upgrade from version X to version Y,
version Y might start writing a new version of sstables:

X - sstables-v1
Y - sstables-v2

This is very scary from the operations side, because you cannot bring the
system back to running version X: data written by Y is unreadable to X.

Where are we now?

We now seem to be in a place where you say, "Problem in 3.5 (trunk at a
given day)? Go to 3.9 (trunk at the latest tick-tock release)."

http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/

"To get there, we are investing significant effort in making trunk “always
releasable,” with the goal that each release, or at least each odd-numbered
bugfix release, should be usable in production."

I support a releasable trunk, but the qualifying statement "or at least each
odd-numbered bugfix release" undoes the assertion of "always releasable".

I am not trying to nitpick here. I realize it may be hard to reach the
desired state of a releasable trunk in a short time.

Anecdotally, I notice a lot of "movement" in class and function names.
Generally, I can take a stack trace from a piece of software, look up the
line number on GitHub, and land dead on, or fairly close to, the line of
code. Recently I have tried this with versions fairly close together and
seen some drastic changes.
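The forward-only upgrade problem described earlier (X writes sstables-v1, Y writes sstables-v2) can be sketched as a toy model. This is a minimal sketch under stated assumptions, not Cassandra's actual compatibility logic: the release names X and Y, the FORMAT_WRITTEN mapping, and the Node class with its methods are all hypothetical, and it assumes each release reads its own sstable format and every older one, but nothing newer.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical mapping from release to the sstable format version it writes.
# Simplifying assumption: a release reads its own format and every older one.
FORMAT_WRITTEN: Dict[str, int] = {"X": 1, "Y": 2}


@dataclass
class Node:
    release: str
    sstable_formats: List[int] = field(default_factory=list)

    def flush(self) -> None:
        """Write a new sstable in the format of the running release."""
        self.sstable_formats.append(FORMAT_WRITTEN[self.release])

    def can_run(self, release: str) -> bool:
        """True if `release` can read every sstable currently on disk."""
        newest_readable = FORMAT_WRITTEN[release]
        return all(fmt <= newest_readable for fmt in self.sstable_formats)

    def switch_to(self, release: str) -> None:
        """Upgrade or downgrade, refusing if some sstable would be unreadable."""
        if not self.can_run(release):
            raise RuntimeError(
                f"{release} cannot read sstables on disk: {self.sstable_formats}"
            )
        self.release = release


node = Node("X")
node.flush()              # X writes sstables-v1
node.switch_to("Y")       # upgrade succeeds: Y still reads v1
node.flush()              # Y writes sstables-v2
print(node.can_run("X"))  # False: rolling back to X is no longer possible
```

The point of the model is that the upgrade itself is harmless; it is the first write by the new release that closes the door on a rollback.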
We know some things I personally do not like:
1) the lack of stable-ish APIs in the codebase
2) the use of singletons rather than simple dependency injection (even
constructor-based injection)

IMHO these do not fit well with 'release often' and 'always produce a high
quality release'.

I do not love the concept of a 'bug fix release'. I would not mind waiting
longer for a feature as long as I could have a high trust factor in it
working right the first time.

Take a feature like trickle_fsync. By the description it sounds like a clear
optimization win, yet it is off by default. The description says "turn on
for ssd", but elsewhere the configuration has
"# disk_optimization_strategy: ssd". Are we tuning for SSDs by default or
not?

Because it defaults to false, it is not tested in the wild. How is it
covered and trusted during tests? How many tests have it off vs. on?

The idea that trickle_fsync can be added as a feature, set to false, and
possibly gain real-world coverage is not comforting to me. I do not want to
turn it on and hit some weird issue because no one else is running it. I
would rather it be added turned on by default with extreme confidence, or
not added at all.

On Thu, Sep 15, 2016 at 1:37 AM, Jonathan Haddad wrote:

> In this particular case, I'd say adding a bug fix release for every version
> that's affected would be the right thing. The issue is so easily
> reproducible and will likely result in massive data loss for anyone who is
> on 3.X (where X < 6) and uses the "date" type.
>
> This is how easy it is to reproduce:
>
> 1. Start Cassandra 3.5
> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 1};
> 3. use test;
> 4. create table fail (id int primary key, d date);
> 5. delete d from fail where id = 1;
> 6. Stop Cassandra
> 7. Start Cassandra
>
> You will get this, and startup will fail:
>
> ERROR 05:32:09 Exiting due to error while processing commit log during
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> Unexpected error deserializing mutation; saved to
> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4r0000gn/T/mutation6313332720566971713dat.
> This may be caused by replaying a mutation against a table with the same
> name but incompatible schema. Exception follows:
> org.apache.cassandra.serializers.MarshalException: Expected 4 byte long for
> date (0)
>
> I mean... come on. It's an easy fix. It cleanly merges against 3.5 (and
> probably the other releases) and requires very little investment from
> anyone.
>
>
> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa wrote:
>
> > We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes, but
> > we certainly didn’t/won’t go back and cut new releases from every branch
> > for every critical bug in future releases, so I think we need to draw the
> > line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it seems like
> > you’ve got options (either stay on the tick and go up to 3.7, or bail down
> > to 3.0.x).
> >
> > Perhaps, though, this highlights the fact that tick/tock may not be the
> > best option long term. We’ve tried it for a year; perhaps we should
> > instead discuss whether or not it should continue, or if there’s another
> > process that gives us a better way to get useful patches into versions
> > people are willing to run in production.
> >
> > On 9/14/16, 8:55 PM, "Jonathan Haddad" wrote:
> >
> > >Common sense is what prevents someone from upgrading to yet another
> > >completely unknown version with new features which have probably broken
> > >even more stuff that nobody is aware of.
> > >The folks I'm helping right now deployed 3.5 when they got started
> > >because http://cassandra.apache.org suggests it's acceptable for
> > >production. It turns out that using any of 4 of the database's built-in
> > >datatypes results in the server being unable to restart without clearing
> > >out the commit logs and running a repair. That screams critical to me.
> > >You shouldn't even be able to install 3.5 without the patch I've
> > >supplied: that bug is a ticking time bomb for anyone that installs it.
> > >
> > >On Wed, Sep 14, 2016 at 8:12 PM Michael Shuler wrote:
> > >
> > >> What's preventing the use of the 3.6 or 3.7 releases where this bug is
> > >> already fixed? This is also fixed in the 3.0.6/7/8 releases.
> > >>
> > >> Michael
> > >>
> > >> On 09/14/2016 08:30 PM, Jonathan Haddad wrote:
> > >> > Unfortunately CASSANDRA-11618 was fixed in 3.6 but was not backported
> > >> > to 3.5 as well, and it makes Cassandra effectively unusable if
> > >> > someone is using any of the 4 affected types anywhere in their schema.
> > >> >
> > >> > I have cherry-picked & merged the patch back here and will put it in
> > >> > a JIRA as well tonight; I just wanted to get the ball rolling ASAP on
> > >> > this.
> > >> >
> > >> > https://github.com/rustyrazorblade/cassandra/tree/fix_commitlog_exception
> > >> >
> > >> > Jon
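The version ranges stated in this thread (fixed in 3.6 and later and in 3.0.6 and later, never backported to 3.1 through 3.5) can be summarized as a small check. This is a rough sketch built only from the statements above, not from the CASSANDRA-11618 JIRA record; the function names parse_version and is_affected are hypothetical, it assumes plain dotted numeric versions, and versions outside 3.x return None because the thread does not discuss them.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a dotted version such as '3.0.6' or '3.5' into (major, minor, patch)."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # treat '3.5' as '3.5.0'
    major, minor, patch = parts[:3]
    return major, minor, patch


def is_affected(version: str) -> Optional[bool]:
    """Rough check derived only from this thread: fixed in 3.6+ and 3.0.6+,
    never backported to 3.1-3.5. Returns None for versions not discussed."""
    major, minor, patch = parse_version(version)
    if major != 3:
        return None  # the thread says nothing about other major versions
    if minor == 0:
        return patch < 6  # 3.0.x line: fixed from 3.0.6 onward
    return minor < 6  # tick-tock lines 3.1 through 3.5 never got the fix
```

For example, is_affected("3.5") returns True and is_affected("3.0.6") returns False, which matches the two upgrade paths Jeff describes: go up to 3.7, or bail down to 3.0.x with x >= 6.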