cassandra-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Proposal - 3.5.1
Date Thu, 15 Sep 2016 15:18:47 GMT
Where did we come from?

We came from a place where we would say, "You probably do not want to run
2.0.X until it reaches 2.0.6"

One thing about Cassandra is we get into a situation where we can only go
forward. For example, when you update from version X to version Y, version
Y might start writing a new version of sstables.

X - sstables-v1
Y - sstables-v2

This is very scary on the operations side because you cannot bring the system
back to running version X, as Y's data is unreadable by X.
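To make the one-way door concrete, here is a minimal sketch. It is not Cassandra's actual code: the names `Release`, `can_read`, `can_roll_back`, and the format numbers 1 and 2 are assumptions for illustration only. It assumes each release can read its own sstable format and older ones, but never a newer one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical release descriptor (illustration only)."""
    name: str
    sstable_format: int  # newest sstable format this release writes and reads


def can_read(release: Release, file_format: int) -> bool:
    # Assumption: a release reads its own format and older ones, never newer.
    return file_format <= release.sstable_format


def can_roll_back(current: Release, previous: Release) -> bool:
    # Rolling back is only safe if the previous release can read the files
    # the current release has already written.
    return can_read(previous, current.sstable_format)


X = Release("X", sstable_format=1)  # writes sstables-v1
Y = Release("Y", sstable_format=2)  # writes sstables-v2

print(can_read(Y, X.sstable_format))  # Y can still read X's files
print(can_roll_back(Y, X))            # X cannot read Y's files
```

Under these assumptions the upgrade from X to Y works, but the rollback check fails as soon as Y has written a single v2 file.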

Where are we at now?

We now seem to be in a place where you say, "Problem in 3.5 (trunk at a
given day)? Go to 3.9 (trunk at the last tick-tock release)."

http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/

"To get there, we are investing significant effort in making trunk “always
releasable,” with the goal that each release, or at least each odd-numbered
bugfix release, should be usable in production. "

I support releasable trunk, but the qualifying statement "or at least each
odd number release" undoes the assertion of "always releasable". Not trying
to nit pick here. I realize it may be hard to get to the desired state of
releasable trunk in a short time.

Anecdotally I notice a lot of "movement" in class names/names of functions.
Generally, I can look at a stack trace of a piece of software and I can
bring up the line number in github and it is dead on, or fairly close to
the line of code. Recently I have tried this in versions fairly close
together and seen some drastic changes.

Some things I personally do not like:
1) lack of stable-ish APIs in the codebase
2) use of singletons rather than simple dependency injection (like even
constructor based injection)

IMHO these do not fit well with 'release often' while always producing a
'high quality release'.
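As a rough illustration of point 2, the sketch below contrasts a global singleton with constructor-based injection. Every name in it (`Clock`, `FixedClock`, `GLOBAL_CLOCK`, `ExpiryChecker`) is hypothetical and none of it is taken from the Cassandra codebase; it only shows why the injected version is easier to test in isolation.

```python
import time


class Clock:
    """Hypothetical time source."""

    def now_millis(self) -> int:
        return int(time.time() * 1000)


class FixedClock(Clock):
    """Test double that always returns the same instant."""

    def __init__(self, millis: int) -> None:
        self._millis = millis

    def now_millis(self) -> int:
        return self._millis


# Singleton style: the dependency is a hidden global, so a test has to
# patch module state to control it.
GLOBAL_CLOCK = Clock()


def is_expired_singleton(deadline_millis: int) -> bool:
    return GLOBAL_CLOCK.now_millis() > deadline_millis


# Constructor injection: the dependency is explicit, so a test simply
# passes in whatever clock it needs.
class ExpiryChecker:
    def __init__(self, clock: Clock) -> None:
        self._clock = clock

    def is_expired(self, deadline_millis: int) -> bool:
        return self._clock.now_millis() > deadline_millis


checker = ExpiryChecker(FixedClock(1_000))
print(checker.is_expired(999))    # deadline already passed
print(checker.is_expired(1_000))  # deadline not yet passed
```

The injected variant can be exercised without touching any global state, which is what makes small, trustworthy tests cheap to write.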

I do not love the concept of a 'bug fix release'. I would not mind waiting
longer for a feature as long as I could have a high trust factor in it
working right the first time.

Take a feature like trickle_fsync. By the description it sounds like a clear
optimization win, yet it is off by default. The description says "turn on for
ssd", but elsewhere in the configuration there is
# disk_optimization_strategy: ssd.
Are we tuning for ssd by default or not?

Because it defaults to false, it is not tested in the wild. How is it covered
and trusted during tests? How many tests have it off vs. on?
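One way to get coverage for both settings is to run the same test with the flag off and on. The sketch below is an assumption-laden illustration, not Cassandra's implementation: `write_chunks`, `run_both_settings`, and `interval_bytes` are hypothetical names, and the "feature" is simply an optional periodic fsync while writing a file.

```python
import os
import tempfile


def write_chunks(path, chunks, trickle_fsync=False, interval_bytes=1024):
    """Write chunks to path; when trickle_fsync is on, fsync every
    interval_bytes of unsynced data (hypothetical stand-in for the feature)."""
    unsynced = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            unsynced += len(chunk)
            if trickle_fsync and unsynced >= interval_bytes:
                f.flush()
                os.fsync(f.fileno())
                unsynced = 0


def run_both_settings(chunks):
    """Run the same write with the flag off and on; return the bytes that
    ended up on disk for each setting."""
    results = {}
    for flag in (False, True):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "data.bin")
            write_chunks(path, chunks, trickle_fsync=flag)
            with open(path, "rb") as f:
                results[flag] = f.read()
    return results


results = run_both_settings([b"a" * 700, b"b" * 700])
# The flag should change when data is synced, never what is written.
assert results[False] == results[True]
```

Running every relevant test across both values is the kind of coverage that would let a default be flipped with some confidence.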

I do not find it comforting that trickle_fsync can be added as a feature, set
to false, and only possibly gain real-world coverage. I do not want to turn
it on and get some weird issue because no one else is running it. I would
rather it be on by default with extreme confidence, or not added at all.



On Thu, Sep 15, 2016 at 1:37 AM, Jonathan Haddad <jon@jonhaddad.com> wrote:

> In this particular case, I'd say adding a bug fix release for every version
> that's affected would be the right thing.  The issue is so easily
> reproducible and will likely result in massive data loss for anyone on 3.X
> where X < 6 who uses the "date" type.
>
> This is how easy it is to reproduce:
>
> 1. Start Cassandra 3.5
> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 1};
> 3. use test;
> 4. create table fail (id int primary key, d date);
> 5. delete d from fail where id = 1;
> 6. Stop Cassandra
> 7. Start Cassandra
>
> You will get this, and startup will fail:
>
> ERROR 05:32:09 Exiting due to error while processing commit log during
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> Unexpected error deserializing mutation; saved to
> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4r0000gn/T/mutation6313332720566971713dat.
> This may be caused by replaying a mutation against a table with the same
> name but incompatible schema.  Exception follows:
> org.apache.cassandra.serializers.MarshalException: Expected 4 byte long for date (0)
>
> I mean.. come on.  It's an easy fix.  It cleanly merges against 3.5 (and
> probably the other releases) and requires very little investment from
> anyone.
>
>
> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa <jeff.jirsa@crowdstrike.com>
> wrote:
>
> > We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes, but
> > we certainly didn’t/won’t go back and cut new releases from every branch
> > for every critical bug in future releases, so I think we need to draw the
> > line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it seems like
> > you’ve got options (either stay on the tick and go up to 3.7, or bail down
> > to 3.0.x)
> >
> > Perhaps, though, this highlights the fact that tick/tock may not be the
> > best option long term. We’ve tried it for a year, perhaps we should instead
> > discuss whether or not it should continue, or if there’s another process
> > that gives us a better way to get useful patches into versions people are
> > willing to run in production.
> >
> >
> >
> > On 9/14/16, 8:55 PM, "Jonathan Haddad" <jon@jonhaddad.com> wrote:
> >
> > >Common sense is what prevents someone from upgrading to yet another
> > >completely unknown version with new features which have probably broken
> > >even more stuff that nobody is aware of.  The folks I'm helping right now
> > >deployed 3.5 when they got started because http://cassandra.apache.org
> > >suggests it's acceptable for production.  It turns out using 4 of the
> > >built in datatypes of the database results in the server being unable to
> > >restart without clearing out the commit logs and running a repair.  That
> > >screams critical to me.  You shouldn't even be able to install 3.5 without
> > >the patch I've supplied - that bug is a ticking time bomb for anyone that
> > >installs it.
> > >
> > >On Wed, Sep 14, 2016 at 8:12 PM Michael Shuler <michael@pbandjelly.org>
> > >wrote:
> > >
> > >> What's preventing the use of the 3.6 or 3.7 releases where this bug is
> > >> already fixed? This is also fixed in the 3.0.6/7/8 releases.
> > >>
> > >> Michael
> > >>
> > >> On 09/14/2016 08:30 PM, Jonathan Haddad wrote:
> > >> > Unfortunately CASSANDRA-11618 was fixed in 3.6 but was not back ported to
> > >> > 3.5 as well, and it makes Cassandra effectively unusable if someone is
> > >> > using any of the 4 types affected in any of their schema.
> > >> >
> > >> > I have cherry picked & merged the patch back to here and will put it in a
> > >> > JIRA as well tonight, I just wanted to get the ball rolling asap on this.
> > >> >
> > >> >
> > >> > https://github.com/rustyrazorblade/cassandra/tree/fix_commitlog_exception
> > >> >
> > >> > Jon
> > >> >
> > >>
> > >>
> >
>
