cassandra-dev mailing list archives

From Ryan McGuire <r...@datastax.com>
Subject Re: 3.0 and the Cassandra release process
Date Fri, 20 Mar 2015 14:57:46 GMT
I'm taking notes from the infrastructure doc and wrote down some action
items for my team:

https://gist.github.com/EnigmaCurry/d53eccb55f5d0986c976


--

Ryan McGuire

Software Engineering Manager in Test | ryan@datastax.com

[image: linkedin.png] <https://www.linkedin.com/in/enigmacurry> [image:
twitter.png] <http://twitter.com/enigmacurry>
<http://github.com/enigmacurry>


On Thu, Mar 19, 2015 at 1:08 PM, Ariel Weisberg <ariel.weisberg@datastax.com> wrote:

> Hi,
>
> I realized one of the documents we didn't send out was the infrastructure
> side changes I am looking for. This one is maybe a little rougher as it was
> the first one I wrote on the subject.
>
>
> https://docs.google.com/document/d/1Seku0vPwChbnH3uYYxon0UO-b6LDtSqluZiH--sWWi0/edit?usp=sharing
>
> The goal is to have infrastructure that gives developers as close to
> immediate feedback as possible on their code before they merge. Feedback
> that is delayed until after merging to trunk should come within a day or
> two, and there is a product owner (Michael Shuler) responsible for making
> sure that issues are addressed quickly.
>
> QA is going to help by providing developers with better tools for writing
> higher level functional tests that explore all of the functions together
> along with the configuration space, without developers having to do any
> work other than plugging in functionality to exercise and then validate
> something specific. This kind of harness is hard to get right and make
> reliable and expressive, so they have their work cut out for them.
>
> It's going to be an iterative process where the tests improve as new work
> introduces missing coverage and as bugs/regressions drive the introduction
> of new tests. The monthly retrospective (planning on doing that first of
> the month) is also going to help us refine the testing and development
> process.
>
> Ariel
>
> On Thu, Mar 19, 2015 at 7:23 AM, Jason Brown <jasedbrown@gmail.com> wrote:
>
> > +1 to this general proposal. I think the time has finally come for us to
> > try something new, and this sounds legit. Thanks!
> >
> > On Thu, Mar 19, 2015 at 12:49 AM, Phil Yang <ud1937@gmail.com> wrote:
> >
> > > Can I regard the odd version as the "development preview" and the even
> > > version as the "production ready"?
> > >
> > > IMO, as a database infrastructure project, "stable" is more important
> > > than for other kinds of projects. LTS is a good idea, but if we don't
> > > support non-LTS releases for long enough to fix their bugs, users on a
> > > non-LTS release may have to upgrade to a new major release to get the
> > > fixes, and may then have to handle new bugs introduced by the new
> > > features. I'm afraid that eventually people would only think about the
> > > LTS one.
> > >
> > >
> > > 2015-03-19 8:48 GMT+08:00 Pavel Yaskevich <povel.y@gmail.com>:
> > >
> > > > +1
> > > >
> > > > On Wed, Mar 18, 2015 at 3:50 PM, Michael Kjellman <mkjellman@internalcircle.com> wrote:
> > > >
> > > > > For most of my life I’ve lived on the software bleeding edge both
> > > > > personally and professionally. Maybe it’s a personal weakness, but
> > > > > I guess I get a thrill out of the problem solving aspect?
> > > > >
> > > > > Recently I came to a bit of an epiphany: the closer I keep to the
> > > > > daily build, the happier I generally am on a daily basis. Bugs
> > > > > happen, but for the most part (aside from show stopper bugs), pain
> > > > > points for myself in a given daily build can generally be debugged
> > > > > to 1 or maybe 2 root causes, fixed in ~24 hours, and then life is
> > > > > better the next day again. In comparison, the old waterfall model
> > > > > generally means taking an “official” release at some point and
> > > > > waiting for some poor soul (or developer) to actually run the thing.
> > > > > No matter how good the QA team is, until it’s actually used in the
> > > > > real world, most bugs aren’t found.
> > > > >
> > > > > If you and your organization can wait 24 hours * number of bugs
> > > > > discovered after people actually started using the thing, you end up
> > > > > with a “usable build” around the holy-grail minor X.X.5 release of
> > > > > Cassandra.
> > > > >
> > > > > I love the idea of the LTS model Jonathan describes because it means
> > > > > more code can get real testing and “bake” for longer instead of
> > > > > sitting largely unused on some git repository in a datacenter far
> > > > > far away. A lot of code has changed between 2.0 and trunk today. The
> > > > > code has diverged to the point that if you write something for 2.0
> > > > > (as the most stable major branch currently available), merging it
> > > > > forward to 3.0 or after generally means rewriting it. If the only
> > > > > thing that comes out of this is a smaller delta of LOC between the
> > > > > deployable version/branch and what we can develop against and what
> > > > > QA is focused on, I think that’s a massive win.
> > > > >
> > > > > Something like CASSANDRA-8099 will need 2x the baking time of even
> > > > > many of the more risky changes the project has made. While I
> > > > > wouldn’t want to run a build with CASSANDRA-8099 in it anytime soon,
> > > > > there are now hundreds of other changes blocked, most likely many
> > > > > containing new bugs of their own, that have no exposure at all to
> > > > > even the most involved C* developers.
> > > > >
> > > > > I really think this will be a huge win for the project and I’m super
> > > > > thankful to Sylvain, Ariel, Jonathan, Aleksey, and Jake for guiding
> > > > > this change to a much more sustainable release model for the entire
> > > > > community.
> > > > >
> > > > > best,
> > > > > kjellman
> > > > >
> > > > >
> > > > > > > On Mar 18, 2015, at 3:02 PM, Ariel Weisberg <ariel.weisberg@datastax.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > Keep in mind it is a bug fix release every month and a feature
> > > > > > > release every two months.
> > > > > >
> > > > > > > For development that is really a two month cycle with all bug
> > > > > > > fixes being backported one release. As a developer, if you want
> > > > > > > to get something in a release you have two months, and you
> > > > > > > should be sizing pieces of large tasks so they ship at least
> > > > > > > every two months.
> > > > > >
> > > > > > Ariel
> > > > > > >> On Mar 18, 2015, at 5:58 PM, Terrance Shepherd <tscanausa@gmail.com> wrote:
> > > > > >>
> > > > > > >> I like the idea but I agree that every month is a bit
> > > > > > >> aggressive. I have no say but:
> > > > > >>
> > > > > > >> I would say 4 releases a year instead of 12, with 2 months of
> > > > > > >> new features and 1 month of bug squashing per release, and
> > > > > > >> the 4th quarter just bugs.
> > > > > >>
> > > > > > >> I would also propose 2 year LTS releases for the releases
> > > > > > >> after the 4th quarter. So everyone could get a new feature
> > > > > > >> release every quarter and the stability of super major
> > > > > > >> versions for 2 years.
> > > > > >>
> > > > > > >> On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius <dbrosius@mebigfatguy.com> wrote:
> > > > > >>
> > > > > > >>> It would seem the practical implication of this is that there
> > > > > > >>> would be significantly more development on branches, with
> > > > > > >>> potentially more significant delays on merging these
> > > > > > >>> branches. This would imply to me that more Jenkins servers
> > > > > > >>> would need to be set up to handle auto-testing of more
> > > > > > >>> branches, because if feature work spends more time on
> > > > > > >>> external branches, it is then likely to be less tested (even
> > > > > > >>> if by accident) as fewer developers would be working on that
> > > > > > >>> branch. Only when a feature was blessed to make it to the
> > > > > > >>> release-tracked branch would it become exposed to the
> > > > > > >>> majority of developers/testers, etc. doing normal
> > > > > > >>> running/playing/testing.
> > > > > >>>
> > > > > > >>> This isn't to knock the idea in any way, just wanted to
> > > > > > >>> mention what I think the outcome would be.
> > > > > >>>
> > > > > >>> dave
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>>
> > > > > > >>>>>> On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> > > > > > >>>>>>> Cassandra 2.1 was released in September, which means that
> > > > > > >>>>>>> if we were on track with our stated goal of six month
> > > > > > >>>>>>> releases, 3.0 would be done about now. Instead, we haven't
> > > > > > >>>>>>> even delivered a beta. The immediate cause this time is
> > > > > > >>>>>>> blocking for 8099
> > > > > > >>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-8099>,
> > > > > > >>>>>>> but the reality is that nobody should really be surprised.
> > > > > > >>>>>>> Something always comes up -- we've averaged about nine
> > > > > > >>>>>>> months since 1.0, with 2.1 taking an entire year.
> > > > > >>>>>>>
> > > > > > >>>>>>> We could make theory align with reality by acknowledging,
> > > > > > >>>>>>> "if nine months is our 'natural' release schedule, then so
> > > > > > >>>>>>> be it." But I think we can do better.
> > > > > >>>>>>>
> > > > > > >>>>>>> Broadly speaking, we have two constituencies with
> > > > > > >>>>>>> Cassandra releases:
> > > > > >>>>>>>
> > > > > > >>>>>>> First, we have the users who are building or porting an
> > > > > > >>>>>>> application on Cassandra. These users want the newest
> > > > > > >>>>>>> features to make their job easier. If 2.1.0 has a few
> > > > > > >>>>>>> bugs, it's not the end of the world. They have time to
> > > > > > >>>>>>> wait for 2.1.x to stabilize while they write their code.
> > > > > > >>>>>>> They would like to see us deliver on our six month
> > > > > > >>>>>>> schedule or even faster.
> > > > > >>>>>>>
> > > > > > >>>>>>> Second, we have the users who have an application in
> > > > > > >>>>>>> production. These users, or their bosses, want Cassandra
> > > > > > >>>>>>> to be as stable as possible. Assuming they deploy on a
> > > > > > >>>>>>> stable release like 2.0.12, they don't want to touch it.
> > > > > > >>>>>>> They would like to see us release *less* often. (Because
> > > > > > >>>>>>> that means they have to do fewer upgrades while remaining
> > > > > > >>>>>>> in our backwards compatibility window.)
> > > > > >>>>>>>
> > > > > > >>>>>>> With our current "big release every X months" model, these
> > > > > > >>>>>>> users' needs are in tension.
> > > > > >>>>>>>
> > > > > > >>>>>>> We discussed this six months ago, and ended up with this:
> > > > > >>>>>>>
> > > > > > >>>>>>>> What if we tried a [four month] release cycle, BUT we
> > > > > > >>>>>>>> would guarantee that you could do a rolling upgrade until
> > > > > > >>>>>>>> we bump the supermajor version? So 2.0 could upgrade to
> > > > > > >>>>>>>> 3.0 without having to go through 2.1. (But to go to 3.1
> > > > > > >>>>>>>> or 4.0 you would have to go through 3.0.)
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> Crucially, I added
> > > > > >>>>>>>
> > > > > > >>>>>>>> Whether this is reasonable depends on how fast we can
> > > > > > >>>>>>>> stabilize releases. 2.1.0 will be a good test of this.
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > > >>>>>>> Unfortunately, even after DataStax hired half a dozen
> > > > > > >>>>>>> full-time test engineers, 2.1.0 continued the proud
> > > > > > >>>>>>> tradition of being unready for production use, with "wait
> > > > > > >>>>>>> for .5 before upgrading" once again looking like a good
> > > > > > >>>>>>> guideline.
> > > > > >>>>>>>
> > > > > > >>>>>>> I’m starting to think that the entire model of “write a
> > > > > > >>>>>>> bunch of new features all at once and then try to
> > > > > > >>>>>>> stabilize it for release” is broken. We’ve been trying
> > > > > > >>>>>>> that for years and empirically speaking the evidence is
> > > > > > >>>>>>> that it just doesn’t work, either from a stability
> > > > > > >>>>>>> standpoint or even just shipping on time.
> > > > > >>>>>>>
> > > > > > >>>>>>> A big reason that it takes us so long to stabilize new
> > > > > > >>>>>>> releases now is that, because our major release cycle is
> > > > > > >>>>>>> so long, it’s super tempting to slip in “just one” new
> > > > > > >>>>>>> feature into bugfix releases, and I’m as guilty of that as
> > > > > > >>>>>>> anyone.
> > > > > >>>>>>>
> > > > > > >>>>>>> For similar reasons, it’s difficult to do a meaningful
> > > > > > >>>>>>> freeze with big feature releases. A look at 3.0 shows why:
> > > > > > >>>>>>> we have 8099 coming, but we also have significant work
> > > > > > >>>>>>> done (but not finished) on 6230, 7970, 6696, and 6477, all
> > > > > > >>>>>>> of which are meaningful improvements that address
> > > > > > >>>>>>> demonstrated user pain. So if we keep doing what we’ve
> > > > > > >>>>>>> been doing, our choices are to either delay 3.0 further
> > > > > > >>>>>>> while we finish and stabilize these, or we wait nine
> > > > > > >>>>>>> months to a year for the next release. Either way, one of
> > > > > > >>>>>>> our constituencies gets disappointed.
> > > > > >>>>>>>
> > > > > > >>>>>>> So, I’d like to try something different. I think we were
> > > > > > >>>>>>> on the right track with shorter releases with more
> > > > > > >>>>>>> compatibility. But I’d like to throw in a twist. Intel
> > > > > > >>>>>>> cuts down on risk with a “tick-tock” schedule for new
> > > > > > >>>>>>> architectures and process shrinks instead of trying to do
> > > > > > >>>>>>> both at once. We can do something similar here:
> > > > > >>>>>>>
> > > > > > >>>>>>> One month releases. Period. If it’s not done, it can wait.
> > > > > > >>>>>>> *Every other release only accepts bug fixes.*
> > > > > >>>>>>>
> > > > > > >>>>>>> By itself, one-month releases are going to dramatically
> > > > > > >>>>>>> reduce the complexity of testing and debugging new
> > > > > > >>>>>>> releases -- and bugs that do slip past us will only affect
> > > > > > >>>>>>> a smaller percentage of users, avoiding the “big release
> > > > > > >>>>>>> has a bunch of bugs no one has seen before and pretty much
> > > > > > >>>>>>> everyone is hit by something” scenario. But by adding in
> > > > > > >>>>>>> the second rule, I think we have a real chance to make a
> > > > > > >>>>>>> quantum leap here: stable, production-ready releases every
> > > > > > >>>>>>> two months.
> > > > > >>>>>>>
> > > > > >>>>>>> So here is my proposal for 3.0:
> > > > > >>>>>>>
> > > > > > >>>>>>> We’re just about ready to start serious review of 8099.
> > > > > > >>>>>>> When that’s done, we branch 3.0 and cut a beta and then
> > > > > > >>>>>>> release candidates. Whatever isn’t done by then has to
> > > > > > >>>>>>> wait; unlike prior betas, we will only accept bug fixes
> > > > > > >>>>>>> into 3.0 after branching.
> > > > > >>>>>>>
> > > > > > >>>>>>> One month after 3.0, we will ship 3.1 (with new features).
> > > > > > >>>>>>> At the same time, we will branch 3.2. New features in
> > > > > > >>>>>>> trunk will go into 3.3. The 3.2 branch will only get bug
> > > > > > >>>>>>> fixes. We will maintain backwards compatibility for all of
> > > > > > >>>>>>> 3.x; eventually (no less than a year) we will pick a
> > > > > > >>>>>>> release to be 4.0, and drop deprecated features and old
> > > > > > >>>>>>> backwards compatibilities. Otherwise there will be nothing
> > > > > > >>>>>>> special about the 4.0 designation. (Note that with an “odd
> > > > > > >>>>>>> releases have new features, even releases only have bug
> > > > > > >>>>>>> fixes” policy, 4.0 will actually be *more* stable than
> > > > > > >>>>>>> 3.11.)
> > > > > >>>>>>>
> > > > > > >>>>>>> Larger features can continue to be developed in separate
> > > > > > >>>>>>> branches, the way 8099 is being worked on today, and
> > > > > > >>>>>>> committed to trunk when ready. So this is not saying that
> > > > > > >>>>>>> we are limited only to features we can build in a single
> > > > > > >>>>>>> month.
> > > > > >>>>>>>
> > > > > > >>>>>>> Some things will have to change with our dev process, for
> > > > > > >>>>>>> the better. In particular, with one month to commit new
> > > > > > >>>>>>> features, we don’t have room for committing sloppy work
> > > > > > >>>>>>> and stabilizing it later. Trunk has to be stable at all
> > > > > > >>>>>>> times. I asked Ariel Weisberg to put together his thoughts
> > > > > > >>>>>>> separately on what worked for his team at VoltDB, and how
> > > > > > >>>>>>> we can apply that to Cassandra -- see his email from
> > > > > > >>>>>>> Friday <http://bit.ly/1MHaOKX>. (TLDR: Redefine “done” to
> > > > > > >>>>>>> include automated tests. Infrastructure to run tests
> > > > > > >>>>>>> against github branches before merging to trunk. A new
> > > > > > >>>>>>> test harness for long-running regression tests.)
> > > > > >>>>>>>
> > > > > > >>>>>>> I’m optimistic that as we improve our process this way,
> > > > > > >>>>>>> our even releases will become increasingly stable. If so,
> > > > > > >>>>>>> we can skip sub-minor releases (3.2.x) entirely, and focus
> > > > > > >>>>>>> on keeping the release train moving. In the meantime, we
> > > > > > >>>>>>> will continue delivering 2.1.x stability releases.
> > > > > >>>>>>>
> > > > > > >>>>>>> This won’t be an entirely smooth transition. In
> > > > > > >>>>>>> particular, you will have noticed that 3.1 will get more
> > > > > > >>>>>>> than a month’s worth of new features while we stabilize
> > > > > > >>>>>>> 3.0 as the last of the old way of doing things, so some
> > > > > > >>>>>>> patience is in order as we try this out. By 3.4 and 3.6
> > > > > > >>>>>>> later this year we should have a good idea if this is
> > > > > > >>>>>>> working, and we can make adjustments as warranted.
> > > > > >>>>>>>
> > > > > >>>>>>> --
> > > > > >>>>>>> Jonathan Ellis
> > > > > >>>>>>> Project Chair, Apache Cassandra
> > > > > >>>>>>> co-founder, http://www.datastax.com
> > > > > >>>>>>> @spyced
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Phil Yang
> > >
> >
>
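
For readers skimming the thread, the numbering rule proposed in the quoted message (one release per month, odd 3.x releases take new features, even ones take only bug fixes) can be sketched roughly as below. This is only an illustration: the function names are hypothetical, the "baseline" label for 3.0 is an assumption, and the strict one-month spacing ignores the transition period the proposal itself mentions.

```python
from typing import List, Tuple


def release_kind(minor: int) -> str:
    """Classify a hypothetical 3.<minor> release under the odd/even rule."""
    if minor < 0:
        raise ValueError("minor version must be non-negative")
    if minor == 0:
        # Assumption: 3.0 is treated as the baseline, stabilized the old way.
        return "baseline"
    # Odd minors take new features; even minors take only bug fixes.
    return "feature" if minor % 2 == 1 else "bugfix-only"


def schedule(count: int) -> List[Tuple[str, int, str]]:
    """Return (version, months after 3.0, kind) for the first `count` releases.

    Assumption: exactly one release per month, so 3.n lands n months after 3.0.
    """
    return [(f"3.{n}", n, release_kind(n)) for n in range(count)]


if __name__ == "__main__":
    for version, month, kind in schedule(7):
        print(f"{version}: month {month}, {kind}")
```

Under these assumptions a bug-fix-only release appears every second month, which is the "stable, production-ready releases every two months" cadence described in the proposal.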
