aurora-dev mailing list archives

From ASF IRC Bot
Subject Summary of IRC Meeting in #aurora
Date Mon, 01 Feb 2016 19:33:20 GMT
Summary of IRC Meeting in #aurora at Mon Feb  1 19:06:57 2016:

Attendees: mkhutornenko, adeshmukh, zmanji, benley, jcohen

- Preface
- Deprecation cycles
  - Action: jcohen to follow up w/ dev thread re: changing deprecation policy.
- AURORA-1603
- Rollback testing
  - Action: jcohen to email dev@ w.r.t. rollback testing.

IRC log follows:

## Preface ##
[Mon Feb  1 19:07:49 2016] <jcohen>: Ok, let’s start w/ roll call, as always everyone
is encouraged to participate!
[Mon Feb  1 19:07:53 2016] <jcohen>: here :)
[Mon Feb  1 19:07:57 2016] <benley>: Here
[Mon Feb  1 19:07:59 2016] <adeshmukh>: here
[Mon Feb  1 19:08:43 2016] <mkhutornenko>: here
[Mon Feb  1 19:08:54 2016] <zmanji>: here
[Mon Feb  1 19:10:25 2016] <jcohen>: Ok, first things first…
## Deprecation cycles ##
[Mon Feb  1 19:11:08 2016] <jcohen>: As we increase the cadence of releases, our policy
of killing deprecated fields after one release cycle becomes more burdensome.
[Mon Feb  1 19:11:54 2016] <jcohen>: Given that we’re trying to at least keep up with
Mesos’s release cycle which is now timed, it seems like this will be a continuing problem
for us, since we can expect releases fairly regularly.
[Mon Feb  1 19:12:21 2016] <jcohen>: Curious what people think about moving from a release-based
deprecation to a timed deprecation
[Mon Feb  1 19:12:28 2016] <benley>: I'd be in favor.
[Mon Feb  1 19:12:50 2016] <jcohen>: (i.e. instead of deprecated in release X, removed
in release X + 1, it would be removed N days after the release in which it was deprecated)
[Mon Feb  1 19:13:14 2016] <zmanji>: I'm also in favor of time based because I like
the frequent releases but some of the deprecations are pretty difficult to do
[Mon Feb  1 19:13:26 2016] <benley>: Or perhaps "2 releases, or at least NN days"
[Mon Feb  1 19:14:20 2016] <mkhutornenko>: +1 to a timed approach. I think Mesos follows
the same practice
[Mon Feb  1 19:14:23 2016] <jcohen>: Yeah, I want to ensure we keep a balance between
giving operators enough time to adopt changes to deprecated fields versus us having to keep
them around for too long.
[Mon Feb  1 19:15:19 2016] <jcohen>: It seems all are in favor. Given the absence of
wfarner, jsirois, would it make sense to continue this discussion on the dev list where we
can come up with a final, revised policy?
[Mon Feb  1 19:16:07 2016] <jcohen>: #action jcohen to follow up w/ dev thread re: changing
deprecation policy.
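
[Editor's sketch] The three policies discussed above (release-based, timed, and benley's "2 releases, or at least NN days") can be compared with a minimal Python sketch. Everything here is hypothetical: the function names, the use of integer release ordinals, and the day counts are illustrative assumptions, since no value for N was agreed in the meeting. The hybrid rule is read as requiring both conditions, which is only one interpretation of the suggestion.

```python
from datetime import date, timedelta


def removable_by_release(deprecated_in: int, current: int) -> bool:
    """Release-based policy: deprecated in release X, removable in X + 1.

    Releases are modeled as hypothetical integer ordinals.
    """
    return current - deprecated_in >= 1


def removable_by_time(deprecated_on: date, today: date, min_days: int) -> bool:
    """Timed policy: removable N days after the deprecating release shipped.

    min_days is a placeholder; the meeting did not settle on a value.
    """
    return today - deprecated_on >= timedelta(days=min_days)


def removable_hybrid(
    deprecated_in: int,
    current: int,
    deprecated_on: date,
    today: date,
    min_days: int,
    min_releases: int = 2,
) -> bool:
    """Hybrid policy, read here as: enough releases AND enough days."""
    enough_releases = current - deprecated_in >= min_releases
    return enough_releases and removable_by_time(deprecated_on, today, min_days)
```

With frequent releases, the release-based check can pass long before the timed check does, which is the operator burden raised in the discussion.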
## AURORA-1603 ##
[Mon Feb  1 19:16:40 2016] <jcohen>: AURORA-1603
[Mon Feb  1 19:16:55 2016] <jcohen>: mkhutornenko: you want to walk through what happened
[Mon Feb  1 19:17:55 2016] <mkhutornenko>: The details of the root cause are too intricate
to follow along here but I can give a brief overview of what happened
[Mon Feb  1 19:18:39 2016] <mkhutornenko>: we tried to deploy a master version into
one of our clusters and immediately noticed an issue with duplicate instances showing up in
job page:
[Mon Feb  1 19:19:10 2016] <mkhutornenko>: we immediately attempted to roll back to a
previous known good version but the scheduler was unable to restart
[Mon Feb  1 19:19:47 2016] <mkhutornenko>: we have found stack trace (listed in
and had to restore scheduler from backup
[Mon Feb  1 19:20:17 2016] <mkhutornenko>: that led to a few other issues found in our
recovery instructions not being updated with recent changes
[Mon Feb  1 19:21:06 2016] <mkhutornenko>: all in all, we were able to recover but it
took us a few hours to reconcile this problem
[Mon Feb  1 19:22:44 2016] <jcohen>: Thanks Maxim. This dovetails nicely to my next
## Rollback testing ##
[Mon Feb  1 19:23:14 2016] <mkhutornenko>: btw, master is not in a working state currently,
so I wouldn’t recommend deploying from it
[Mon Feb  1 19:23:33 2016] <jcohen>: Do folks think it would be beneficial to come up
with some sort of test suite that ensures it’s possible to roll back between commits?
[Mon Feb  1 19:23:53 2016] <jcohen>: I don’t know how many people deploy from master
as opposed to from releases
[Mon Feb  1 19:24:10 2016] <jcohen>: Obviously it’s not a problem that comes up frequently,
but it can lead to serious issues when it does arise
[Mon Feb  1 19:24:32 2016] <mkhutornenko>: I think build-to-build rollback verification
is important and would benefit overall quality
[Mon Feb  1 19:25:16 2016] <jcohen>: Our jenkins job does not currently run e2e tests
[Mon Feb  1 19:25:47 2016] <jcohen>: if it did, it seems like the easiest thing to do
would be to run e2e tests, then git checkout HEAD^ and try to rebuild/restart the scheduler
[Mon Feb  1 19:26:35 2016] <mkhutornenko>: we are planning to alter our internal deploy
sequence to verify build-to-build upgrade/rollback cycle in a test cluster but would be nice
to have a solution everyone could benefit from
[Mon Feb  1 19:27:32 2016] <jcohen>: It might be worth reviving AURORA-476
[Mon Feb  1 19:27:36 2016] <jcohen>: AURORA-476
[Mon Feb  1 19:28:24 2016] <jcohen>: Again, I’ll redirect this to the dev list for
further discussion.
[Mon Feb  1 19:28:33 2016] <mkhutornenko>: +1
[Mon Feb  1 19:28:39 2016] <jcohen>: #action jcohen to email dev@ w.r.t. rollback testing.
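
[Editor's sketch] The rollback check jcohen describes (run e2e tests, then `git checkout HEAD^` and try to rebuild/restart the scheduler) might be sketched as below. This is a rough illustration, not the project's actual tooling: `rollback_check`, `run_command`, and the two command arguments are hypothetical names, and the real e2e and rebuild/restart commands are not given in the log, so the caller must supply them.

```python
import subprocess
from typing import Callable, Sequence

Command = Sequence[str]


def run_command(cmd: Command) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(list(cmd)).returncode == 0


def rollback_check(
    e2e_cmd: Command,
    restart_cmd: Command,
    run: Callable[[Command], bool] = run_command,
) -> bool:
    """Return True if the previous commit can start after the current one ran.

    e2e_cmd and restart_cmd are placeholders for whatever commands run the
    end-to-end tests and rebuild/restart the scheduler.
    """
    # 1. Exercise the current commit so it writes its state.
    if not run(e2e_cmd):
        return False
    # 2. Step back one commit, as suggested in the meeting.
    if not run(["git", "checkout", "HEAD^"]):
        return False
    try:
        # 3. Rebuild/restart the older scheduler against that state.
        return run(restart_cmd)
    finally:
        # Return to the commit or branch we started from.
        run(["git", "checkout", "-"])
```

Passing `run` as a parameter keeps the sequence testable without touching a real repository; a CI job would use the default and supply its own commands.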
[Mon Feb  1 19:29:04 2016] <jcohen>: That’s all I’ve got on my list, anyone else
have any topics?
[Mon Feb  1 19:30:54 2016] <jcohen>: Ok folks, that’ll do it then. Have a good week
[Mon Feb  1 19:32:53 2016] <jcohen>: ASFBot: meeting end
[Mon Feb  1 19:33:05 2016] <zmanji>: ASFBot: meeting end

Meeting ended at Mon Feb  1 19:33:05 2016
