aurora-dev mailing list archives

From ASF IRC Bot <asf...@urd.zones.apache.org>
Subject Summary of IRC Meeting in #aurora
Date Mon, 22 Sep 2014 18:34:46 GMT
Summary of IRC Meeting in #aurora at Mon Sep 22 18:02:41 2014:

Attendees: davmclau, wickman, jfarrell, mchucarroll, wfarner, jcohen, Yasumoto, kts, jaybuff,
mkhutornenko, zmanji, dlester

- Preface
- scheduler performance issues
- 0.6.0 release
  - Action: all committers to link blockers to release ticket AURORA-711
- job update orchestration in the scheduler


IRC log follows:

## Preface ##
[Mon Sep 22 18:02:58 2014] <kts>: let's get started with a quick roll call
[Mon Sep 22 18:03:05 2014] <Yasumoto>: howdy howdy
[Mon Sep 22 18:03:18 2014] <jfarrell>: here
[Mon Sep 22 18:03:19 2014] <dlester>: present
[Mon Sep 22 18:03:21 2014] <davmclau>: here
[Mon Sep 22 18:03:24 2014] <mchucarroll>: here
[Mon Sep 22 18:03:25 2014] <zmanji>: here
[Mon Sep 22 18:03:27 2014] <jcohen>: here
[Mon Sep 22 18:03:33 2014] <wfarner>: here
[Mon Sep 22 18:03:36 2014] <wickman>: ahoy
[Mon Sep 22 18:03:58 2014] <jaybuff>: howdy
[Mon Sep 22 18:04:15 2014] <mkhutornenko>: morning
[Mon Sep 22 18:04:25 2014] <kts>: morning all
## scheduler performance issues ##
[Mon Sep 22 18:05:11 2014] <kts>: last week we started to see some performance issues
around scheduler snapshots in one of our larger production clusters
[Mon Sep 22 18:05:45 2014] <kts>: so you may have seen a higher number of performance-focused
reviews going by recently
[Mon Sep 22 18:06:58 2014] <wfarner>: i've started investigating this morning, there
may actually be more going on than just snapshots
[Mon Sep 22 18:07:22 2014] <wfarner>: the usual fallout we see is snapshot correlated
with timed out tasks (ASSIGNED/KILLING -> LOST)
[Mon Sep 22 18:07:53 2014] <wfarner>: looking into the timeline for one of these, though,
there seems to be a stall _before_ the snapshot process begins
[Mon Sep 22 18:08:24 2014] <wfarner>: hopefully more to come on this today
[Mon Sep 22 18:08:47 2014] <wfarner>: just to set some expectations appropriately -
this should not impact anything but very large, very heavily-used clusters
[Mon Sep 22 18:09:15 2014] <wfarner>: <eom>
[Mon Sep 22 18:10:15 2014] <kts>: thanks for the update wfarner
## 0.6.0 release ##
[Mon Sep 22 18:11:30 2014] <wfarner>: is there a ticket to track the release yet?  there
are some feature tickets that i could add as blockers
[Mon Sep 22 18:11:44 2014] <jfarrell>: yes, i created one last week
[Mon Sep 22 18:11:46 2014] <dlester>: https://issues.apache.org/jira/browse/AURORA-711
[Mon Sep 22 18:11:48 2014] <kts>: looking at the action items from last week it looks
like everything is pretty much in the same state
[Mon Sep 22 18:11:50 2014] <kts>: http://mail-archives.apache.org/mod_mbox/incubator-aurora-dev/201409.mbox/%3C20140915185248.8B7B9182C9%40urd.zones.apache.org%3E
[Mon Sep 22 18:12:09 2014] <wfarner>: dlester: thanks
[Mon Sep 22 18:12:20 2014] <wfarner>: kts: more or less, though there has been progress
on feature work
[Mon Sep 22 18:12:24 2014] <jfarrell>: https://issues.apache.org/jira/browse/AURORA-711
[Mon Sep 22 18:13:16 2014] <kts>: #action all committers to link blockers to release
ticket AURORA-711
[Mon Sep 22 18:13:59 2014] <wfarner>: this is also a good time to get deprecation warnings
in for things we would like to remove in 0.7.0
[Mon Sep 22 18:14:14 2014] <wfarner>: relevant ticket for that is https://issues.apache.org/jira/browse/AURORA-423
[Mon Sep 22 18:15:47 2014] <kts>: linked
[Mon Sep 22 18:16:03 2014] <kts>: that's all I've got, any other topics?
[Mon Sep 22 18:16:10 2014] <wfarner>: kts: that should not be linked against 0.6.0
[Mon Sep 22 18:16:43 2014] <wfarner>: AURORA-423 will be a blocker to 0.7.0 release
[Mon Sep 22 18:16:50 2014] <kts>: wfarner: we need some way to represent that the list
has been finalized though right?
[Mon Sep 22 18:17:34 2014] <wfarner>: maybe 'related'?  we definitely shouldn't resolve
AURORA-423 for the 0.6.0 release
[Mon Sep 22 18:18:02 2014] <kts>: works for me
[Mon Sep 22 18:20:41 2014] <davmclau>: We got a real life end to end test running for
the new scheduler updates.
## job update orchestration in the scheduler ##
[Mon Sep 22 18:21:34 2014] <wfarner>: stage is yours, davmclau
[Mon Sep 22 18:22:58 2014] <davmclau>: The status is that wfarner and mkhutornenko completed
the server part with instance events at the end of last week. I updated the UI and we managed
to run a complete end to end test by Friday.
[Mon Sep 22 18:23:20 2014] <davmclau>: I think we still have one or two small issues
to clean up, but that should be wrapped up this week.
[Mon Sep 22 18:24:16 2014] <davmclau>: (eom)
[Mon Sep 22 18:25:10 2014] <kts>: thanks davmclau
[Mon Sep 22 18:25:27 2014] <kts>: any other topics?
[Mon Sep 22 18:26:18 2014] <jaybuff>: sometime this week or next I am hoping to recruit
some people to help write an "Aurora Operational Guide" doc
[Mon Sep 22 18:26:38 2014] <dlester>: jaybuff: sounds great!
[Mon Sep 22 18:26:43 2014] <jaybuff>: i want a mesos one as well
[Mon Sep 22 18:27:07 2014] <wfarner>: jaybuff: count me in
[Mon Sep 22 18:27:12 2014] <Yasumoto>: jaybuff: cool, I'd be stoked to help contribute
to both
[Mon Sep 22 18:27:21 2014] <jaybuff>: i will try to write an outline, then maybe we
can block off an afternoon and brainstorm
[Mon Sep 22 18:27:36 2014] <jfarrell>: jaybuff: can you start a thread on the dev@ list
please, i'm sure a fair amount of people will want to help with that
[Mon Sep 22 18:27:41 2014] <mchucarroll>: i’m also up for helping with that, or pretty
much any other documentation.
[Mon Sep 22 18:27:45 2014] <jaybuff>: sounds great
[Mon Sep 22 18:28:10 2014] <Yasumoto>: ah, one last point
[Mon Sep 22 18:28:11 2014] <jaybuff>: we had a pretty disastrous outage last week and
it revealed some big holes
[Mon Sep 22 18:28:27 2014] <wickman>: jaybuff: can you discuss in any detail?
[Mon Sep 22 18:29:04 2014] <jaybuff>: sure, after meeting i can go into it.  tl;dr there
is a bug in the docker containerizer that causes things to explode when you have slaves with
300+ exited docker containers
[Mon Sep 22 18:29:10 2014] <wickman>: ah
[Mon Sep 22 18:29:34 2014] <Yasumoto>: There should be a new pants release today/tomorrow:
https://github.com/pantsbuild/pants/issues/597, which will help us get https://issues.apache.org/jira/browse/AURORA-585
cleared up
[Mon Sep 22 18:29:47 2014] <Yasumoto>: I'll send out an email to the dev@ list to make
sure no one has concerns
[Mon Sep 22 18:30:13 2014] <Yasumoto>: (it will enforce py27 for the repo, so that may
not be 100% desired- tho there is a config option to change that)
[Mon Sep 22 18:31:45 2014] <kts>: sound great
[Mon Sep 22 18:31:52 2014] <kts>: *sounds
[Mon Sep 22 18:32:40 2014] <kts>: anything else?
[Mon Sep 22 18:33:02 2014] <wfarner>: not from me
[Mon Sep 22 18:33:31 2014] <jfarrell>: think we covered the major items
[Mon Sep 22 18:34:23 2014] <kts>: ASFBot: meeting stop


Meeting ended at Mon Sep 22 18:34:23 2014
