Reply-To: dev@aurora.incubator.apache.org
Content-Type: multipart/mixed; boundary="===============1340477987155317299=="
MIME-Version: 1.0
Subject: Summary of IRC Meeting in #aurora
From: ASF IRC Bot
To: dev@aurora.incubator.apache.org
Message-Id: <20140922183446.4416218E93@urd.zones.apache.org>
Date: Mon, 22 Sep 2014 18:34:46 +0000 (UTC)

--===============1340477987155317299==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Summary of IRC Meeting in #aurora at Mon Sep 22 18:02:41 2014:

Attendees: davmclau, wickman, jfarrell, mchucarroll, wfarner, jcohen, Yasumoto, kts, jaybuff, mkhutornenko, zmanji, dlester

- Preface
- scheduler performance issues
- 0.6.0 release
  - Action: all committers to link blockers to release ticket AURORA-711
- job update orchestration in the scheduler

IRC log follows:

## Preface ##
[Mon Sep 22 18:02:58 2014] : let's get started with a quick roll call
[Mon Sep 22 18:03:05 2014] : howdy howdy
[Mon Sep 22 18:03:18 2014] : here
[Mon Sep 22 18:03:19 2014] : present
[Mon Sep 22 18:03:21 2014] : here
[Mon Sep 22 18:03:24 2014] : here
[Mon Sep 22 18:03:25 2014] : here
[Mon Sep 22 18:03:27 2014] : here
[Mon Sep 22 18:03:33 2014] : here
[Mon Sep 22 18:03:36 2014] : ahoy
[Mon Sep 22 18:03:58 2014] : howdy
[Mon Sep 22 18:04:15 2014] : morning
[Mon Sep 22 18:04:25 2014] : morning all

## scheduler performance issues ##
[Mon Sep 22 18:05:11 2014] : last week we started to see some performance issues around scheduler snapshots in one of our larger production clusters
[Mon Sep 22 18:05:45 2014] : so you may have seen a higher number of performance-focused reviews going by recently
[Mon Sep 22 18:06:58 2014] : i've started investigating this morning, there may actually be more going on than just snapshots
[Mon Sep 22 18:07:22 2014] : the usual fallout we see is snapshot correlated with timed out tasks (ASSIGNED/KILLING -> LOST)
[Mon Sep 22 18:07:53 2014] : looking into the timeline for one of these, though, there seems to be a stall _before_ the snapshot process begins
[Mon Sep 22 18:08:24 2014] : hopefully more to come on this today
[Mon Sep 22 18:08:47 2014] : just to set some expectations appropriately - this should not impact anything but very large, very heavily-used clusters
[Mon Sep 22 18:09:15 2014] :
[Mon Sep 22 18:10:15 2014] : thanks for the update wfarner

## 0.6.0 release ##
[Mon Sep 22 18:11:30 2014] : is there a ticket to track the release yet? there are some feature tickets that i could add as blockers
[Mon Sep 22 18:11:44 2014] : yes, i created one last week
[Mon Sep 22 18:11:46 2014] : https://issues.apache.org/jira/browse/AURORA-711
[Mon Sep 22 18:11:48 2014] : looking at the action items from last week it looks like everything is pretty much in the same state
[Mon Sep 22 18:11:50 2014] : http://mail-archives.apache.org/mod_mbox/incubator-aurora-dev/201409.mbox/%3C20140915185248.8B7B9182C9%40urd.zones.apache.org%3E
[Mon Sep 22 18:12:09 2014] : dlester: thanks
[Mon Sep 22 18:12:20 2014] : kts: more or less, though there has been progress on feature work
[Mon Sep 22 18:12:24 2014] : https://issues.apache.org/jira/browse/AURORA-711
[Mon Sep 22 18:13:16 2014] : #action all committers to link blockers to release ticket AURORA-711
[Mon Sep 22 18:13:59 2014] : this is also a good time to get deprecation warnings in for things we would like to remove in 0.7.0
[Mon Sep 22 18:14:14 2014] : relevant ticket for that is https://issues.apache.org/jira/browse/AURORA-423
[Mon Sep 22 18:15:47 2014] : linked
[Mon Sep 22 18:16:03 2014] : that's all I've got, any other topics?
[Mon Sep 22 18:16:10 2014] : kts: that should not be linked against 0.6.0
[Mon Sep 22 18:16:43 2014] : AURORA-423 will be a blocker to 0.7.0 release
[Mon Sep 22 18:16:50 2014] : wfarner: we need some way to represent that the list has been finalized though right?
[Mon Sep 22 18:17:34 2014] : maybe 'related'? we definitely shouldn't resolve AURORA-423 for the 0.6.0 release
[Mon Sep 22 18:18:02 2014] : works for me
[Mon Sep 22 18:20:41 2014] : We got a real life end to end test running for the new scheduler updates.

## job update orchestration in the scheduler ##
[Mon Sep 22 18:21:34 2014] : stage is yours, davmclau
[Mon Sep 22 18:22:58 2014] : The status is that wfarner and mkhutornenko completed the server part with instance events at the end of last week. I updated the UI and we managed to run a complete end to end test by Friday.
[Mon Sep 22 18:23:20 2014] : I think we still have one or two small issues to clean up, but that should be wrapped up this week.
[Mon Sep 22 18:24:16 2014] : (eom)
[Mon Sep 22 18:25:10 2014] : thanks davmclau
[Mon Sep 22 18:25:27 2014] : any other topics?
[Mon Sep 22 18:26:18 2014] : sometime this week or next I am hoping to recruit some people to help write an "Aurora Operational Guide" doc
[Mon Sep 22 18:26:38 2014] : jaybuff: sounds great!
[Mon Sep 22 18:26:43 2014] : i want a mesos one as well
[Mon Sep 22 18:27:07 2014] : jaybuff: count me in
[Mon Sep 22 18:27:12 2014] : jaybuff: cool, I'd be stoked to help contribute to both
[Mon Sep 22 18:27:21 2014] : i will try to write an outline, then maybe we can block off an afternoon and brainstorm
[Mon Sep 22 18:27:36 2014] : jaybuff: can you start a thread on the dev@ list please, i'm sure a fair amount of people will want to help with that
[Mon Sep 22 18:27:41 2014] : i’m also up for helping with that, or pretty much any other documentation.
[Mon Sep 22 18:27:45 2014] : sounds great
[Mon Sep 22 18:28:10 2014] : ah, one last point
[Mon Sep 22 18:28:11 2014] : we had a pretty disastrous outage last week and it revealed some big holes
[Mon Sep 22 18:28:27 2014] : jaybuff: can you discuss in any detail?
[Mon Sep 22 18:29:04 2014] : sure, after meeting i can go into it. tl;dr there is a bug in the docker containerizer that causes things to explode when you have slaves with 300+ exited docker containers
[Mon Sep 22 18:29:10 2014] : ah
[Mon Sep 22 18:29:34 2014] : There should be a new pants release today/tomorrow: https://github.com/pantsbuild/pants/issues/597, which will help us get https://issues.apache.org/jira/browse/AURORA-585 cleared up
[Mon Sep 22 18:29:47 2014] : I'll send out an email to the dev@ list to make sure no one has concerns
[Mon Sep 22 18:30:13 2014] : (it will enforce py27 for the repo, so that may not be 100% desired- tho there is a config option to change that)
[Mon Sep 22 18:31:45 2014] : sound great
[Mon Sep 22 18:31:52 2014] : *sounds
[Mon Sep 22 18:32:40 2014] : anything else?
[Mon Sep 22 18:33:02 2014] : not from me
[Mon Sep 22 18:33:31 2014] : think we covered the major items
[Mon Sep 22 18:34:23 2014] : ASFBot: meeting stop

Meeting ended at Mon Sep 22 18:34:23 2014
--===============1340477987155317299==--