From: "Gavin McDonald"
Subject: RE: [Jenkins] poor handling of offline slaves
Date: Fri, 1 Jun 2012 13:16:31 +0930
Organization: 16 degrees complete web solutions
Message-ID: <15ef101cd3fa9$22613030$67239090$@16degrees.com.au>
In-Reply-To: <4FC64784.5030306@oracle.com>

> -----Original Message-----
> From: Kristian Waagan [mailto:kristian.waagan@oracle.com]
> Sent: Thursday, 31 May 2012 1:45 AM
> To: builds@apache.org
> Subject: [Jenkins] poor handling of offline slaves
>
> Hi,
>
> Currently there are several jobs that have been hanging on a Linux executor
> for several days because windows1 is offline.

I've fixed the disk space issue by:

1. Clearing out some junk from Maven and/or poorly configured jobs that don't
   clean up their workspaces.
2. Adding an 80GB disk to replace the 40GB one.

> In addition, there are a bunch
> of jobs that have been in the queue for days.

They will catch up.

> It appears that Jenkins lets the "multi OS" jobs wait for a very long time
> before giving up on waiting for a slave. A few questions:
> a) Is it possible to have Jenkins fail a job already occupying an executor slot if
> it has to wait for too long?

If it is occupying an executor, that means the build is running and/or stuck.
If it is stuck, builds can be configured to die after a while. With Windows
builds this does not always work.

> b) There's only one windows slave. Are there any plans to add another
> Windows slave (preferably on a different box than windows1)?

Not currently. When running well, there is never much of a queue demand for it.
Let it catch up and we'll review the situation again in a week.
>
> If many projects are configured to run on multiple operating systems, of
> which two have only one slave (Windows and Solaris), these projects may
> cause jobs to pile up on Linux. Maybe there are other mechanisms in place to
> deal with this, I don't know.

Not sure what you mean; jobs run independently of each other on multiple
slaves.

>
> There are currently two other jobs [1] that have been hanging for two days
> or more, but there seems to be enough Linux executors to serve other jobs
> reasonably fast. For that reason I have left them alone for the time being.

I'll delete those.

Gav...

>
>
> Thanks,
> --
> Kristian
>
> [1] https://builds.apache.org/job/Ant-Build-Matrix/ and
> https://builds.apache.org/job/Empire-db%20multios/