www-builds mailing list archives

From Robbie Gemmell <robbie.gemm...@gmail.com>
Subject Re: Windows slaves (1 and 2) offline
Date Tue, 15 Apr 2014 00:21:29 GMT
'1' was possibly not stuck. It is a matrix project, and although the
matrix itself can launch on any node, including the Windows ones (something
we apparently can't control), it doesn't use a numbered executor on the
slave while doing so, which explains how you killed 3 things on a node that
only has 2 executors. The individual jobs within those matrix projects are
restricted to running only on the Ubuntu nodes, and each sub part is
scheduled individually at the end of the job queue after the previous sub
part completes. Most of the matrix's running time is simply spent waiting
for its parts to get to the front of the queue again.

The project was defined that way to ensure we didn't effectively occupy a
larger single block of time (2 to 2.5hrs, depending on the particular Ubuntu
nodes used and what else is running) the way many jobs do seem to. The
downside is that the matrix as a whole can take a very long time to
complete when the job queue is long, because each part has to wait to get
to the front of the queue. This seemed fairer than either running the parts
as a group of separate jobs or as a single job that effectively only queues
once, but it does mean people see the matrix sitting there doing not very
much for quite some time.
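A toy model of the trade-off described above, as a minimal sketch: it assumes a single executor, a FIFO queue and made-up durations (the 30-minute parts and the jobs "A" and "B" are hypothetical, not taken from the Qpid project). Every project queues its first part at the start, and a project's next part goes to the back of the queue only once its previous part has finished.

```python
from collections import deque


def simulate(projects):
    """Toy model of one executor draining a FIFO build queue.

    projects maps a project name to the durations (minutes) of its parts.
    Every project queues its first part at t=0, in dict order. When a part
    finishes, that project's next part is appended to the *back* of the
    queue, mirroring the re-queuing of matrix sub parts described above.
    Returns the time at which each project's last part finishes.
    """
    queue = deque((name, 0) for name in projects)  # (project, part index)
    clock = 0
    finished = {}
    while queue:
        name, idx = queue.popleft()
        clock += projects[name][idx]  # the single executor runs this part
        if idx + 1 < len(projects[name]):
            queue.append((name, idx + 1))  # next part rejoins at the back
        else:
            finished[name] = clock
    return finished


# Hypothetical numbers: a 2h matrix split into 4 parts vs run as one block.
split = simulate({"matrix": [30, 30, 30, 30], "A": [30], "B": [30]})
block = simulate({"matrix": [120], "A": [30], "B": [30]})
print(split)  # {'A': 60, 'B': 90, 'matrix': 180}
print(block)  # {'matrix': 120, 'A': 150, 'B': 180}
```

With these made-up numbers, splitting delays the matrix itself (180 vs 120 minutes) but lets A and B finish at 60 and 90 minutes instead of 150 and 180, which is the fairness argument made above.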

Although they weren't using any executors on the Windows nodes, I have
nonetheless disabled the periodic build on the job that triggers '1'.

Robbie

On 14 April 2014 20:37, Dennis Lundberg <dennisl@apache.org> wrote:

> I have just killed the following jobs on windows1, they had been stuck
> for 23+ hours:
> 1. https://builds.apache.org/job/Qpid-Java-Java-BDB-TestMatrix/
> 2. https://builds.apache.org/job/river-qa-refactor-win6/
> 3. https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008_java/
>
> Together they were effectively blocking all other projects that needed
> a windows slave.
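A minimal sketch of how a "stuck for 23+ hours" check could be automated, assuming the standard Jenkins JSON endpoint `/job/<name>/lastBuild/api/json` and its `building` and `timestamp` (build start time in epoch milliseconds) fields. The 23-hour threshold just mirrors the report above; `is_stuck`, `hours_running`, `fetch_last_build` and `check` are hypothetical helper names, no authentication is handled, and nothing here kills a build.

```python
import json
import time
import urllib.request

STUCK_AFTER_HOURS = 23  # hypothetical threshold, taken from the report above
MS_PER_HOUR = 3_600_000


def hours_running(build, now_ms):
    """Hours a build has been running, or None if it is not running.

    `build` is the decoded JSON of /job/<name>/lastBuild/api/json; its
    'timestamp' field is the build start time in epoch milliseconds.
    """
    if not build.get("building"):
        return None
    return (now_ms - build["timestamp"]) / MS_PER_HOUR


def is_stuck(build, now_ms, limit_hours=STUCK_AFTER_HOURS):
    """True if the build is still running after `limit_hours`."""
    elapsed = hours_running(build, now_ms)
    return elapsed is not None and elapsed >= limit_hours


def fetch_last_build(job_url):
    """Network call: fetch the last build's JSON for a job URL."""
    with urllib.request.urlopen(job_url.rstrip("/") + "/lastBuild/api/json", timeout=30) as resp:
        return json.load(resp)


def check(job_url, limit_hours=STUCK_AFTER_HOURS):
    """Network call: True if the job's last build has run past the limit."""
    return is_stuck(fetch_last_build(job_url), int(time.time() * 1000), limit_hours)
```

Keeping the decision (`is_stuck`) separate from the HTTP call makes the rule easy to test offline and to reuse from whatever scheduler or monitor ends up running it.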
>
> The problem with 1 is that it is triggered by
> https://builds.apache.org/job/Qpid-Java-Java-MMS-TestMatrix
> which in turn is on a periodical schedule (once a day, 0 9 * * *) as
> well as an SCM poll schedule (once every 15 minutes, */15 * * * *)
>
> The same problem goes for 3 which is on a periodical schedule (once a
> day, 30 8 * * *)
>
> In my opinion we should not allow periodical schedules.
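The schedules mentioned above can be compared by expanding their cron fields. A minimal sketch that handles only the forms quoted in this thread (`*`, `*/n` and plain numbers in the minute and hour fields) and assumes the remaining three fields are `*`; Jenkins' own syntax also accepts ranges and the `H` token, which this sketch does not. An SCM poll schedule only starts a build when the poll detects changes, so its 96 daily polls are an upper bound on builds triggered that way. `expand_field` and `daily_fire_times` are hypothetical names.

```python
def expand_field(field, lo, hi):
    """Expand one cron field into the set of values it matches.

    Only '*', '*/n' and plain integers (optionally comma-separated) are
    supported; ranges and Jenkins' 'H' token are left out of this sketch.
    """
    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(lo, hi + 1))
        elif part.startswith("*/"):
            values.update(range(lo, hi + 1, int(part[2:])))
        else:
            values.add(int(part))
    return values


def daily_fire_times(spec):
    """Sorted (hour, minute) pairs at which a daily cron spec fires."""
    minute, hour, dom, month, dow = spec.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("this sketch only handles schedules that run every day")
    return sorted((h, m) for h in expand_field(hour, 0, 23)
                  for m in expand_field(minute, 0, 59))


for spec in ("0 9 * * *", "30 8 * * *", "*/15 * * * *"):
    times = daily_fire_times(spec)
    print(f"{spec!r}: {len(times)} run(s)/day, first at {times[0][0]:02d}:{times[0][1]:02d}")
```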
>
> On Sun, Apr 13, 2014 at 10:46 AM, Gavin McDonald <gavin@16degrees.com.au>
> wrote:
> > Managed to kill 3 of them, looking into why.
> >
> > Gav…
> >
> > On 13/04/2014, at 7:01 AM, Erik de Bruin <erik@ixsoftware.nl> wrote:
> >
> >> Currently there are 4 builds stuck on the windows1 slave. They seem to
> >> have stopped on the SCM step right at the beginning of their builds.
> >>
> >> Can you please take a look?
> >>
> >> EdB
> >>
> >>
> >>
> >>
> >> On Fri, Apr 11, 2014 at 4:43 PM, Alex Harui <aharui@adobe.com> wrote:
> >>
> >>> Hi Jake,
> >>>
> >>> Thanks for restarting.  I can't help but wonder if there is still some
> >>> configuration issue with Jenkins and Git that is causing Windows1 to run
> >>> out of memory.  Is there an investigation going on in that regard?
> >>>
> >>> Thanks,
> >>> -Alex
> >>>
> >>> On 4/11/14 7:38 AM, "Jake Farrell" <jfarrell@apache.org> wrote:
> >>>
> >>>> Hey Erik
> >>>> Windows1 ran out of memory, restarted and builds in the queue have been
> >>>> picked up and are running
> >>>>
> >>>> -Jake
> >>>>
> >>>>
> >>>> On Fri, Apr 11, 2014 at 10:17 AM, Erik de Bruin <erik@ixsoftware.nl>
> >>>> wrote:
> >>>>
> >>>>> Same week, second time... The 'windows1' slave is offline. There are
> >>>>> builds that have been in the queue for over 12 hours, so it's not
> >>>>> 'idling'.
> >>>>>
> >>>>> Can someone look at this, please?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> EdB
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Apr 8, 2014 at 1:08 AM, David Nalley <david@gnsa.us> wrote:
> >>>>>
> >>>>>> Jan and I discussed this briefly at ApacheCon and are tossing around
> >>>>>> the idea of having Circonus monitor the status of the slave (according
> >>>>>> to Jenkins) and perhaps take corrective action automagically. We're
> >>>>>> going to continue to think and work on this. Neither of us has admin
> >>>>>> privs on the Windows slaves, so we'd want folks who do (and are thus
> >>>>>> responsible for maintaining them) to bless this approach.
> >>>>>>
> >>>>>> --David
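If a monitor were to watch slave status "according to Jenkins", the raw data is available from the Jenkins JSON API. A minimal sketch, assuming the standard `/computer/api/json` endpoint, which lists nodes under `computer` with `displayName`, `offline` and `temporarilyOffline` fields. The helper names and the sample payload are hypothetical; this does not use any Circonus API, handles no authentication, and takes no corrective action.

```python
import json
import urllib.request


def offline_nodes(computer_api):
    """Given the decoded JSON of /computer/api/json, return the names of
    nodes Jenkins reports as offline, skipping nodes an admin has
    deliberately marked temporarily offline."""
    return [
        c["displayName"]
        for c in computer_api.get("computer", [])
        if c.get("offline") and not c.get("temporarilyOffline")
    ]


def fetch_computer_api(base_url):
    """Network call: fetch the node list from a Jenkins master."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/computer/api/json", timeout=30) as resp:
        return json.load(resp)


# Hypothetical sample payload, trimmed to the fields this sketch reads.
sample = {"computer": [
    {"displayName": "windows1", "offline": True, "temporarilyOffline": False},
    {"displayName": "windows2", "offline": False, "temporarilyOffline": False},
    {"displayName": "ubuntu1", "offline": True, "temporarilyOffline": True},
]}
print(offline_nodes(sample))  # ['windows1']
```

Against a live master this would be `offline_nodes(fetch_computer_api("https://builds.apache.org"))`; what to do with a non-empty result is left to whoever holds admin rights on the slaves.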
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Apr 7, 2014 at 11:17 AM, Alex Harui <aharui@adobe.com> wrote:
> >>>>>>> Hi Jake,
> >>>>>>>
> >>>>>>> Is there some way you could create a "button" that we could hit to
> >>>>>>> restart the Windows slave so we don't have to keep bothering you?
> >>>>>>> Or does it require human intervention to get it to come back up?
> >>>>>>>
> >>>>>>> Maybe some script we can get at from people.a.o, or a custom Jenkins
> >>>>>>> task that we kick, or a button on the wiki that runs some script code?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> -Alex
> >>>>>>>
> >>>>>>> On 4/7/14 8:13 AM, "Erik de Bruin" <erik@ixsoftware.nl> wrote:
> >>>>>>>
> >>>>>>>> Good news.
> >>>>>>>>
> >>>>>>>> Excellent service, thank you!
> >>>>>>>>
> >>>>>>>> EdB
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Apr 7, 2014 at 4:22 PM, Jake Farrell <jfarrell@apache.org>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hey Erik
> >>>>>>>>> I just restarted windows 1 and it has picked up the Apache Flex
> >>>>>>>>> build and is running it right now.
> >>>>>>>>>
> >>>>>>>>> -Jake
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Apr 7, 2014 at 10:08 AM, Erik de Bruin <erik@ixsoftware.nl>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> This is becoming a weekly event... both 'windows' slaves are
> >>>>>>>>>> offline, again.
> >>>>>>>>>>
> >>>>>>>>>> You might want to seriously consider accepting the offers to help
> >>>>>>>>>> from the friendly people in the "volunteering for ASF Jenkins farm
> >>>>>>>>>> service maintenance" thread.
> >>>>>>>>>>
> >>>>>>>>>> EdB
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Apr 3, 2014 at 7:22 PM, Jake Farrell <jfarrell@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> restarted, builds should start getting picked up shortly
> >>>>>>>>>>>
> >>>>>>>>>>> -Jake
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Apr 3, 2014 at 1:05 PM, Erik de Bruin <erik@ixsoftware.nl>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Both Windows slaves seem to be offline. There are several 'windows'
> >>>>>>>>>>>> builds in the queue, so it seems they are not simply idling. Can you
> >>>>>>>>>>>> please take a look?
> >>>>>>>>>>>>
> >>>>>>>>>>>> EdB
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Apr 1, 2014 at 9:20 AM, Jake Farrell <jfarrell@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hey Justin
> >>>>>>>>>>>>> The builds look like they are working. Not sure why java is giving
> >>>>>>>>>>>>> you that error for the latest java path, since
> >>>>>>>>>>>>> /f/hudson/tools/java/latest-1.6-64/jre/bin/java.exe -version gives
> >>>>>>>>>>>>> me a print out of 1.6.0_27. If you wouldn't mind creating a ticket
> >>>>>>>>>>>>> for this so someone can investigate it, I would appreciate it; it's
> >>>>>>>>>>>>> 3am for me and I need to call it a night.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Jake
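A minimal sketch of the check Jake describes, which separates "the executable is missing" (one common cause of an `IOException: Cannot run program`) from "the executable runs but reports some version". It assumes Python 3.7+ and relies on `java -version` writing a line such as `java version "1.6.0_27"` to stderr; `probe_java` and `parse_java_version` are hypothetical helper names. The thread quotes the path in two forms (`/f/hudson/...` and `f:\hudson\...`), so pass whichever form the calling environment actually uses.

```python
import os
import re
import subprocess

# Matches the quoted version in a banner line like: java version "1.6.0_27"
VERSION_RE = re.compile(r'version "([^"]+)"')


def parse_java_version(text):
    """Extract the quoted version string from `java -version` output."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def probe_java(path):
    """Return (exists, version) for the java executable at `path`.

    A missing file is reported as (False, None) instead of raising.
    `java -version` prints its banner to stderr, so both streams are read.
    """
    if not os.path.isfile(path):
        return False, None
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    return True, parse_java_version(result.stderr + result.stdout)
```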
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Apr 1, 2014 at 3:09 AM, Justin Mclean <justin@classsoftware.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Flex-sdk_1 and flex-sdk_release fixed and started, looking
> >>>>>>>>>>>>>>> through the other flex builds now
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> https://builds.apache.org/view/E-G/view/Flex/job/flex-sdk_1/60/
> >>>>>>>>>>>>>>> https://builds.apache.org/view/E-G/view/Flex/job/flex-sdk_release/539/
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> While it looks like they are compiling, I noticed this:
> >>>>>>>>>>>>>> java.io.IOException: Cannot run program
> >>>>>>>>>>>>>> "f:\hudson\tools\java\latest-1.6-64\jre\bin\java.exe
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So it looks like the version of java it expects to use is
> >>>>>>>>>>>>>> missing??
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Justin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Ix Multimedia Software
> >>>>>>>>>>>>
> >>>>>>>>>>>> Jan Luykenstraat 27
> >>>>>>>>>>>> 3521 VB Utrecht
> >>>>>>>>>>>>
> >>>>>>>>>>>> T. 06-51952295
> >>>>>>>>>>>> I. www.ixsoftware.nl
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >>
> >>
> >
>
>
>
> --
> Dennis Lundberg
>
