www-builds mailing list archives

From Dennis Lundberg <denn...@apache.org>
Subject Re: Windows slaves (1 and 2) offline
Date Tue, 15 Apr 2014 21:23:41 GMT
Hi Robbie,

Yes, sorry about that. Those matrix jobs are tricky. I was so into the
idea that the windows slaves were the bottleneck, that I didn't think
about the possibility that it might be the other way around. My bad.

How come you can't select a node or label to use for a matrix job?


On Tue, Apr 15, 2014 at 2:21 AM, Robbie Gemmell
<robbie.gemmell@gmail.com> wrote:
> '1' was possibly not stuck. It is a matrix project. The matrix itself can
> launch on any node, including the Windows ones (something we apparently
> can't control), but it doesn't use a numbered executor on the slave while
> doing so, which is how you killed 3 things when the node only has 2
> executors. The individual jobs within those matrix projects are restricted
> to only run on the Ubuntu nodes, with each sub-part getting scheduled
> individually at the end of the job queue after the previous sub-part
> completes. Most of the matrix's running time is simply spent waiting
> for its parts to get to the front of the queue again.
>
> The project was defined that way to ensure we didn't effectively use a
> larger single block of time (2 to 2.5hrs depending on the particular Ubuntu
> nodes used and what else is running) the way many jobs do seem to, though
> it means it can take a very long time for the matrix as a whole to complete
> if the job queue is long, due to the number of times it has to wait for each
> part to get to the front of the queue. This seemed fairer than either
> running the parts in a group of separate jobs or a single job and
> effectively only queueing once, but it does mean people see the matrix
> sitting there doing not very much for quite some time.
>
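The trade-off described above can be illustrated with some rough arithmetic. This is only a sketch: the number of parts, the per-part run time and the queue wait are made-up values (chosen so the parts add up to the 2.5 hours mentioned), and the model assumes every part waits the same time in the queue, which a real Jenkins queue will not guarantee.

```python
def matrix_wall_clock(part_minutes, queue_wait_minutes):
    """Total elapsed time when each part re-queues after the previous one ends."""
    return sum(queue_wait_minutes + run for run in part_minutes)


def single_job_wall_clock(part_minutes, queue_wait_minutes):
    """Total elapsed time when all parts run back to back after one queue wait."""
    return queue_wait_minutes + sum(part_minutes)


def longest_executor_block(part_minutes, as_single_job):
    """Longest continuous stretch an executor is held by this project."""
    return sum(part_minutes) if as_single_job else max(part_minutes)


# Hypothetical numbers: 5 parts of 30 minutes each, 60 minutes wait per queue pass.
parts = [30] * 5
wait = 60

print(matrix_wall_clock(parts, wait))        # 450 minutes elapsed as a matrix
print(single_job_wall_clock(parts, wait))    # 210 minutes elapsed as one job
print(longest_executor_block(parts, False))  # 30 minutes held at a time
print(longest_executor_block(parts, True))   # 150 minutes held in one block
```

With these assumed numbers the matrix takes more than twice as long end to end, but never holds an Ubuntu executor for more than 30 minutes at a stretch, which is the fairness argument made above.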
> Though they weren't using any executors on the Windows nodes, I have
> regardless disabled the periodic build on the job which triggers '1'.
>
> Robbie
>
> On 14 April 2014 20:37, Dennis Lundberg <dennisl@apache.org> wrote:
>
>> I have just killed the following jobs on windows1, they had been stuck
>> for 23+ hours:
>> 1. https://builds.apache.org/job/Qpid-Java-Java-BDB-TestMatrix/
>> 2. https://builds.apache.org/job/river-qa-refactor-win6/
>> 3. https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008_java/
>>
>> Together they were effectively blocking all other projects that needed
>> a windows slave.
>>
>> The problem with 1 is that it is triggered by
>> https://builds.apache.org/job/Qpid-Java-Java-MMS-TestMatrix
>> which in turn is on a periodical schedule (once a day, 0 9 * * *) as
>> well as an SCM poll schedule (once every 15 minutes, */15 * * * *)
>>
>> The same problem goes for 3 which is on a periodical schedule (once a
>> day, 30 8 * * *)
>>
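To make the schedules quoted above concrete, here is a small sketch that counts how often each expression fires per day. It is not Jenkins code: `expand_field` and `runs_per_day` are hypothetical helpers that handle only `*`, `*/step`, `a-b`, plain numbers and comma lists, and they look only at the minute and hour fields (the other three fields are `*` in every schedule quoted here).

```python
def expand_field(field, low, high):
    """Expand one cron field into the sorted list of values it matches.

    Supports '*', '*/step', 'a-b', 'a-b/step', single numbers and comma lists.
    """
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return sorted(values)


def runs_per_day(expression):
    """Count firings per day from the minute and hour fields only."""
    minute, hour, *_ = expression.split()
    return len(expand_field(minute, 0, 59)) * len(expand_field(hour, 0, 23))


print(runs_per_day("0 9 * * *"))     # 1  (daily at 09:00)
print(runs_per_day("30 8 * * *"))    # 1  (daily at 08:30)
print(runs_per_day("*/15 * * * *"))  # 96 (every 15 minutes)
```

Note that the 96 is the number of SCM polls per day; a poll only starts a build when it detects a change, whereas a periodic schedule starts one every time it fires.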
>> In my opinion we should not allow periodical schedules.
>>
>> On Sun, Apr 13, 2014 at 10:46 AM, Gavin McDonald <gavin@16degrees.com.au>
>> wrote:
>> > Managed to kill 3 of them, looking into why.
>> >
>> > Gav…
>> >
>> > On 13/04/2014, at 7:01 AM, Erik de Bruin <erik@ixsoftware.nl> wrote:
>> >
>> >> Currently there are 4 builds stuck on the windows1 slave. They seem to have
>> >> stopped on the SCM step right at the beginning of their builds.
>> >>
>> >> Can you please take a look?
>> >>
>> >> EdB
>> >>
>> >>
>> >>
>> >>
>> >> On Fri, Apr 11, 2014 at 4:43 PM, Alex Harui <aharui@adobe.com> wrote:
>> >>
>> >>> Hi Jake,
>> >>>
>> >>> Thanks for restarting.  I can't help but wonder if there is still some
>> >>> configuration issue with Jenkins and Git that is causing Windows1 to run
>> >>> out of memory.  Is there an investigation going on in that regard?
>> >>>
>> >>> Thanks,
>> >>> -Alex
>> >>>
>> >>> On 4/11/14 7:38 AM, "Jake Farrell" <jfarrell@apache.org> wrote:
>> >>>
>> >>>> Hey Erik
>> >>>> Windows1 ran out of memory, restarted and builds in the queue have been
>> >>>> picked up and are running
>> >>>>
>> >>>> -Jake
>> >>>>
>> >>>>
>> >>>> On Fri, Apr 11, 2014 at 10:17 AM, Erik de Bruin <erik@ixsoftware.nl>
>> >>>> wrote:
>> >>>>
>> >>>>> Same week, second time... The 'windows1' slave is offline. There are
>> >>>>> builds that have been in the queue for over 12 hours, so it's not
>> >>>>> 'idling'.
>> >>>>>
>> >>>>> Can someone look at this, please?
>> >>>>>
>> >>>>> Thanks,
>> >>>>>
>> >>>>> EdB
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Apr 8, 2014 at 1:08 AM, David Nalley <david@gnsa.us> wrote:
>> >>>>>
>> >>>>>> Jan and I discussed this briefly at ApacheCon and are tossing around
>> >>>>>> the idea of having Circonus monitor the status of the slave (according
>> >>>>>> to Jenkins) and perhaps take corrective action automagically. We're
>> >>>>>> going to continue to think and work on this. Neither of us has admin
>> >>>>>> privs on the Windows slaves, so we'd want folks that do (and are thus
>> >>>>>> responsible for maintaining them) to bless this approach.
>> >>>>>>
>> >>>>>> --David
>> >>>>>>
>> >>>>>>
>> >>>>>> On Mon, Apr 7, 2014 at 11:17 AM, Alex Harui <aharui@adobe.com> wrote:
>> >>>>>>> Hi Jake,
>> >>>>>>>
>> >>>>>>> Is there some way you could create a "button" that we could hit to restart
>> >>>>>>> the Windows slave so we don't have to keep bothering you?  Or does it
>> >>>>>>> require human intervention to get it to come back up?
>> >>>>>>>
>> >>>>>>> Maybe some script we can get at from people.a.o, or a custom Jenkins task
>> >>>>>>> that we kick, or a button on the wiki that runs some script code?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> -Alex
>> >>>>>>>
>> >>>>>>> On 4/7/14 8:13 AM, "Erik de Bruin" <erik@ixsoftware.nl> wrote:
>> >>>>>>>
>> >>>>>>>> Good news.
>> >>>>>>>>
>> >>>>>>>> Excellent service, thank you!
>> >>>>>>>>
>> >>>>>>>> EdB
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Mon, Apr 7, 2014 at 4:22 PM, Jake Farrell <jfarrell@apache.org> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hey Erik
>> >>>>>>>>> I just restarted windows 1 and it has picked up the Apache Flex build and
>> >>>>>>>>> is running it right now.
>> >>>>>>>>>
>> >>>>>>>>> -Jake
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Mon, Apr 7, 2014 at 10:08 AM, Erik de Bruin <erik@ixsoftware.nl> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hi,
>> >>>>>>>>>>
>> >>>>>>>>>> This is becoming a weekly event... both 'windows' slaves are offline,
>> >>>>>>>>>> again.
>> >>>>>>>>>>
>> >>>>>>>>>> You might want to seriously consider accepting the offers to help from
>> >>>>>>>>>> the friendly people in the "volunteering for ASF Jenkins farm service
>> >>>>>>>>>> maintenance" thread.
>> >>>>>>>>>>
>> >>>>>>>>>> EdB
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, Apr 3, 2014 at 7:22 PM, Jake Farrell <jfarrell@apache.org> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> restarted, builds should start getting picked up shortly
>> >>>>>>>>>>>
>> >>>>>>>>>>> -Jake
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, Apr 3, 2014 at 1:05 PM, Erik de Bruin <erik@ixsoftware.nl> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Both Windows slaves seem to be offline. There are several 'windows' builds
>> >>>>>>>>>>>> in the queue, so it seems they are not simply idling. Can you please take a
>> >>>>>>>>>>>> look?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> EdB
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Tue, Apr 1, 2014 at 9:20 AM, Jake Farrell <jfarrell@apache.org> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Hey Justin
>> >>>>>>>>>>>>> The builds look like they are working, not sure why java is giving you
>> >>>>>>>>>>>>> that error for the latest java path since
>> >>>>>>>>>>>>> /f/hudson/tools/java/latest-1.6-64/jre/bin/java.exe -version gives me a
>> >>>>>>>>>>>>> print out of 1.6.0_27. If you wouldn't mind creating a ticket for this so
>> >>>>>>>>>>>>> someone can investigate it I would appreciate it, it's 3am for me and I
>> >>>>>>>>>>>>> need to call it a night
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> -Jake
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Tue, Apr 1, 2014 at 3:09 AM, Justin Mclean <justin@classsoftware.com> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Hi,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Flex-sdk_1 and flex-sdk_release fixed and started, looking through the
>> >>>>>>>>>>>>>>> other flex builds now
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> https://builds.apache.org/view/E-G/view/Flex/job/flex-sdk_1/60/
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> https://builds.apache.org/view/E-G/view/Flex/job/flex-sdk_release/539/
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> While it looks like they are compiling I noticed this:
>> >>>>>>>>>>>>>> java.io.IOException: Cannot run program
>> >>>>>>>>>>>>>> "f:\hudson\tools\java\latest-1.6-64\jre\bin\java.exe
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> So it looks like the version of java it expects to use is missing??
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Justin
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> --
>> >>>>>>>>>>>> Ix Multimedia Software
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Jan Luykenstraat 27
>> >>>>>>>>>>>> 3521 VB Utrecht
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> T. 06-51952295
>> >>>>>>>>>>>> I. www.ixsoftware.nl
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Dennis Lundberg
>>



-- 
Dennis Lundberg
