hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: The Jenkins VMs are increasingly slow / overloaded
Date Fri, 05 Apr 2013 23:48:03 GMT
Also, be careful to differentiate between slaves that are "offline" because
they are in the process of being launched, and those that are offline
because of that bug I mentioned. (It doesn't happen often, but it does happen.)
If you kill an "offline" slave being launched, this will just cause churn.
And if this seems like something you don't want to bother with, then just
don't worry about it.
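
As a rough sketch of the distinction above: an offline slave that was requested only a few minutes ago is probably still launching, while one that has been offline much longer is a candidate for deletion. The `Slave` record, the `stuck_slaves` helper, and the 15 minute grace period are all hypothetical, chosen for illustration; none of them come from Jenkins or the EC2 plugin.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Slave:
    """Hypothetical record for one build slave (not a Jenkins API type)."""
    name: str
    offline: bool          # True if the slave is currently shown as offline
    launched_at: datetime  # when the instance was requested (assumed known)


def stuck_slaves(slaves, now, grace=timedelta(minutes=15)):
    """Return offline slaves that are older than the grace period.

    Offline slaves younger than `grace` are assumed to be still launching
    and are left alone, since killing them would only cause churn.
    The 15 minute default is an assumption, not a measured launch time.
    """
    return [s for s in slaves if s.offline and now - s.launched_at > grace]


if __name__ == "__main__":
    now = datetime(2013, 4, 5, 23, 48)
    slaves = [
        Slave("launching", True, now - timedelta(minutes=2)),
        Slave("stuck", True, now - timedelta(hours=3)),
        Slave("healthy", False, now - timedelta(hours=3)),
    ]
    for s in stuck_slaves(slaves, now):
        print("candidate for deletion:", s.name)
```

Whatever threshold is used should be comfortably longer than a normal slave launch, so that a slow but healthy launch is never mistaken for the bug.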



On Fri, Apr 5, 2013 at 4:44 PM, Andrew Purtell <apurtell@apache.org> wrote:

> This is a bug in the EC2 module for Jenkins. There are other bugs which
> this one fixes so it's not a big deal relative to those. You have an
> account on this system. You can easily go on and delete the slaves which
> end up in offline state.
>
>
> On Fri, Apr 5, 2013 at 4:39 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Looks like 4 EC2 Jenkins slaves are offline at the moment ...
>>
>>
>> On Wed, Mar 27, 2013 at 1:19 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>> > Looks like Apache Jenkins went off several times this week.
>> >
>> > Is it difficult to hook up patch testing with the new Jenkins?
>> >
>> > Thanks
>> >
>> >
>> > On Wed, Mar 27, 2013 at 7:49 AM, Andrew Purtell <apurtell@apache.org> wrote:
>> >
>> >> True, but unlike 0.94 the state of 0.95 and trunk is impacted by Stack's
>> >> wrangling with Maven to find a sane site and assembly, a number of build
>> >> failures are due to that. Also you'll note that prior to yesterday the
>> >> Linux OOM killer was nuking the bloated Maven processes on the build
>> >> slaves. Let's give these builds a bit of time for this stuff to get sorted
>> >> out. The failures in 0.94 seem immediately actionable.
>> >>
>> >>
>> >> On Wed, Mar 27, 2013 at 3:38 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> >>
>> >> > Trunk and 0.95 builds are not in good shape.
>> >> > 0.95 builds have failed 32 times.
>> >> >
>> >> > On Apache Jenkins, it looks like TestAssignmentManagerOnCluster has failed
>> >> > quite often for 0.95 and trunk builds.
>> >> >
>> >> > On Wed, Mar 27, 2013 at 7:18 AM, Andrew Purtell <apurtell@apache.org>
>> >> > wrote:
>> >> >
>> >> > > In general, moving from the m1.large (2 vcores, 7.5 GB RAM) to the
>> >> > > m1.xlarge (4 vcores, 15 GB RAM) instance type for the slaves helped with
>> >> > > a build/test timeout, so now I'd just about claim the test environment is
>> >> > > sane. We are now seeing that replication tests are flapping, occasionally
>> >> > > timing out internally:
>> >> > >
>> >> > > See
>> >> > >
>> >> > > http://54.241.6.143/job/HBase-0.94/org.apache.hbase$hbase/24/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationQueueFailoverCompressed/queueFailover/
>> >> > >
>> >> > > and
>> >> > >
>> >> > > http://54.241.6.143/job/HBase-0.94-Security/org.apache.hbase$hbase/7/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationQueueFailover/queueFailover/
>> >> > >
>> >> > >
>> >> > > The 0.94 and 0.94-security builds are alternating between green and red
>> >> > > as a result.
>> >> > >
>> >> > > Perhaps we should reopen/revisit either adjusting the internal timeouts
>> >> > > for these tests or the other JIRA about moving minicluster replication
>> >> > > tests to hbase-it.
>> >> > >
>> >> > >
>> >> > > On Wed, Mar 27, 2013 at 1:49 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
>> >> > >
>> >> > > > On Tue, Mar 26, 2013 at 1:28 PM, Andrew Purtell <apurtell@apache.org>
>> >> > > > wrote:
>> >> > > >
>> >> > > > > The HBase 0.94 build is now testing green!
>> >> > > > > http://54.241.6.143/job/HBase-0.94/
>> >> > > > >
>> >> > > >
>> >> > > > ^5!
>> >> > > >
>> >> > > > On Tue, Mar 26, 2013 at 1:47 AM, Andrew Purtell <apurtell@apache.org>
>> >> > > > wrote:
>> >> > > > >
>> >> > > > > > I found that Maven was being killed on the slaves by the Linux OOM
>> >> > > > > > killer sometimes for >= 0.95. Seems the m1.large instance type didn't
>> >> > > > > > have enough memory to host the Jenkins slave, Maven with its 3G+ heap,
>> >> > > > > > and the forked JVMs for the medium and large tests at the same time.
>> >> > > > > > Switching to the m1.xlarge type resolved this. Now the 0.95 and trunk
>> >> > > > > > builds fail for what looks like a legitimate problem with a hanging
>> >> > > > > > test.
>> >> > > > > >
>> >> > > > >
>> >> > > > > --
>> >> > > > > Best regards,
>> >> > > > >
>> >> > > > >    - Andy
>> >> > > > >
>> >> > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> >> > > > > (via Tom White)
>> >> > > > >
>> >> > > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > Best regards,
>> >> > >
>> >> > >    - Andy
>> >> > >
>> >> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> >> > > (via Tom White)
>> >> > >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Best regards,
>> >>
>> >>    - Andy
>> >>
>> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> >> (via Tom White)
>> >>
>> >
>> >
>>
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
