hbase-dev mailing list archives

From Andrew Purtell <andrew.purt...@gmail.com>
Subject Re: All builds are recently failing with timeout or fork errors, let's change settings
Date Mon, 19 Jan 2015 18:11:30 GMT
The 0.98 build is still showing this problem (latest as of now at
https://builds.apache.org/job/hbase-0.98/803), so I went ahead and made the
proposed change, but only to the 0.98 builds. I'll let you know if it
provides any improvement.
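
For reference, the proposed change is the pair of Surefire properties quoted further down in this thread. A minimal sketch of how they might be appended to a job's Maven invocation follows. The `clean test` goals and the variable names are assumptions for illustration, not the actual Jenkins job configuration. Only the two -D properties come from the proposal.

```shell
# Sketch only: the goals and variable names are hypothetical, not the real job config.
# The two -D properties are the ones proposed in this thread.
SEQUENTIAL_TEST_ARGS="-Dsurefire.firstPartForkCount=1 -Dsurefire.secondPartForkCount=1"

# Build the command line the job would run. It is printed here rather than executed.
MVN_CMD="mvn clean test ${SEQUENTIAL_TEST_ARGS}"
echo "${MVN_CMD}"
```

Keeping the properties in one variable would make it easy to drop them again once the builds settle.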


On Sun, Jan 18, 2015 at 10:00 AM, Andrew Purtell <andrew.purtell@gmail.com>
wrote:

> Forked VMs are being killed in the 0.98 builds. That suggests
> infrastructure issues.
>
> Having only one test execute in each forked runner does mean that when we
> find a zombie, the thread dumps and other state from the runner will
> identify and characterize the sick test with no unrelated state mixed in.
>
>
> > On Jan 17, 2015, at 7:43 PM, Stack <stack@duboce.net> wrote:
> >
> > Agree, try anything to get our blues back.  We add back the //ism after
> > all settles.
> >
> > Do you think something has changed in INFRA Andy? Is it more contended? Or,
> > more likely, is it that we've been committing stuff that has destabilized
> > builds? We had a good streak of blue there for a while. It just took some
> > work fixing breakage and watching jenkins to make sure breakage didn't
> > sneak in, but we've lapsed for sure.
> >
> > St.Ack
> >
> >> On Sat, Jan 17, 2015 at 9:19 AM, Dima Spivak <dspivak@cloudera.com> wrote:
> >>
> >> Not running tests in parallel will definitely cut down on Surefire
> >> flakiness (and on the contention that sometimes leads to false failures in
> >> resource-hungry tests), but it will probably also balloon test run times to
> >> about two hours. Probably worth it in the short term, but we
> >> eventually need to do something about some of these heavy tests.
> >>
> >> -Dima
> >>
> >> On Friday, January 16, 2015, Andrew Purtell <andrew.purtell@gmail.com>
> >> wrote:
> >>
> >>> You might have missed the larger issue, Ted.
> >>>
> >>>
> >>>> On Jan 16, 2015, at 4:48 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>
> >>>> With HBASE-12874, we should get a green build for branch-1.0
> >>>>
> >>>> FYI
> >>>>
> >>>> On Fri, Jan 16, 2015 at 12:20 PM, Andrew Purtell <apurtell@apache.org> wrote:
> >>>>
> >>>>> See BUILDS-49 tracking issues specifically with 0.98 jobs, but I just
> >>>>> noticed trunk, branch-1, and branch-1.0 all failed with a timeout or
> >>>>> fork failure after I checked in a shell doc fix.
> >>>>>
> >>>>> I propose we update all Jenkins jobs to not run tests in parallel, i.e. add
> >>>>> "-Dsurefire.firstPartForkCount=1 -Dsurefire.secondPartForkCount=1"
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>>
> >>>>>  - Andy
> >>>>>
> >>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >>>>> (via Tom White)
> >>
>
