hbase-dev mailing list archives

From Andrew Purtell <andrew.purt...@gmail.com>
Subject Re: All builds are recently failing with timeout or fork errors, let's change settings
Date Sun, 18 Jan 2015 18:00:22 GMT
Forked VMs are being killed in the 0.98 builds. That suggests infrastructure issues. 

Running only one test per forked runner does mean that when a zombie is found, the thread
dumps and other state from the runner will identify and characterize the sick test, with no
unrelated state mixed in.


> On Jan 17, 2015, at 7:43 PM, Stack <stack@duboce.net> wrote:
> 
> Agree, try anything to get our blues back.  We add back the //ism after all
> settles.
> 
> Do you think something has changed in INFRA, Andy? Is it more contended? Or,
> more likely, is it that we've been committing stuff that has destabilized
> builds? We had a good streak of blue there for a while. It just took some
> work fixing breakage and watching jenkins to make sure breakage didn't
> sneak in, but we've lapsed for sure.
> 
> St.Ack
> 
>> On Sat, Jan 17, 2015 at 9:19 AM, Dima Spivak <dspivak@cloudera.com> wrote:
>> 
>> Not running tests in parallel will definitely cut down on Surefire
>> flakiness (and on the contention that sometimes leads to false failures in
>> resource-hungry tests), but it will probably also balloon test run times to
>> about two hours. Probably worth it in the short term, but we
>> eventually need to do something about some of these heavy tests.
>> 
>> -Dima
>> 
>> On Friday, January 16, 2015, Andrew Purtell <andrew.purtell@gmail.com>
>> wrote:
>> 
>>> You might have missed the larger issue, Ted.
>>> 
>>> 
>>>> On Jan 16, 2015, at 4:48 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>> 
>>>> With HBASE-12874, we should get a green build for branch-1.0
>>>> 
>>>> FYI
>>>> 
>>>> On Fri, Jan 16, 2015 at 12:20 PM, Andrew Purtell <apurtell@apache.org> wrote:
>>>> 
>>>>> See BUILDS-49 tracking issues specifically with 0.98 jobs, but I just
>>>>> noticed trunk, branch-1, and branch-1.0 all failed after I checked in a
>>>>> shell doc fix due to a timeout or fork failure.
>>>>> 
>>>>> I propose we update all Jenkins jobs to not run tests in parallel, i.e. add
>>>>> "-Dsurefire.firstPartForkCount=1 -Dsurefire.secondPartForkCount=1"
>>>>> 
>>>>> --
>>>>> Best regards,
>>>>> 
>>>>>  - Andy
>>>>> 
>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>> (via Tom White)
>> 
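
For reference, the setting proposed at the bottom of the thread could be passed to a Maven invocation roughly as sketched here. The two -Dsurefire.* properties are the ones quoted in the thread. Everything else is an assumption for illustration only: the "clean test" goals, the script itself, and the function name serial_test_cmd are hypothetical and are not taken from the actual Jenkins job configuration. The script only prints the command; it does not run a build.

```shell
#!/bin/sh
# Sketch only: prints the Maven command line instead of executing it.
# The two fork-count properties come from the thread; the goals
# ("clean test") are an assumed placeholder, not the real job config.
serial_test_cmd() {
  printf '%s\n' "mvn clean test -Dsurefire.firstPartForkCount=1 -Dsurefire.secondPartForkCount=1"
}

serial_test_cmd
```

With both fork counts set to 1, each test part runs in a single forked JVM at a time, which is the non-parallel mode discussed above, at the cost of the longer run times Dima mentions.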
