mesos-dev mailing list archives

From Benjamin Bannier <benjamin.bann...@mesosphere.io>
Subject Re: Build failed in Jenkins: Mesos » autotools,gcc,--verbose --enable-libevent --enable-ssl,GLOG_v=1 MESOS_VERBOSE=1,ubuntu:14.04,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-6)&&(!ubuntu-eu2) #2933
Date Thu, 17 Nov 2016 08:53:42 GMT
Hi,

>> What do folks think about removing future timeouts in tests altogether?
>> Instead, we can time the whole suite differently on different CIs?

> Has there been any response from the ASF Infra folks on addressing the
> VM/hardware issues? Seems like it will be difficult to get good signal
> from the ASF CI in the absence of some improvements on the
> infrastructure side.

Alex brings up a valid way to largely decouple us from VM lag, which seems to be a problem
mostly because we expect actions in tests to finish faster than they actually happen. The real
code under test would be much less aggressive in interpreting small response lags as fatal errors.

If we set the default timeout for, say, `AWAIT_READY` in our test code to e.g., infinity,
slow VMs would be much less of an issue. To avoid blocking machines indefinitely on broken tests,
we should then probably either limit the duration of our Jenkins jobs (if ASF doesn’t already
have that safeguard), or maybe even add such a limit to our test execution setup itself (e.g., simply
with `timeout(1)` or an equivalent from the outside, or directly inside the harness).
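
To make the "from the outside" variant a bit more concrete, here is a rough sketch of a wrapper
that runs a command under one overall deadline. It is only an illustration, not what our harness
does: `run_with_deadline` and `TIMED_OUT` are names made up for this example, and the Python
interpreter stands in for a real test binary.

```python
import subprocess
import sys

# Exit status that GNU timeout(1) reports when the time limit is hit;
# reused here so callers can treat both tools alike.
TIMED_OUT = 124


def run_with_deadline(cmd, seconds):
    """Run cmd; return its exit code, or TIMED_OUT if it exceeds the deadline."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # once `seconds` have elapsed.
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return TIMED_OUT


# Stand-in for a test run: a command that finishes well within its deadline.
status = run_with_deadline([sys.executable, "-c", "pass"], 10.0)
```

One caveat: `subprocess.run` only kills the direct child on timeout, so a test binary that forks
helpers would need extra care (e.g., its own process group) to be cleaned up fully.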

The downside of this is of course that a hanging test (e.g., due to some true race) could
block execution of all other tests.

Being more patient can be helpful in other environments as well (e.g., `valgrind`).


Cheers,

Benjamin