flink-dev mailing list archives

From Chiwan Park <chiwanp...@apache.org>
Subject Re: [ANNOUNCE] Build Issues Solved
Date Tue, 31 May 2016 01:50:47 GMT
Thanks for the great work! :-)

Regards,
Chiwan Park

> On May 31, 2016, at 7:47 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> 
> Awesome work guys!
> And even more thanks for the detailed report... This troubleshooting summary
> will undoubtedly be useful for all our maven projects!
> 
> Best,
> Flavio
> On 30 May 2016 23:47, "Ufuk Celebi" <uce@apache.org> wrote:
> 
>> Thanks for the effort, Max and Stephan! Happy to see the green light again.
>> 
>> On Mon, May 30, 2016 at 11:03 PM, Stephan Ewen <sewen@apache.org> wrote:
>>> Hi all!
>>> 
>>> After a few weeks of terrible build issues, I am happy to announce that
>>> the build works properly again, and we actually get meaningful CI results.
>>> 
>>> Here is a story in many acts, from builds deep red to bright green joy.
>>> Kudos to Max, who did most of this troubleshooting. This evening, Max and
>>> I debugged the final issue and got the build back on track.
>>> 
>>> ------------------
>>> The Journey
>>> ------------------
>>> 
>>> (1) Failsafe Plugin
>>> 
>>> The Maven Failsafe Plugin had a critical bug: failed tests did not
>>> result in a failed build.
>>> 
>>> That is a pretty bad bug for a plugin whose only task is to run tests and
>>> fail the build if a test fails.
>>> 
>>> After we recognized that, we upgraded the Failsafe Plugin.
>>> 
>>> 
>>> (2) Failsafe Plugin Dependency Issues
>>> 
>>> After the upgrade, the Failsafe Plugin behaved differently and no longer
>>> interoperated with Dependency Shading.
>>> 
>>> Because of that, we switched to the Surefire Plugin.
>>> 
>>> 
>>> (3) Fixing all the issues introduced in the meantime
>>> 
>>> Naturally, a number of test instabilities had been introduced in the
>>> meantime, and they needed to be fixed.
>>> 
>>> 
>>> (4) Yarn Tests and Test Scope Refactoring
>>> 
>>> In the meantime, a Pull Request was merged that moved the Yarn Tests to
>>> the test scope.
>>> Because the configuration searched for tests in the "main" scope, no Yarn
>>> tests were executed for a while, until the scope was fixed.
>>> 
>>> 
>>> (5) Yarn Tests and JMX Metrics
>>> 
>>> After the Yarn Tests were re-activated, we saw them fail due to warnings
>>> created by the newly introduced metrics code. We fixed that by updating
>>> the metrics code and temporarily not registering JMX beans for all
>>> metrics.
>>> 
>>> 
>>> (6) Yarn / Surefire Deadlock
>>> 
>>> Finally, some Yarn tests failed reliably in Maven (though not in the
>>> IDE). It turned out that those tests exercise a command line interface
>>> that interacts with the standard input stream.
>>> 
>>> The newly deployed Surefire Plugin uses standard input as well, for
>>> communication with forked JVMs. Since Surefire internally locks the
>>> standard input stream, the Yarn CLI cannot poll the standard input stream
>>> without locking up and stalling the tests.
>>> 
>>> We adjusted the tests, and now the build is green again.
>>> 
>>> -----------------
>>> Conclusions:
>>> -----------------
>>> 
>>>  - CI is terribly crucial. It took us weeks to deal with the fallout of a
>>> period of unreliable CI.
>>> 
>>>  - Maven could do a better job. A bug as critical as the one that started
>>> our problems should not occur in a test plugin like Surefire. Also, the
>>> constant changes to semantics and dependency scopes are annoying. The
>>> semantic changes are subtle, but for a build as complex as Flink's, they
>>> make a difference.
>>> 
>>>  - File-based communication is rarely a good idea. The bug in the Failsafe
>>> plugin was caused by improper file-based communication, as were some of
>>> the instabilities we discovered.
>>> 
>>> Greetings,
>>> Stephan
>>> 
>>> 
>>> PS: Some issues and mysteries remain for us to solve: when we allow our
>>> metrics subsystem to register JMX beans, we see some tests failing due to
>>> spontaneous JVM process kills. If anyone has a pointer there, please ping
>>> us!
>> 
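
The Yarn/Surefire deadlock described in step (6) comes from a CLI that reads the process-wide standard input while Surefire is also using that stream to talk to its forked JVM. One common way to keep such tests independent of System.in (not necessarily the adjustment made in the Flink tests) is to have the CLI read from an injected InputStream, so a test can feed it a scripted input instead. The sketch below assumes a hypothetical `InteractiveCli` class with a made-up "stop" command and a `runScript` test helper; none of these names come from Flink.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical interactive CLI (not Flink's actual Yarn CLI). It reads commands
 * from an injected InputStream instead of calling System.in directly, so tests
 * never touch the standard input stream that Surefire uses for its forked JVMs.
 */
public class InteractiveCli {

    private final BufferedReader in;
    private final PrintStream out;

    public InteractiveCli(InputStream in, PrintStream out) {
        this.in = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        this.out = out;
    }

    /** Handles commands until "stop" or end of input; returns the number of commands seen. */
    public int run() throws IOException {
        int handled = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String cmd = line.trim();
            if (cmd.isEmpty()) {
                continue; // ignore blank lines
            }
            handled++;
            if (cmd.equals("stop")) {
                out.print("stopping\n");
                break;
            }
            out.print("unknown command: " + cmd + "\n");
        }
        return handled;
    }

    /** Test helper (hypothetical): runs the CLI on a scripted input and returns its output. */
    public static String runScript(String script) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, "UTF-8");
            InputStream in = new ByteArrayInputStream(script.getBytes(StandardCharsets.UTF_8));
            new InteractiveCli(in, out).run();
            out.flush();
            return buffer.toString("UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In production the CLI would be constructed with `System.in` and `System.out`; in a test, `runScript("foo\nstop\n")` drives it entirely from memory, so nothing competes with Surefire for the real standard input.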

