flink-dev mailing list archives

From Chesnay Schepler <ches...@apache.org>
Subject Re: Master test stability poor
Date Mon, 23 May 2016 10:55:15 GMT
If we disable caching, let it run for one build, and enable it again, will
that effectively clear the .m2 cache?

On 23.05.2016 12:00, Robert Metzger wrote:
> We could also try to disable the caching of the .m2 directory (I suspect
> that it contains broken jar files). The problem is that this will make
> the builds slower on travis because we need to download more.
>
> On Mon, May 23, 2016 at 10:18 AM, Chesnay Schepler <chesnay@apache.org>
> wrote:
>
>> If this doesn't work we may want to think about disabling the problematic
>> profile temporarily.
>>
>>
>> On 23.05.2016 09:53, Ufuk Celebi wrote:
>>
>>> Caches have been cleared again (see
>>> https://issues.apache.org/jira/browse/INFRA-11773). The first time did
>>> not help. This second request was more an act of desperation. :-(
>>> Let's see what happens now.
>>>
>>> On Wed, Apr 27, 2016 at 3:24 PM, Maximilian Michels <mxm@apache.org>
>>> wrote:
>>>
>>>> +1 for making an effort to tackle test stability problems and
>>>> potentially involved bugs.
>>>>
>>>> On Wed, Apr 27, 2016 at 2:13 PM, Ufuk Celebi <uce@apache.org> wrote:
>>>>
>>>>> @Max: I think you wanted to look into whether we can use Apache's
>>>>> Jenkins server for our builds instead of Travis. Did you ever get
>>>>> around to looking into it? If yes: What's your opinion on replacing
>>>>> Travis with Jenkins? Is it a viable option? Would it improve the
>>>>> Travis-specific problems?
>>>>>
>>>> I've experimented with the ASF Jenkins installation while setting up
>>>> our nightly snapshot builds. I've observed that the build servers are
>>>> pretty busy. I don't know how busy they are compared to the Travis
>>>> servers and whether we could have more stable builds using Jenkins. I
>>>> guess we would have to try over a period of time.
>>>>
>>>> I was hesitant to enable Jenkins for pull requests because I didn't
>>>> want to spam the ASF servers with builds. Also, there are some
>>>> remaining steps for a good integration like making the Yarn logs
>>>> available (not hard to do though).
>>>>
>>>> What do you think about enabling Jenkins builds for the master and
>>>> seeing how that goes?
>>>>
>>>> On Wed, Apr 27, 2016 at 2:54 PM, Ufuk Celebi <uce@apache.org> wrote:
>>>>
>>>>> Filed an issue with INFRA:
>>>>> https://issues.apache.org/jira/browse/INFRA-11773
>>>>>
>>>>> @Robert: I agree, but still we see failing builds over and over again.
>>>>> At best it is annoying, at worst it "hides" new bugs being introduced.
>>>>>
>>>>> On Wed, Apr 27, 2016 at 2:41 PM, Till Rohrmann <trohrmann@apache.org>
>>>>> wrote:
>>>>>
>>>>>> It is good to hear that we can so easily solve most of the failing
>>>>>> builds. We should then iterate over the open test-stability issues to
>>>>>> see whether they are still valid after we've merged PR 1915.
>>>>>>
>>>>>> On Wed, Apr 27, 2016 at 2:25 PM, Robert Metzger <rmetzger@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm not sure if the issue is as big as it seems at first sight.
>>>>>>> The reason why all the builds of master are red on travis is that the
>>>>>>> cache of the 5th build is invalid. We have to ask infra to delete the
>>>>>>> caches and then they'll be green again.
>>>>>>>
>>>>>>> On Wed, Apr 27, 2016 at 2:13 PM, Ufuk Celebi <uce@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Along the lines of what Greg already mentioned, I would like to
>>>>>>>> re-iterate that Travis is often a problem too:
>>>>>>>> - long build times and we are reaching the time limit
>>>>>>>> - unreliable I/O
>>>>>>>> - unreliable resolving of build dependencies
>>>>>>>>
>>>>>>>> @Max: I think you wanted to look into whether we can use Apache's
>>>>>>>> Jenkins server for our builds instead of Travis. Did you ever get
>>>>>>>> around to looking into it? If yes: What's your opinion on replacing
>>>>>>>> Travis with Jenkins? Is it a viable option? Would it improve the
>>>>>>>> Travis-specific problems?
>>>>>>>>
>>>>>>>> On the other hand, the very slow Travis machines also helped
>>>>>>>> discover some hard-to-catch race conditions.
>>>>>>>>
>>>>>>>> – Ufuk
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Apr 27, 2016 at 2:01 PM, Greg Hogan <code@greghogan.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> We have also started running over Travis' 2 hour limit for the
>>>>>>>>> longest build.
>>>>>>>>>
>>>>>>>>> Greg
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Apr 27, 2016, at 7:53 AM, Ufuk Celebi <uce@apache.org> wrote:
>>>>>>>>>> Hi Till,
>>>>>>>>>>
>>>>>>>>>> thank you for bringing this up. We really need to fix this.
>>>>>>>>>>
>>>>>>>>>> Filing JIRAs with critical priority was how we tried to solve it in
>>>>>>>>>> the past, but obviously it did not work. There seems to be a
>>>>>>>>>> mismatch between assigned and actual priorities.
>>>>>>>>>>
>>>>>>>>>> As a first step, I would volunteer to gather a list of tests, which
>>>>>>>>>> have failed in the last weeks and make sure that we have JIRAs for
>>>>>>>>>> them.
>>>>>>>>>>
>>>>>>>>>> As a next step, we should coordinate how to resolve those issues
>>>>>>>>>> (maybe prioritized by failure frequency) to get master stable
>>>>>>>>>> again.
>>>>>>>>>>
>>>>>>>>>> – Ufuk
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 27, 2016 at 12:12 PM, Till Rohrmann
>>>>>>>>>> <trohrmann@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Flink community,
>>>>>>>>>>>
>>>>>>>>>>> I just wanted to raise awareness that in the last 16 days there
>>>>>>>>>>> was just a single Travis build of master which passed all tests.
>>>>>>>>>>> This indicates that we have some serious problems with our test
>>>>>>>>>>> stability or even worse a problem with the master itself. Having
>>>>>>>>>>> an unstable master makes it really hard to assess whether new
>>>>>>>>>>> changes actually broke something or whether the failing test was
>>>>>>>>>>> unrelated.
>>>>>>>>>>>
>>>>>>>>>>> We have currently 37 open issues labeled with test-stability and
>>>>>>>>>>> most of them have a critical priority. Therefore, I would propose
>>>>>>>>>>> that we try to tackle them as soon as possible in order to improve
>>>>>>>>>>> our testing stability.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Till

