flink-dev mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Master test stability poor
Date Mon, 23 May 2016 10:00:00 GMT
We could also try to disable the caching of the .m2 directory (I suspect
that it contains broken jar files). The problem is that this will make
the builds slower on Travis because we need to download more.

On Mon, May 23, 2016 at 10:18 AM, Chesnay Schepler <chesnay@apache.org>
wrote:

> If this doesn't work we may want to think about disabling the problematic
> profile temporarily.
>
>
> On 23.05.2016 09:53, Ufuk Celebi wrote:
>
>> Caches have been cleared again (see
>> https://issues.apache.org/jira/browse/INFRA-11773). The first time did
>> not help. This second request was more an act of desperation. :-(
>> Let's see what happens now.
>>
>> On Wed, Apr 27, 2016 at 3:24 PM, Maximilian Michels <mxm@apache.org>
>> wrote:
>>
>>> +1 for making an effort to tackle test stability problems and
>>> potentially related bugs.
>>>
>>> On Wed, Apr 27, 2016 at 2:13 PM, Ufuk Celebi <uce@apache.org> wrote:
>>>
>>>> @Max: I think you wanted to look into whether we can use Apache's
>>>> Jenkins server for our builds instead of Travis. Did you ever get
>>>> around to looking into it? If yes: What's your opinion on replacing
>>>> Travis with Jenkins? Is it a viable option? Would it improve the
>>>> Travis-specific problems?
>>>>
>>> I've experimented with the ASF Jenkins installation while setting up
>>> our nightly snapshot builds. I've observed that the build servers are
>>> pretty busy. I don't know how busy they are compared to the Travis
>>> servers and whether we could have more stable builds using Jenkins. I
>>> guess we would have to try over a period of time.
>>>
>>> I was hesitant to enable Jenkins for pull requests because I didn't
>>> want to spam the ASF servers with builds. Also, there are some
>>> remaining steps for a good integration like making the Yarn logs
>>> available (not hard to do though).
>>>
>>> What do you think about enabling Jenkins builds for the master and
>>> seeing how that goes?
>>>
>>> On Wed, Apr 27, 2016 at 2:54 PM, Ufuk Celebi <uce@apache.org> wrote:
>>>
>>>> Filed an issue with INFRA:
>>>> https://issues.apache.org/jira/browse/INFRA-11773
>>>>
>>>> @Robert: I agree, but still we see failing builds over and over again.
>>>> At best it is annoying, at worst it "hides" new bugs being introduced.
>>>>
>>>> On Wed, Apr 27, 2016 at 2:41 PM, Till Rohrmann <trohrmann@apache.org>
>>>> wrote:
>>>>
>>>>> It is good to hear that we can so easily fix most of the failing
>>>>> builds. We should then iterate over the open test-stability issues to
>>>>> see whether they are still valid after we've merged PR 1915.
>>>>>
>>>>> On Wed, Apr 27, 2016 at 2:25 PM, Robert Metzger <rmetzger@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I'm not sure the issue is as big as it seems at first sight.
>>>>>> The reason why all the builds of master are red on Travis is that the
>>>>>> cache of the 5th build is invalid. We have to ask infra to delete the
>>>>>> caches and then they'll be green again.
>>>>>>
>>>>>> On Wed, Apr 27, 2016 at 2:13 PM, Ufuk Celebi <uce@apache.org> wrote:
>>>>>>
>>>>>>> Along the lines of what Greg already mentioned, I would like to
>>>>>>> reiterate that Travis is often a problem too:
>>>>>>> - long build times, and we are reaching the time limit
>>>>>>> - unreliable I/O
>>>>>>> - unreliable resolving of build dependencies
>>>>>>>
>>>>>>> @Max: I think you wanted to look into whether we can use Apache's
>>>>>>> Jenkins server for our builds instead of Travis. Did you ever get
>>>>>>> around to looking into it? If yes: What's your opinion on replacing
>>>>>>> Travis with Jenkins? Is it a viable option? Would it improve the
>>>>>>> Travis-specific problems?
>>>>>>>
>>>>>>> On the other hand, the very slow Travis machines also helped
>>>>>>> discover some hard-to-catch race conditions.
>>>>>>>
>>>>>>> – Ufuk
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 27, 2016 at 2:01 PM, Greg Hogan <code@greghogan.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> We have also started running over Travis' 2 hour limit for the
>>>>>>>> longest build.
>>>>>>>>
>>>>>>>> Greg
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 27, 2016, at 7:53 AM, Ufuk Celebi <uce@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> Hi Till,
>>>>>>>>>
>>>>>>>>> thank you for bringing this up. We really need to fix this.
>>>>>>>>>
>>>>>>>>> Filing JIRAs with critical priority was how we tried to solve it in
>>>>>>>>> the past, but obviously it did not work. There seems to be a
>>>>>>>>> mismatch between assigned and actual priorities.
>>>>>>>>>
>>>>>>>>> As a first step, I would volunteer to gather a list of tests that
>>>>>>>>> have failed in the last weeks and make sure that we have JIRAs for
>>>>>>>>> them.
>>>>>>>>>
>>>>>>>>> As a next step, we should coordinate how to resolve those issues
>>>>>>>>> (maybe prioritized by failure frequency) to get master stable
>>>>>>>>> again.
>>>>>>>>>
>>>>>>>>> – Ufuk
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Apr 27, 2016 at 12:12 PM, Till Rohrmann
>>>>>>>>> <trohrmann@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Flink community,
>>>>>>>>>>
>>>>>>>>>> I just wanted to raise awareness that in the last 16 days there
>>>>>>>>>> was just a single Travis build of master which passed all tests.
>>>>>>>>>> This indicates that we have some serious problems with our test
>>>>>>>>>> stability or, even worse, a problem with the master itself. Having
>>>>>>>>>> an unstable master makes it really hard to assess whether new
>>>>>>>>>> changes actually broke something or whether the failing test was
>>>>>>>>>> unrelated.
>>>>>>>>>>
>>>>>>>>>> We currently have 37 open issues labeled with test-stability, and
>>>>>>>>>> most of them have a critical priority. Therefore, I would propose
>>>>>>>>>> that we try to tackle them as soon as possible in order to improve
>>>>>>>>>> our testing stability.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Till
>>>>>>>>>>
>>>>>>>>>
>
