flink-dev mailing list archives

From "Matthias J. Sax" <mj...@informatik.hu-berlin.de>
Subject Re: Failing Test
Date Fri, 17 Jul 2015 09:18:21 GMT
I will open a JIRA for this. It's getting "complicated".

On 07/17/2015 11:04 AM, Till Rohrmann wrote:
> I think the problem might be related to the way the test is constructed.
> The test submits a job to the JM and then tries to poll the accumulators
> from the JM. If it does not succeed, then the polling is retried with a
> decreasing pause in between. Furthermore, the task which updates the
> accumulators also sleeps for the same period until it reads the next
> element and updates the accumulators.
> 
> Since the test does not use explicit synchronization but instead relies
> on sleeps, it will most likely exhibit flaky behaviour. Sleeps don't
> work reliably enough, especially on Travis, to guarantee a certain thread
> interleaving. I'd recommend introducing explicit synchronization mechanisms
> which control the behaviour of the accumulator-producing task, and explicit
> testing messages which indicate that a new accumulator value has arrived at
> the JM.
> 
> Cheers,
> Till
> 
> On Thu, Jul 16, 2015 at 11:04 PM, Matthias J. Sax <
> mjsax@informatik.hu-berlin.de> wrote:
> 
>> Hi,
>>
>> the test still fails. This time in both runs (Flink Travis and my own
>> Travis) -- only for Java 8 again:
>>
>> https://travis-ci.org/apache/flink/jobs/71314132
>> https://travis-ci.org/mjsax/flink/jobs/71179608
>>
>> -Matthias
>>
>>
>> On 07/16/2015 02:28 PM, Matthias J. Sax wrote:
>>> Great! I will. As 4 of 5 runs succeeded, I cannot test it explicitly. I will
>>> keep an eye on it in future runs.
>>>
>>> -Matthias
>>>
>>>
>>> On 07/16/2015 02:24 PM, Maximilian Michels wrote:
>>>> Hi Matthias,
>>>>
>>>> I've pushed a fix to the master. The problem should be solved. Please tell
>>>> me if your Travis reports an error again. My Travis never complained :)
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels <mxm@apache.org> wrote:
>>>>
>>>>> Hi Matthias,
>>>>>
>>>>> This is indeed a timing issue when checking for the results in this test.
>>>>> The new accumulator implementation now continuously reports from the
>>>>> running tasks to the job manager. This was merged yesterday.
>>>>>
>>>>> The assertion that fails there is a bit strict. Actually, I've already
>>>>> integrated a retry mechanism that fails only if the assertions don't hold
>>>>> for a configured number of times.
>>>>>
>>>>> I'll commit a fix to the master. Thanks for reporting!
>>>>>
>>>>> Cheers,
>>>>> Max
>>>>>
>>>>> On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi <uce@apache.org> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> this was merged yesterday. I guess it's a timing issue when
>>>>>> verifying the results. Can you file an issue for this?
>>>>>>
>>>>>> – Ufuk
>>>>>>
>>>>>> On 16 Jul 2015, at 11:30, Matthias J. Sax <mjsax@informatik.hu-berlin.de> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I hit another failing test (that is new to me):
>>>>>>>
>>>>>>>> Results :
>>>>>>>> Failed tests:
>>>>>>>> AccumulatorLiveITCase.testProgram:106->access$1100:68->checkFlinkAccumulators:189 null
>>>>>>>
>>>>>>>
>>>>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694 sec <<< FAILURE! - in org.apache.flink.test.accumulators.AccumulatorLiveITCase
>>>>>>>> testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase) Time elapsed: 8.021 sec <<< FAILURE!
>>>>>>>> java.lang.AssertionError: null
>>>>>>>> at org.junit.Assert.fail(Assert.java:86)
>>>>>>>> at org.junit.Assert.assertTrue(Assert.java:41)
>>>>>>>> at org.junit.Assert.assertTrue(Assert.java:52)
>>>>>>>> at org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
>>>>>>>> at org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)
>>>>>>>
>>>>>>> Please see: https://travis-ci.org/mjsax/flink/jobs/71179608
>>>>>>>
>>>>>>> Does anyone know anything about it?
>>>>>>>
>>>>>>> BTW: Even if this test is in flink-tests, the problem seems not to be
>>>>>>> related to https://issues.apache.org/jira/browse/FLINK-2032 because
>>>>>>> accumulators are tested. There are no result files involved (as far as
>>>>>>> I can tell).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
> 
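
Till's suggestion in the quoted thread, replacing sleeps with explicit synchronization, could look roughly like the sketch below. It is not the AccumulatorLiveITCase code and uses no Flink API: the AtomicInteger is a hypothetical stand-in for an accumulator, and HandshakeSketch, runHandshake, updated and checked are names made up for illustration. The producer signals after every update and then waits until the verifier has looked at the value, so the interleaving no longer depends on timing.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class HandshakeSketch {

    /**
     * Starts a producer thread that performs `elements` updates and returns
     * the value the verifying thread observed after each single update.
     */
    static int[] runHandshake(int elements) throws InterruptedException {
        // Hypothetical stand-in for an accumulator; not a Flink class.
        AtomicInteger accumulator = new AtomicInteger();
        // producer -> verifier: "a new value is visible"
        Semaphore updated = new Semaphore(0);
        // verifier -> producer: "value checked, continue with the next element"
        Semaphore checked = new Semaphore(0);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < elements; i++) {
                    accumulator.incrementAndGet();
                    updated.release();   // announce the update
                    checked.acquire();   // block until the verifier has read it
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int[] observed = new int[elements];
        for (int i = 0; i < elements; i++) {
            // The timeout only bounds a hung test; it does not order the threads.
            if (!updated.tryAcquire(10, TimeUnit.SECONDS)) {
                producer.interrupt();
                throw new IllegalStateException("update " + i + " did not arrive in time");
            }
            observed[i] = accumulator.get();
            checked.release();           // let the producer proceed
        }
        producer.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(java.util.Arrays.toString(runHandshake(3)));
    }
}
```

Because each side waits for the other, the verifier sees every intermediate value exactly once, which is the kind of guarantee a sleep-based test cannot give on a loaded CI machine.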

