flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Failing Test
Date Fri, 17 Jul 2015 09:04:55 GMT
I think the problem might be related to the way the test is constructed.
The test submits a job to the JobManager (JM) and then tries to poll the
accumulators from the JM. If it does not succeed, the polling is retried with a
decreasing pause in between. Furthermore, the task which updates the
accumulators also sleeps for the same period until it reads the next
element and updates the accumulators.
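A minimal sketch of the timing-based structure described above, in plain Java rather than the actual AccumulatorLiveITCase code. A hypothetical `pollWithRetries` helper retries a check a bounded number of times with a pause in between, while a producer thread sleeps between accumulator updates. An `AtomicLong` stands in for the accumulator, the 50 ms pause is an arbitrary assumption, and for simplicity the pause is fixed rather than decreasing. Whether each check sees the expected intermediate value depends entirely on how the two threads happen to be scheduled.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical illustration of a sleep-based test; not the actual AccumulatorLiveITCase code.
class SleepBasedPolling {

    /** Evaluates check up to maxAttempts times, sleeping pauseMillis after each failed attempt. */
    static boolean pollWithRetries(BooleanSupplier check, int maxAttempts, long pauseMillis)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pauseMillis);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong accumulator = new AtomicLong(); // stands in for a Flink accumulator
        long pauseMillis = 50;                     // assumed pause, shared by task and checker

        // Producer "task": updates the accumulator, then sleeps before the next element.
        Thread task = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    accumulator.incrementAndGet();
                    Thread.sleep(pauseMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        task.start();

        // Checker: expects to see each intermediate value; nothing but timing
        // guarantees that it looks before the task has already moved on.
        for (long expected = 1; expected <= 3; expected++) {
            final long target = expected;
            boolean seen = pollWithRetries(() -> accumulator.get() == target, 5, pauseMillis / 5);
            System.out.println("value " + target + " observed: " + seen);
        }
        task.join();
    }
}
```

Running `main` repeatedly on a loaded machine can print different results, which is exactly the flakiness described here.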

Since the test does not use explicit synchronization but instead relies
on sleeps, it will most likely exhibit flaky behaviour. Sleeps don't
work reliably enough, especially on Travis, to guarantee a certain thread
interleaving. I'd recommend introducing explicit synchronization mechanisms
that control the behaviour of the accumulator-producing task, and explicit
testing messages that indicate when a new accumulator value has arrived at
the JM.
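One way such explicit synchronization could look, sketched with plain `java.util.concurrent` primitives rather than any Flink test utility (all names here are hypothetical). The test hands out one `Semaphore` permit per element, so the producing task can only advance when allowed, and the task publishes each new accumulator value to a `BlockingQueue`, which plays the role of the "new accumulator value has arrived at the JM" message. The test waits on the queue instead of sleeping and polling.

```java
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an explicit test/task handshake; not Flink API.
class ExplicitHandshake {

    /** Drives the task through n elements and returns the values the test observed, in order. */
    static long[] run(int n) throws InterruptedException {
        AtomicLong accumulator = new AtomicLong();                 // stands in for a Flink accumulator
        Semaphore nextElement = new Semaphore(0);                  // test -> task: "process one element"
        BlockingQueue<Long> updates = new LinkedBlockingQueue<>(); // task -> test: "new value arrived"

        Thread task = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    nextElement.acquire();                         // block until the test allows it
                    updates.put(accumulator.incrementAndGet());    // publish the updated value
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        task.start();

        long[] observed = new long[n];
        for (int i = 0; i < n; i++) {
            nextElement.release();                                 // let the task process exactly one element
            // The timeout only guards against a hung run; the result does not depend on it.
            Long value = updates.poll(10, TimeUnit.SECONDS);
            if (value == null) {
                throw new AssertionError("no accumulator update for element " + i);
            }
            observed[i] = value;
        }
        task.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(run(3))); // prints [1, 2, 3]
    }
}
```

Because every step is gated by the semaphore and confirmed through the queue, the observed sequence is the same on a slow CI machine as on a fast one; the poll timeout only turns a genuinely stuck task into a clear failure instead of a hang.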

Cheers,
Till

On Thu, Jul 16, 2015 at 11:04 PM, Matthias J. Sax <mjsax@informatik.hu-berlin.de> wrote:

> Hi,
>
> the test still fails, this time in both runs (Flink's Travis and my own
> Travis), again only for Java 8:
>
> https://travis-ci.org/apache/flink/jobs/71314132
> https://travis-ci.org/mjsax/flink/jobs/71179608
>
> -Matthias
>
>
> On 07/16/2015 02:28 PM, Matthias J. Sax wrote:
> > Great! I will. As 4 of 5 runs succeeded, I cannot test it explicitly. I will
> > keep an eye on it in future runs.
> >
> > -Matthias
> >
> >
> > On 07/16/2015 02:24 PM, Maximilian Michels wrote:
> >> Hi Matthias,
> >>
> >> I've pushed a fix to master. The problem should be solved. Please tell
> >> me if your Travis reports an error again. My Travis never complained :)
> >>
> >> Cheers,
> >> Max
> >>
> >> On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels <mxm@apache.org> wrote:
> >>
> >>> Hi Matthias,
> >>>
> >>> This is indeed a timing issue when checking the results in this test.
> >>> The new accumulator implementation now continuously reports accumulator
> >>> values from the running tasks to the JobManager. This was merged yesterday.
> >>>
> >>> The assertion that fails there is a bit strict. Actually, I've already
> >>> integrated a retry mechanism that fails only if the assertions still don't
> >>> hold after a configured number of retries.
> >>>
> >>> I'll commit a fix to the master. Thanks for reporting!
> >>>
> >>> Cheers,
> >>> Max
> >>>
> >>> On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi <uce@apache.org> wrote:
> >>>
> >>>> Hey,
> >>>>
> >>>> this was merged yesterday. I guess it's a timing issue when
> >>>> verifying the results. Can you file an issue for this?
> >>>>
> >>>> – Ufuk
> >>>>
> >>>> On 16 Jul 2015, at 11:30, Matthias J. Sax <mjsax@informatik.hu-berlin.de> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I hit another failing test (that is new to me):
> >>>>>
> >>>>>> Results :
> >>>>>> Failed tests:
> >>>>>>   AccumulatorLiveITCase.testProgram:106->access$1100:68->checkFlinkAccumulators:189 null
> >>>>>
> >>>>>
> >>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694 sec <<< FAILURE! - in org.apache.flink.test.accumulators.AccumulatorLiveITCase
> >>>>>> testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase)  Time elapsed: 8.021 sec  <<< FAILURE!
> >>>>>> java.lang.AssertionError: null
> >>>>>>     at org.junit.Assert.fail(Assert.java:86)
> >>>>>>     at org.junit.Assert.assertTrue(Assert.java:41)
> >>>>>>     at org.junit.Assert.assertTrue(Assert.java:52)
> >>>>>>     at org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
> >>>>>>     at org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)
> >>>>>
> >>>>> Please see: https://travis-ci.org/mjsax/flink/jobs/71179608
> >>>>>
> >>>>> Does anyone know anything about it?
> >>>>>
> >>>>> BTW: Even though this test is in flink-tests, the problem does not seem
> >>>>> to be related to https://issues.apache.org/jira/browse/FLINK-2032
> >>>>> because it tests accumulators. There are no result files involved (as
> >>>>> far as I can tell).
> >>>>>
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>
>
