flink-dev mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: Failing Test
Date Fri, 17 Jul 2015 13:30:56 GMT
Thanks, Matthias, for looking into the issue.

Thank you Till for the problem formulation and the suggested steps for
solving the synchronization problem. I will look into this as soon as
possible.

Cheers,
Max

On Fri, Jul 17, 2015 at 11:18 AM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> I will open a JIRA for this. It's getting "complicated".
>
> On 07/17/2015 11:04 AM, Till Rohrmann wrote:
> > I think the problem might be related to the way the test is constructed.
> > The test submits a job to the JM and then tries to poll the accumulators
> > from the JM. If it does not succeed, then the polling is retried with a
> > decreasing pause in between. Furthermore, the task which updates the
> > accumulators also sleeps for the same period until it reads the next
> > element and updates the accumulators.
> >
> > Since the test does not use explicit synchronization but instead relies
> > on sleeps, it will most likely exhibit flaky behaviour. Sleeps don't
> > work reliably enough, especially on Travis, to guarantee a certain thread
> > interleaving. I'd recommend introducing explicit synchronization mechanisms
> > which control the behaviour of the accumulator-producing task, and explicit
> > testing messages which indicate that a new accumulator value has arrived at
> > the JM.
> >
> > Cheers,
> > Till
> >
> > On Thu, Jul 16, 2015 at 11:04 PM, Matthias J. Sax <
> > mjsax@informatik.hu-berlin.de> wrote:
> >
> >> Hi,
> >>
> >> the test still fails. This time in both runs (Flink Travis and my own
> >> Travis) -- only for Java 8 again:
> >>
> >> https://travis-ci.org/apache/flink/jobs/71314132
> >> https://travis-ci.org/mjsax/flink/jobs/71179608
> >>
> >> -Matthias
> >>
> >>
> >> On 07/16/2015 02:28 PM, Matthias J. Sax wrote:
> >>> Great! I will. As 4 of 5 runs succeeded, I cannot test it explicitly. I
> >>> will keep an eye on it in future runs.
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 07/16/2015 02:24 PM, Maximilian Michels wrote:
> >>>> Hi Matthias,
> >>>>
> >>>> I've pushed a fix to the master. The problem should be solved. Please tell
> >>>> me if your Travis reports an error again. My Travis never complained :)
> >>>>
> >>>> Cheers,
> >>>> Max
> >>>>
> >>>> On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels <mxm@apache.org> wrote:
> >>>>
> >>>>> Hi Matthias,
> >>>>>
> >>>>> This is indeed a timing issue when checking for the results in this test.
> >>>>> The new accumulator implementation now continuously reports from the
> >>>>> running tasks to the job manager. This was merged yesterday.
> >>>>>
> >>>>> The assertion that fails there is a bit strict. Actually, I've already
> >>>>> integrated a retry mechanism that fails only if the assertions don't hold
> >>>>> for a configured number of times.
> >>>>>
> >>>>> I'll commit a fix to the master. Thanks for reporting!
> >>>>>
> >>>>> Cheers,
> >>>>> Max
> >>>>>
> >>>>> On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi <uce@apache.org> wrote:
> >>>>>
> >>>>>> Hey,
> >>>>>>
> >>>>>> this was merged yesterday. I guess it's a timing issue when
> >>>>>> verifying the results. Can you file an issue for this?
> >>>>>>
> >>>>>> – Ufuk
> >>>>>>
> >>>>>> On 16 Jul 2015, at 11:30, Matthias J. Sax <mjsax@informatik.hu-berlin.de> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I hit another failing test (that is new to me):
> >>>>>>>
> >>>>>>>> Results :
> >>>>>>>> Failed tests:
> >>>>>>>> AccumulatorLiveITCase.testProgram:106->access$1100:68->checkFlinkAccumulators:189 null
> >>>>>>>
> >>>>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694 sec <<< FAILURE! - in org.apache.flink.test.accumulators.AccumulatorLiveITCase
> >>>>>>>> testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase) Time elapsed: 8.021 sec <<< FAILURE!
> >>>>>>>> java.lang.AssertionError: null
> >>>>>>>> at org.junit.Assert.fail(Assert.java:86)
> >>>>>>>> at org.junit.Assert.assertTrue(Assert.java:41)
> >>>>>>>> at org.junit.Assert.assertTrue(Assert.java:52)
> >>>>>>>> at org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
> >>>>>>>> at org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)
> >>>>>>>
> >>>>>>> Please see: https://travis-ci.org/mjsax/flink/jobs/71179608
> >>>>>>>
> >>>>>>> Does anyone know anything about it?
> >>>>>>>
> >>>>>>> BTW: Even though this test is in flink-tests, the problem does not seem
> >>>>>>> to be related to https://issues.apache.org/jira/browse/FLINK-2032
> >>>>>>> because accumulators are tested. There are no result files involved
> >>>>>>> (as far as I can tell).
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
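
A minimal sketch of the explicit synchronization Till recommends in the
thread above, as an alternative to sleep-based polling. This is not code from
Flink or from AccumulatorLiveITCase. The class name ExplicitSyncSketch, the
run method, the Semaphore that gates the producer, and the queue standing in
for "a new accumulator value has arrived at the JM" messages are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a test; none of these names exist in Flink.
class ExplicitSyncSketch {

    /** Runs a producer for the given number of steps and returns what the checker saw. */
    static List<Integer> run(int elements) throws InterruptedException {
        // Controls the producer: it may process one element per permit.
        Semaphore mayProceed = new Semaphore(0);
        // Stand-in for "a new accumulator value has arrived" notifications.
        BlockingQueue<Integer> arrived = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            int accumulator = 0;
            try {
                for (int i = 0; i < elements; i++) {
                    mayProceed.acquire();     // wait until the test allows the next step
                    accumulator++;            // "update the accumulator"
                    arrived.put(accumulator); // announce the new value
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true); // do not keep the JVM alive if the check fails
        producer.start();

        List<Integer> observed = new ArrayList<>();
        for (int i = 0; i < elements; i++) {
            mayProceed.release(); // let the producer take exactly one step
            // Block until the update is announced; the timeout only bounds a hang.
            Integer value = arrived.poll(5, TimeUnit.SECONDS);
            if (value == null) {
                throw new IllegalStateException("no accumulator update within timeout");
            }
            observed.add(value);
        }
        producer.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(3)); // prints [1, 2, 3]
    }
}
```

Because the checker releases exactly one permit per step and then blocks on
the queue, the interleaving is fixed regardless of scheduler timing. The
timeout does not order anything; it only limits how long a broken run can
hang.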
