harmony-dev mailing list archives

From "Elena Semukhina" <elena.semukh...@gmail.com>
Subject Re: [general] aiming no regression
Date Tue, 19 Dec 2006 13:34:03 GMT
On 12/18/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
>
>
>
> Mikhail Loenko wrote:
> > 2006/12/18, Geir Magnusson Jr. <geir@pobox.com>:
> >>
> >>
> >> Mikhail Loenko wrote:
> >> > 2006/12/1, Geir Magnusson Jr. <geir@pobox.com>:
> >> >>
> >> >>
> >> >> Mikhail Loenko wrote:
> >> >> > 4) We have cruise controls running classlibrary tests on DRLVM. We
> >> >> > need to decide what we will do when DRLVM+Classlib cruise control
> >> >> > reports a failure.
> >> >>
> >> >> Stop and fix the problem.  Is there really a question here?  I agree
> >> >
> >> > Yes, there is a question here. "Stop and fix" includes "discuss". But
> >> > as we now know, discussion may take several days. And while some people
> >> > discuss what the problem is, other people can't proceed with
> >> > development and patch integration.
> >> >
> >> > To have a better pace and better CC uptime we need something other than
> >> > just "stop and fix". I suggest "revert and continue".
> >>
> >> What's the difference, other than debating the semantics of "fix" and
> >> "revert"?
> >>
> >> We all agree - but I still don't think you're clearly stating the
> >> problem.  I think that the core problem is that we don't immediately
> >> react to CC failure.
> >>
> >> Immediately reacting to CC failure should be the first order of the day
> >> here.  Reacting, to me, is making the decision, quickly, about either
> >> rolling back the change ("reverting") or doing something else.  The key
> >> is being responsive.
> >>
> >> It seems that what happens is that we wait, and then sets of changes
> >> pile up, and I think that doing mass rollbacks at that point will solve
> >> it, but make a mess.
> >>
> >> The example of what I envision is when I broke the build in DRLVM,
> >> Gregory told me immediately, and I fixed immediately - w/o a rollback.
> >>
> >>
> >> All I'm saying is :
> >>
> >> 1) We need to be far better with reaction time
> >
> > I would say we need to be far better with fixing/reverting time.
> > If we reacted immediately and then discussed for two weeks, we would not
> > be any better off than we are now.
>
> Yes, fixing/reverting is included. It's what I meant.
>
> >
> >>
> >> 2) We have intelligent people - we can be agile in this by making
> >> decisions (quickly!) on a case by case basis what to do.
> >>
> >> I'll also suggest that we ask each committer to check the CC event
> >> stream before committing, so you don't commit into a bad state of things.
> >>
> >> One of my problems is that I don't trust the CC stream, and don't
> >> clearly see it because it's mixed in with the other dreck of the commits@ list.
> >
> > The problem is intermittent failures. I suggest that we exclude graphics
> > tests from CCs and probably have CC-specific exclude lists for networking
> > tests (or fix all the known intermittent failures right now :)
>
> good idea - works for me.
>
> We need to drive into stability - we've made amazing progress in the
> last two months, and now we're down to the really, really hard stuff.  I
> think that excluding them to get rock-solid CC reporting is step 0,
> and then step 1 is to try and grind out the intermittent failures.



Over the last few days I have continued to gather statistics on intermittent
failures of smoke tests. Although CC did not report any failures today, I'd
like to suggest excluding some tests that have failed for me intermittently
since December 10, so that we can fully trust CC reports.

They are:

stress.Sync (already excluded on some platforms)
gc.LOS (hangs rarely on Linux ia32, JIT)
gc.MultiThreads (sometimes hangs on Windows, JIT)

Do we agree to do that?

Elena

> geir
>
>


-- 
Thanks,
Elena
