mesos-dev mailing list archives

From Benjamin Mahler <bmah...@apache.org>
Subject Re: Speed up Mesos tests
Date Tue, 29 Dec 2015 22:05:21 GMT
There was a sharp increase in the test suite duration back when we added
the registrar: by default every test with a master uses replicated log
storage which involves many synchronous disk writes. We can swap out
replicated log storage for in-memory storage (already exists, just needs to
be wired up) if we want to get a broad improvement across the tests. The
reason that we didn't do this in the first place was that we wanted to be a
bit cautious when introducing the registrar, by trying to exercise log
storage across all the tests. This one is noted in MESOS-1757 but there's
no ticket cut out for it yet.

The other big win would be running the tests in parallel. This one is a big
shift from what we do today, but it's possible to do it even without
modifying the way we build the tests (for example, use a runner like
https://github.com/google/gtest-parallel to run many invocations of the
test binary, setting filters in order to divide the tests across the
processes). It's also a bit tricky to do in that we need to ensure that
certain tests (e.g. cgroup related) do not stomp on each other running in
parallel. Ideally we don't have to do this one.
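
To make the filter-based split concrete, here is a minimal sketch of how test names might be divided across processes while holding back tests that must not run concurrently. It is not how gtest-parallel itself is implemented. The helper names (`plan_shards`, `gtest_filter`), the `serial_marker` substring, and the test names in the usage note are all hypothetical; only the `--gtest_filter` flag and its colon-separated pattern syntax come from Google Test.

```python
def plan_shards(tests, workers, serial_marker="Cgroups"):
    """Split test names into round-robin shards plus one serial list.

    Tests whose name contains `serial_marker` are kept out of the
    parallel shards so they can be run one at a time afterwards.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = [t for t in tests if serial_marker in t]
    parallel = [t for t in tests if serial_marker not in t]
    # Round-robin assignment: shard i gets every `workers`-th test.
    shards = [parallel[i::workers] for i in range(workers)]
    # Drop empty shards (more workers than tests).
    return [s for s in shards if s], serial


def gtest_filter(names):
    """Build a --gtest_filter argument selecting exactly `names`."""
    return "--gtest_filter=" + ":".join(names)
```

Each shard would then become one invocation of the test binary with its own filter argument, and the serial list would run in a final, single invocation. Round-robin assignment ignores test duration, so a real runner would probably want to balance shards by recorded run times.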

Joris (cc'ed) had mentioned there may be other big wins we can get pretty
easily across the tests.

I thought I had wired up Google Test XML reports into our Jenkins job,
but perhaps this was lost during the move to Docker. I just pushed a change
to generate the XML files, but I'm not sure yet how to expose them from the
Docker container filesystem back to Jenkins for processing.

On Wed, Dec 16, 2015 at 12:48 PM, Alex Rukletsov <alex@mesosphere.com>
wrote:

> Greg, I think the "clock magic" is key to speeding up most of the tests; I'm
> glad you raised that point. Moreover, in case some folks haven't noticed
> that already, we have a doc describing some useful testing patterns:
> testing-patterns.md. It would be great if when working on these tickets we
> update and enrich this doc as well.
>
> Regarding MESOS-4101 — an interesting and bold idea, it would be great to
> capture pros & cons and think about potential implications or caveats it
> may bring.
>
> On Wed, Dec 16, 2015 at 8:29 PM, Neil Conway <neil.conway@gmail.com>
> wrote:
>
> > +1 on the speed-up-the-tests project!
> >
> > On Wed, Dec 16, 2015 at 10:29 AM, Greg Mann <greg@mesosphere.io> wrote:
> > > I'd like to bring up something that both Neil and Joseph mentioned to me
> > > recently, which could be of use when working on these slow test tickets.
> > > Since we have the `process::Clock` class, it's quite easy to control the
> > > clock manually, and doing so can both speed up tests as well as make them
> > > more deterministic/less flaky. While we're working on the above tickets, I
> > > think it would be nice to look for opportunities to alter the tests we're
> > > touching to pause the clock and then advance it explicitly using `pause()`,
> > > `settle()`, and `advance()`, rather than letting it run as usual.
> >
> > Yep -- I think eventually having the clock paused by default for tests
> > would probably be a good idea:
> >
> > https://issues.apache.org/jira/browse/MESOS-4101
> >
> > To make that happen, we might need a few more primitives to force
> > "pending" events to be processed before manually advancing the clock.
> > `Clock::settle()` works for libprocess messages, but not for socket
> > communication more generally (e.g., when using the HTTP API). It would
> > help to get rid of this kludge in `Clock::settle` as well:
> >
> > https://issues.apache.org/jira/browse/MESOS-3760
> >
> > Neil
> >
>
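
The clock-control idea in the quoted thread (pause the clock, then advance it explicitly) can be illustrated with a toy model. The sketch below is not libprocess's `process::Clock` and does not reproduce its API. `ManualClock`, `call_later`, and `advance` are hypothetical names chosen for illustration, and there is no equivalent of `settle()` because nothing here runs asynchronously.

```python
import heapq
import itertools


class ManualClock:
    """Toy clock that only moves when the test tells it to."""

    def __init__(self):
        self._now = 0.0
        self._timers = []                # heap of (deadline, seq, callback)
        self._seq = itertools.count()    # tie-breaker for equal deadlines

    def now(self):
        return self._now

    def call_later(self, delay, callback):
        """Schedule `callback` to fire `delay` seconds from now."""
        heapq.heappush(self._timers, (self._now + delay, next(self._seq), callback))

    def advance(self, seconds):
        """Move time forward, firing due timers in deadline order."""
        target = self._now + seconds
        while self._timers and self._timers[0][0] <= target:
            deadline, _, callback = heapq.heappop(self._timers)
            self._now = deadline
            callback()
        self._now = target
```

A test that would otherwise sleep through a 30-second timeout can schedule the timeout with `call_later(30.0, ...)` and call `advance(30.0)`, which completes immediately and fires the callback at a known point. That is the source of both the speed-up and the reduced flakiness described in the thread.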
