mxnet-dev mailing list archives

From kellen sunderland <kellen.sunderl...@gmail.com>
Subject Re: Running tests in parallel
Date Mon, 06 Nov 2017 15:04:11 GMT
Yeah, I think the issue is related to the setup / teardown of a few test
fixtures. When I have some more time I'll try to narrow down what's wrong
with specific tests. Some tests may be reentrant and others not. Some
modules work well, for example python3 -m nose --verbose --processes 2
test_gluon, but test_operator starts reporting errors after 20 or so
tests.

On Mon, Nov 6, 2017 at 3:58 PM, Chris Olivier <cjolivier01@gmail.com> wrote:

> I’ve never tried that but it certainly seems like it would help CI speeds,
> especially since we don’t always use 100% CPU and almost never 100% GPU for
> tests
>
> On Mon, Nov 6, 2017 at 6:43 AM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > Hey all,
> >
> > Just wanted to ask before I dive too deeply on this. Does anyone know why
> > tests fail when run in multiprocess mode?  For example: python3 -m nose
> > --verbose --processes 2
> >
> > I've verified this isn't an OOM error, there should be plenty of GPU
> > memory on the instance I'm using.  I've also been watching nvidia-smi
> > closely during the failures.
> >
> > -Kellen
> >
>
