mxnet-dev mailing list archives

From kellen sunderland <>
Subject Parallel Inference Proposal
Date Thu, 10 May 2018 14:42:36 GMT
Hello MXNet developers,

I’ve recently been speaking with users who’d like to run parallel inference
requests with MXNet on their service.  They’ll do this on GPUs, and due to
resource constraints, they’d like to do this without duplicating their
model’s weights in memory.  They’d also like to run inference with little
or no buffering/batching, as latency is important.  I’ve created a wiki
page with a small proposal that I hope will make running parallel inference
a little easier.  I’d like to discuss the proposal in this thread and would
particularly appreciate it if core devs could correct me if I’ve made any
incorrect assumptions in the doc.
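
To make the use case concrete, here’s a rough sketch (not part of the
proposal itself) of the pattern these users are after with today’s Python
Module API: several executors bound against one set of parameter arrays via
shared_module, so the weights live on the GPU only once.  The checkpoint
name and shapes below are placeholders, and whether it’s actually safe to
drive these forward calls concurrently from multiple frontend threads is
exactly the kind of assumption I’d like core devs to check.

import mxnet as mx

# Placeholder checkpoint/shape; any exported symbol + params pair would do.
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)
data_shapes = [('data', (1, 3, 224, 224))]
ctx = mx.gpu(0)

# The first module owns the parameter memory on the GPU.
owner = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
owner.bind(data_shapes=data_shapes, for_training=False)
owner.set_params(arg_params, aux_params, allow_missing=True)

# Additional modules bind their own executors but reuse the owner's
# parameter NDArrays via shared_module, so weights are not duplicated.
workers = []
for _ in range(4):
    mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
    mod.bind(data_shapes=data_shapes, for_training=False,
             shared_module=owner)
    workers.append(mod)

# Each worker can serve a request without batching; the open question is
# how safely these forward calls can be issued in parallel.
batch = mx.io.DataBatch([mx.nd.random.uniform(shape=(1, 3, 224, 224))])
for mod in workers:
    mod.forward(batch, is_train=False)
    print(mod.get_outputs()[0].shape)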

Proposal here:

If people are OK with the proposal, I can open a Jira ticket, PR, etc.  If
people are curious about the perf implications, I can also do some benchmarking.

Thanks in advance for the feedback,

