mxnet-dev mailing list archives

From: Can Balioglu <>
Subject: Re: Parallel Inference Proposal
Date: Fri, 11 May 2018 11:32:19 GMT
Hi Kellen,

Great to see some progress on this, as it is one of the major problems we face right now. Your
approach seems to be a good fit as a short- to mid-term solution. Have you also considered using
some form of signaling? As far as I understand from your proposal and the example code, relying on
the 'can_read' attribute requires busy-waiting in the main thread. An approach similar to
Unix signals, where the caller registers a handler that is invoked once an NDArray is ready,
could offer better scalability.
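
To make the distinction concrete, here is a minimal sketch in plain Python of the two patterns.
The names (AsyncResult, can_read, on_ready, _mark_ready) are hypothetical and only illustrate the
idea; they are not existing MXNet API, and the real integration point would live in the engine:

import threading

class AsyncResult:
    """Toy stand-in for an inference output (hypothetical, not MXNet API)."""

    def __init__(self):
        self._ready = threading.Event()
        self._callbacks = []
        self._lock = threading.Lock()

    @property
    def can_read(self):
        # Polling this flag from the main thread amounts to busy-waiting.
        return self._ready.is_set()

    def on_ready(self, callback):
        # Signal-style alternative: register a handler that fires once the
        # result is ready, so no thread has to spin on can_read.
        with self._lock:
            if self._ready.is_set():
                fire_now = True
            else:
                self._callbacks.append(callback)
                fire_now = False
        if fire_now:
            callback(self)

    def _mark_ready(self):
        # Would be called by the engine/worker thread once the output is written.
        with self._lock:
            self._ready.set()
            callbacks, self._callbacks = self._callbacks, []
        for cb in callbacks:
            cb(self)

if __name__ == "__main__":
    result = AsyncResult()
    result.on_ready(lambda r: print("result ready, dispatch response"))

    # Simulate the engine finishing the computation on a worker thread.
    threading.Timer(0.1, result._mark_ready).start()

    # Busy-wait version, for contrast (what polling can_read looks like):
    # while not result.can_read:
    #     pass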


On Thu, May 10, 2018, at 10:42, kellen sunderland wrote:
> Hello MXNet developers,
> I’ve recently been speaking with users who’d like to run parallel inference
> requests with MXNet on their service.  They’ll do this on GPUs, and due to
> resource constraints, they’d like to do this without duplicating their
> model’s weights in memory.  They’d also like to run inference with a low
> degree of buffering/batching as latency is important.  I’ve created a wiki
> page with a small proposal that I hope will make running parallel inference
> a little easier.  I’d like to discuss the proposal in this thread and would
> particularly appreciate it if core devs could correct me if I’ve made any
> incorrect assumptions in the doc.
> Proposal here:
> If people are OK with the proposal I can open a Jira ticket, PR, etc.  If
> people are curious about perf implications I can also do some benchmarking.
> Thanks in advance for the feedback,
> -Kellen
