mxnet-dev mailing list archives

From Przemyslaw Tredak <>
Subject Re: [apache/incubator-mxnet] [RFC] A faster version of Gamma sampling on GPU. (#15928)
Date Fri, 16 Aug 2019 15:45:25 GMT
Hi @xidulu. I did not look at the differences between the host-side and device-side
implementations of the RNG API in MXNet, but if they are comparable in terms of performance,
a possibly better approach would be something like this:
 - launch only as many blocks and threads as necessary to fill the GPU, each thread having
its own RNG
 - use the following pseudocode:
while (my_sample_id < N_samples) {
  float rng = generate_next_rng();
  bool accepted = ... // compute whether this rng value is accepted
  if (accepted) {
    // write the result
    my_sample_id = next_sample();
  }
}
There are two ways to implement `next_sample` here: either with an `atomicInc` on some global
counter, or simply by adding the total number of threads (so every thread processes the same
number of samples). The atomic approach is potentially faster, because with static assignment
you could hit a corner case where one thread still ends up doing a lot more work than
the others. It is nondeterministic, though, so I think static assignment is preferable here.
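
To make the static assignment concrete, here is a minimal host-side C++ sketch of the loop above. It is not the MXNet implementation: the function name `sample_gamma_static`, the seeding scheme (`seed + tid`), and the use of `std::mt19937` are all assumptions made for illustration, and the acceptance test is a stand-in (the Marsaglia-Tsang rule for Gamma(alpha, 1) with alpha >= 1) for the step left open in the pseudocode. The outer `for` loop plays the role of the GPU threads, which would run in parallel on the device.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: fills a vector with Gamma(alpha, 1) samples (alpha >= 1).
// Simulated thread `tid` produces samples tid, tid + n_threads, tid + 2 * n_threads, ...
std::vector<float> sample_gamma_static(std::size_t n_samples, unsigned n_threads,
                                       float alpha, std::uint32_t seed) {
  assert(alpha >= 1.0f && n_threads > 0);
  std::vector<float> out(n_samples);
  // Constants of the Marsaglia-Tsang method (assumed acceptance rule).
  const float d = alpha - 1.0f / 3.0f;
  const float c = 1.0f / std::sqrt(9.0f * d);

  for (unsigned tid = 0; tid < n_threads; ++tid) {  // parallel on a real GPU
    std::mt19937 rng(seed + tid);                   // each thread owns its RNG state
    std::normal_distribution<float> normal(0.0f, 1.0f);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);

    std::size_t my_sample_id = tid;
    while (my_sample_id < n_samples) {
      const float x = normal(rng);
      const float t = 1.0f + c * x;
      const float v = t * t * t;
      const float u = uniform(rng);
      // Reject v <= 0 first so that log(v) is only evaluated for positive v.
      const bool accepted =
          v > 0.0f && std::log(u) < 0.5f * x * x + d - d * v + d * std::log(v);
      if (accepted) {
        out[my_sample_id] = d * v;   // write the result
        my_sample_id += n_threads;   // next_sample(): static stride
      }
    }
  }
  return out;
}
```

Because each sample index is tied to a fixed thread and a fixed RNG stream, two calls with the same arguments give identical output, which is the determinism argument for static assignment. With an atomic counter, which thread writes which index would depend on scheduling.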
