mxnet-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-mxnet] apeforest commented on issue #17331: [mxnet 2.0] [item 2.4] Turning on large tensor support by default
Date Fri, 21 Feb 2020 20:48:23 GMT
apeforest commented on issue #17331: [mxnet 2.0] [item 2.4] Turning on large tensor support
by default
URL: https://github.com/apache/incubator-mxnet/issues/17331#issuecomment-589829033
 
 
   Thanks to @JonTanS for running the profiler, we have pinpointed the performance degradation
to the operator `broadcast_axis` (from 138ms to 177ms) and to `MXNDArraySyncCopyToCPU` (from 592ms
to 679ms).
   
   Running the operator-level profiler, we could also reproduce the performance drop in `broadcast_axis`
alone.
   
   w/o USE_INT64_TENSOR_SIZE flag:
```
[{'broadcast_axis': [{'inputs': {'data': (1, 1024, 1), 'axis': (0, 2), 'size': (1024, 8)}, 'max_storage_mem_alloc_gpu/0': 16777.2168, 'avg_time_forward_broadcast_axis': 2.7753}]}]
```
   
   w/ USE_INT64_TENSOR_SIZE flag:
```
[{'broadcast_axis': [{'inputs': {'data': (1, 1024, 1), 'axis': (0, 2), 'size': (1024, 8)}, 'max_storage_mem_alloc_gpu/0': 16777.2168, 'avg_time_forward_broadcast_axis': 6.3178}]}]
```
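
   To put the numbers quoted so far on a common scale, the relative slowdowns can be worked out directly. The Python sketch below uses only the figures reported in this comment; the function and dictionary names are illustrative, and since the profiler output does not state the unit of `avg_time_forward_broadcast_axis`, only ratios are computed.

```python
def slowdown(before, after):
    """Return how many times slower `after` is compared with `before`."""
    return after / before


# (without flag, with flag) pairs copied from the comment above.
measurements = {
    "broadcast_axis (profiler run, ms)": (138.0, 177.0),
    "MXNDArraySyncCopyToCPU (profiler run, ms)": (592.0, 679.0),
    "broadcast_axis (operator-level avg forward time)": (2.7753, 6.3178),
}

ratios = {name: slowdown(b, a) for name, (b, a) in measurements.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

   In the operator-level measurement `broadcast_axis` is roughly 2.3 times slower with the flag on, a much larger relative drop than the roughly 1.3 times seen for the same operator in the profiler run.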
   
   Also, looking into the implementation of the `broadcast_axis` operator, I see that it involves many modulo and multiplication
operations on the indices. The next step will be to find a more efficient implementation
of `broadcast_axis` that reduces the ALU work spent on indices in the kernel.
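
   To illustrate the kind of index arithmetic meant here, the following Python sketch maps a flat output index back to the flat input index it reads from. It is a generic illustration of broadcast indexing, not the actual MXNet kernel; the function name `broadcast_input_index` and its signature are hypothetical. Each output element costs one modulo, one division and up to one multiplication per dimension, and all of that arithmetic is done on the index type, so widening the index from 32 to 64 bits makes every one of those operations more expensive.

```python
def broadcast_input_index(out_idx, in_shape, out_shape):
    """Map a flat (row-major) output index to the flat input index.

    Hypothetical helper for illustration only. `in_shape` and `out_shape`
    must have the same rank, and every input dimension must either equal
    the output dimension or be 1 (a broadcast dimension).
    """
    in_idx = 0
    in_stride = 1
    # Walk the dimensions from innermost to outermost.
    for in_dim, out_dim in zip(reversed(in_shape), reversed(out_shape)):
        coord = out_idx % out_dim      # one modulo per dimension
        out_idx //= out_dim            # one division per dimension
        if in_dim != 1:                # broadcast dims always read coordinate 0
            in_idx += coord * in_stride  # one multiplication per dimension
        in_stride *= in_dim
    return in_idx


# Shapes from the profiler run above: data (1, 1024, 1) broadcast along
# axes (0, 2) to sizes (1024, 8) gives an output of shape (1024, 1024, 8).
in_shape = (1, 1024, 1)
out_shape = (1024, 1024, 8)

# Output element (i, j, k) = (3, 5, 7) should read input element (0, 5, 0).
flat_out = (3 * 1024 + 5) * 8 + 7
print(broadcast_input_index(flat_out, in_shape, out_shape))
```

   For this shape only the middle coordinate matters, so an implementation specialised for the broadcast pattern could skip most of the per-dimension work; whether that is the right optimization for the real kernel is left open here.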

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
