mxnet-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] rahul003 commented on issue #10183: [MXNET-120] Float16 support for distributed training
Date Sun, 01 Apr 2018 01:43:43 GMT
rahul003 commented on issue #10183: [MXNET-120] Float16 support for distributed training
URL: https://github.com/apache/incubator-mxnet/pull/10183#issuecomment-377736427
 
 
   Added multi-precision mode support. When the optimizer's multi_precision field is True, the
server maintains the weights in fp32 and casts received gradients to fp32.
   Multi-precision mode is slightly slower end-to-end[1]; in the experiments I ran, it was
about 10% slower than not using it. But since multi-precision may be preferable for
convergence, I added it. Having both options is good because it mirrors single-machine
training.
   
   [1] Multi-precision is about 10% slower only when you compare end to end. When you profile
the server and compare the times for push and pull, those operations are almost 2x slower with
multi-precision; CopyFromTo takes up a lot of that time. I'll try updating CopyFromTo to use
Kernel Launch instead of mshadow Expr and see if that makes a difference, but that is a
different PR.
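
   As an illustration only (the actual logic lives in the C++ KVStore server, not in Python),
here is a minimal sketch of the update path described above, assuming the server receives fp16
gradients and keeps an fp32 master copy of each fp16 weight. The names `fp32_store` and
`server_update` are hypothetical.

```python
import numpy as np
import mxnet as mx

# Hypothetical sketch of the server-side multi-precision update.
# Workers push fp16 gradients and pull fp16 weights; the server keeps
# an fp32 master copy and performs the optimizer step in fp32.

optimizer = mx.optimizer.SGD(learning_rate=0.1)
updater = mx.optimizer.get_updater(optimizer)

fp32_store = {}  # key -> fp32 master copy of the weight (hypothetical)

def server_update(key, recv_grad, stored_weight, multi_precision=True):
    """Apply one update on the server, optionally in multi-precision mode."""
    if multi_precision and stored_weight.dtype == np.float16:
        # Keep an fp32 master copy of the fp16 weight.
        if key not in fp32_store:
            fp32_store[key] = stored_weight.astype('float32')
        # Cast the received fp16 gradient up to fp32 before the update.
        updater(key, recv_grad.astype('float32'), fp32_store[key])
        # Workers still pull fp16, so cast the updated weight back down.
        fp32_store[key].astype('float16').copyto(stored_weight)
    else:
        # Plain path: update directly in the stored dtype.
        updater(key, recv_grad, stored_weight)
```

   On the worker side, this mode would be requested by constructing the optimizer with
`multi_precision=True` before passing it to the distributed kvstore.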

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
