mxnet-dev mailing list archives

From: Zach Kimberg <zachary.kimb...@gmail.com>
Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
Date: Mon, 29 Apr 2019 18:18:26 GMT
I have one suggestion. In the current design, there are additional maps
from each input entry to each target cast entry dtype in order to avoid
creating duplicate casts. Instead of maintaining these maps, another option
is to apply a general-purpose Common Subexpression Elimination (CSE) [1]
pass afterwards. So, you would run the mixed precision pass, which creates
the duplicates, and then the CSE pass, which would remove all of them.

This design is common in existing compilers like LLVM because maintaining
and testing the passes is much easier when they are kept as simple as
possible. The CSE pass can also be reused for other passes that could
create duplicates, or to remove duplicate expressions in general. This
tutorial [2] discusses it a bit.
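
To make that concrete, here is a minimal sketch in Python of a CSE pass
over a toy graph representation (tuples of op name and input names, not
NNVM's actual data structures), folding the duplicate casts the mixed
precision pass would create:

    # Toy graph: name -> (op, input names), listed in topological order.
    def cse(nodes):
        seen = {}     # (op, canonical inputs) -> representative node name
        rewrite = {}  # duplicate node name -> representative node name
        for name, (op, inputs) in nodes.items():
            canon = tuple(rewrite.get(i, i) for i in inputs)
            key = (op, canon)
            if key in seen:
                rewrite[name] = seen[key]  # duplicate: reuse earlier node
            else:
                seen[key] = name
        return rewrite

    g = {
        "x":   ("var", ()),
        "c1":  ("amp_cast_fp16", ("x",)),
        "c2":  ("amp_cast_fp16", ("x",)),  # duplicate cast of x
        "mul": ("elemwise_mul", ("c1", "c2")),
    }
    print(cse(g))  # {'c2': 'c1'} -- the second cast folds into the first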

Zach

[1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
[2] - https://blog.regehr.org/archives/1603

On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian <anirudh2290@gmail.com>
wrote:

> Hi Tao,
>
> Thanks for raising this question! I thought about the existing quantization
> workflow and whether it can be included in the AMP API. Although
> quantization can be considered mixed precision, there are differences. For
> example, only a small number of operators can be quantized compared to the
> operators that can run in FP16 precision. Thus, overriding the operators to
> run in the original dtype vs. the target dtype doesn't make much sense for
> quantization.
>
> Also, the quantization workflow may require a calibration dataset to
> calibrate the min and max values, along with a calib_mode.
> Arriving at a common API for quantization with calibration and mixed
> precision inference (FP16 and BF16) may make the API too complicated and
> not very easy to use. I understand that this may cause some confusion, as
> people may try to use a target_dtype of int8, but I think it's still better
> than causing user confusion with the API usage.
>
> Also, when we move the quantize_model APIs outside contrib, we can consider
> adding them under the AMP namespace. The challenge would then be to educate
> users on the difference between "quantize" and "convert".
>
> Anirudh
>
> On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A <tao.a.lv@intel.com> wrote:
>
> > Thank you for the explanation. Sorry, I didn't realize the proposal is
> > for inference only.
> >
> > Then how do you think the amp_cast and amp_multicast operators in this
> > proposal can work with the existing INT8 quantization workflow, which I
> > think should also be considered 'mixed precision'?
> >
> > -----Original Message-----
> > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > Sent: Monday, April 29, 2019 10:25 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
> >
> > Hi Tao,
> >
> > The proposed APIs, "convert_model" and "convert_block", are mainly for
> > inference use cases, where customers bring an FP32 model and convert it
> > to a mixed precision model to get improved performance without losing
> > accuracy.
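> >
> > As a rough usage sketch (API and parameter names follow the proposal doc
> > and may change; the checkpoint name is illustrative):
> >
> >     import mxnet as mx
> >     from mxnet.contrib import amp
> >
> >     # Load an FP32 checkpoint and convert it for mixed precision inference
> >     sym, arg_params, aux_params = mx.model.load_checkpoint("resnet50", 0)
> >     fp16_sym, fp16_args, fp16_aux = amp.convert_model(
> >         sym, arg_params, aux_params, target_dtype="float16")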
> > The PR https://github.com/apache/incubator-mxnet/pull/14173 is supposed
> > to handle the training use cases, and this proposal doesn't cover the AMP
> > feature added in that PR. I think ptrendx@ and canoerst@ are better
> > equipped to answer questions 1 and 2.
> >
> > > - more generally, what will be saved when users want to serialize
> > > their model to disk?
> >
> > Let's say users want to save a converted mixed precision model used for
> > inference to disk. Both the symbol, with the amp_cast and amp_multicast
> > operators, and the params (which are cast where necessary) will be saved.
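> >
> > Continuing the usage sketch from earlier in this thread, saving would
> > work like saving any other symbol/params pair (file names illustrative):
> >
> >     fp16_sym.save("model_fp16-symbol.json")
> >     save_dict = {"arg:%s" % k: v for k, v in fp16_args.items()}
> >     save_dict.update({"aux:%s" % k: v for k, v in fp16_aux.items()})
> >     mx.nd.save("model_fp16-0000.params", save_dict)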
> >
> > Anirudh
> >
> >
> > On Mon, Apr 29, 2019 at 6:55 AM Lv, Tao A <tao.a.lv@intel.com> wrote:
> >
> > > Thank you for sharing this, Anirudh.
> > >
> > > Curious to know:
> > > - what will be saved in a training checkpoint or snapshot? Can it be
> > > resumed on another platform which might not support the lower
> > > precision the previous one used?
> > > - what will be saved in the final symbol.json and params file when
> > > training is finished?
> > > - more generally, what will be saved when users want to serialize
> > > their model to disk?
> > >
> > > Thank you,
> > > -tao
> > >
> > > -----Original Message-----
> > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > Sent: Monday, April 29, 2019 7:00 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Proposal for Conversion from FP32 to Mixed Precision Models
> > >
> > > Hi all,
> > >
> > > I have created a doc for conversion from FP32 to Mixed Precision Models:
> > >
> > > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> > >
> > > I look forward to your feedback on the same.
> > >
> > > Thanks,
> > > Anirudh
> > >
> >
>
