mxnet-dev mailing list archives

From Anirudh Subramanian <anirudh2...@gmail.com>
Subject Re: Proposal for Conversion from FP32 to Mixed Precision Models
Date Mon, 29 Apr 2019 21:22:13 GMT
Hi Zach,

You raise an interesting point. Thank you for the pointer!

Incorporating a CSE pass comes with its own cost, and the advantage it
brings is making the ReducePrecision NNVM pass more lightweight. Since the
amortized cost of the ReducePrecision pass is O(1), it shouldn't matter much
from a performance point of view whether we add it or not.

From a maintenance point of view, I would agree that separating these two
pieces of logic can be helpful if we have other workflows which require the
original pass followed by a CSE pass. Currently, as far as I know, only the
ReducePrecision pass would use it. I will check whether a CSE pass can also
benefit other NNVM passes besides ReducePrecision, such as the quantization
pass, and will get back.
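
For what it's worth, the elimination itself is small. A rough Python sketch
over a toy node representation (plain dicts with 'name'/'op'/'inputs'/'attrs';
illustrative only, not the actual NNVM graph structures):

    def eliminate_duplicate_casts(nodes):
        # nodes: topologically ordered list of dicts with 'name', 'op',
        # 'inputs' (list of producer node names) and 'attrs' (dict).
        seen = {}    # (input_name, target_dtype) -> surviving amp_cast name
        remap = {}   # eliminated cast name -> surviving cast name
        kept = []
        for node in nodes:
            # Rewire inputs that pointed at an eliminated duplicate cast.
            inputs = [remap.get(i, i) for i in node['inputs']]
            node = dict(node, inputs=inputs)
            if node['op'] == 'amp_cast':
                key = (inputs[0], node['attrs']['dtype'])
                if key in seen:
                    remap[node['name']] = seen[key]
                    continue
                seen[key] = node['name']
            kept.append(node)
        return kept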

Anirudh

On Mon, Apr 29, 2019 at 11:18 AM Zach Kimberg <zachary.kimberg@gmail.com>
wrote:

> I have one suggestion. In the current design, there are additional maps from
> each input entry to the corresponding cast entry for each target dtype, in
> order to avoid creating duplicate casts. Instead of creating these, another
> option is to apply a general-purpose Common Subexpression Elimination (CSE)
> [1] pass afterwards. So, you would run the mixed precision pass, which
> creates the duplicates, and then the CSE pass, which would remove them all.
>
> This design is common in existing compilers like LLVM because maintaining
> and testing the passes is much easier when they are kept as simple as
> possible. The CSE can also be reused as necessary for other passes that
> could create duplicates or to remove duplicate expressions in general. This
> tutorial [2] talks about it a bit.
>
> Zach
>
> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
> [2] - https://blog.regehr.org/archives/1603
>
> On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian <anirudh2290@gmail.com>
> wrote:
>
> > Hi Tao,
> >
> > Thanks for raising this question! I thought about the existing
> > quantization workflow and whether it can be included with the AMP API.
> > Although quantization can be considered a form of mixed precision, there
> > are differences. For example, only a small number of operators can be
> > quantized compared to the operators that can run in FP16 precision. Thus,
> > overriding the operators to run in the original dtype vs. the target dtype
> > doesn't make much sense for quantization.
> >
> > Also, the quantization workflow may require a calibration dataset to
> > calibrate the min and max values, along with a calib_mode. Arriving at a
> > common API for quantization with calibration and mixed precision inference
> > (FP16 and BF16) may make the API too complicated and not very easy to use.
> > I understand that this may cause some confusion, as people may try to use
> > a target_dtype of int8, but I think that is still better than the
> > confusion a more complicated API would cause.
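> >
> > To make the difference concrete, the existing contrib INT8 flow needs
> > roughly this shape of call, with a calibration iterator in the middle of
> > it (a rough sketch; the argument names follow the current contrib API as I
> > recall them and may not be exact, and calib_data_iter is a placeholder for
> > whatever calibration DataIter the user supplies):
> >
> >     import mxnet as mx
> >     from mxnet.contrib import quantization
> >
> >     qsym, qarg_params, aux_params = quantization.quantize_model(
> >         sym=sym, arg_params=arg_params, aux_params=aux_params,
> >         ctx=mx.cpu(), calib_mode='entropy',
> >         calib_data=calib_data_iter,      # calibration dataset required
> >         num_calib_examples=500,
> >         quantized_dtype='int8')
> >
> > convert_model/convert_block take no calibration inputs at all, which is
> > part of why folding the two into one API gets awkward.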
> >
> > Also, when we move quantize_model APIs outside contrib we can consider
> > adding them under AMP namespace. The challenge would then be to educate
> > users on difference between "quantize" and "convert".
> >
> > Anirudh
> >
> > On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A <tao.a.lv@intel.com> wrote:
> >
> > > Thank you for the explanation. Sorry, I didn't realize the proposal is
> > > for inference only.
> > >
> > > Then how do you think the amp_cast and amp_multicast in this proposal
> > > can work with the existing INT8 quantization workflow, which I think
> > > should also be considered 'mixed precision'?
> > >
> > > -----Original Message-----
> > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > Sent: Monday, April 29, 2019 10:25 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
> > >
> > > Hi Tao,
> > >
> > > The proposed APIs, "convert_model" and "convert_block", are mainly for
> > > inference use cases, where customers bring an FP32 model and convert it
> > > to a mixed precision model to get improved performance without losing
> > > accuracy.
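> > >
> > > Roughly, inference usage would look like the following (a sketch based
> > > on the names in the proposal; the exact module path and arguments may
> > > change, and 'resnet50' is just an example checkpoint prefix):
> > >
> > >     import mxnet as mx
> > >     from mxnet.contrib import amp
> > >
> > >     sym, arg_params, aux_params = mx.model.load_checkpoint('resnet50', 0)
> > >     result_sym, result_arg_params, result_aux_params = amp.convert_model(
> > >         sym, arg_params, aux_params, target_dtype='float16')
> > >     # use the converted symbol and params for inference as usual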
> > >
> > > The PR https://github.com/apache/incubator-mxnet/pull/14173 is supposed
> > > to handle the training use cases, and this proposal doesn't cover the
> > > AMP feature added in that PR. I think ptrendx@ and canoerst@ are better
> > > equipped to answer questions 1 and 2.
> > >
> > > > - more generally, what will be saved when users want to serialize
> > > > their model to disk?
> > >
> > > Let's say users want to save a converted mixed precision model used for
> > > inference to disk. Both will be saved: the symbol, with the amp_cast and
> > > amp_multicast operators, and the params (which are cast where necessary).
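> > >
> > > Something like the following (illustrative only; 'model_fp16' is just an
> > > example prefix), matching the usual symbol/params checkpoint layout:
> > >
> > >     # after amp.convert_model(...) as above
> > >     result_sym.save('model_fp16-symbol.json')
> > >     save_dict = {'arg:%s' % k: v for k, v in result_arg_params.items()}
> > >     save_dict.update({'aux:%s' % k: v
> > >                       for k, v in result_aux_params.items()})
> > >     mx.nd.save('model_fp16-0000.params', save_dict)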
> > >
> > > Anirudh
> > >
> > >
> > > On Mon, Apr 29, 2019 at 6:55 AM Lv, Tao A <tao.a.lv@intel.com> wrote:
> > >
> > > > Thank you for sharing this, Anirudh.
> > > >
> > > > Curious to know:
> > > > - what will be saved in a training checkpoint or snapshot? Can it be
> > > > resumed on another platform which might not support the lower
> > > > precision the previous one used?
> > > > - what will be saved in the final symbol.json and params file when
> > > > training is finished?
> > > > - more generally, what will be saved when users want to serialize
> > > > their model to disk?
> > > >
> > > > Thank you,
> > > > -tao
> > > >
> > > > -----Original Message-----
> > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > > Sent: Monday, April 29, 2019 7:00 PM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Subject: Proposal for Conversion from FP32 to Mixed Precision Models
> > > >
> > > > Hi all,
> > > >
> > > > I have created a doc for conversion from FP32 to Mixed Precision
> > > > Models:
> > > >
> > > > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> > > >
> > > > I look forward to your feedback on the same.
> > > >
> > > > Thanks,
> > > > Anirudh
> > > >
> > >
> >
>
