Date: Tue, 28 May 2019 10:57:03 -0700
From: Animesh Jain
Reply-To: dmlc/tvm
To: dmlc/tvm
Subject: [dmlc/tvm] [RFC] Reading quantized models from TFLite and MxNet - operators API (#3252)

To increase quantization support in TVM, it is necessary to support pre-quantized models, i.e., models that have been quantized in the framework itself (outside of Relay). In this issue, we lay down the high-level API design for some of the quantized operators.
A large portion of this proposal comes from the following relevant discussions. Thanks to @jackwish, @FrozenGene and @jnorwood for sharing their experiences with quantization, and also to @shoubhik for helping design this RFC.

* RFC [Issue](https://github.com/dmlc/tvm/issues/2351)
* [Discussion](https://discuss.tvm.ai/t/tf-lite-quantized-conv2d-operator-conversion/2651)

Other non-TVM links that were used to understand quantization:

* GemmLowP - [Doc](https://github.com/google/gemmlowp/blob/master/doc/quantization.md)
* TFLite reference [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/internal/reference/conv.h#L101-L182)

---------

**Covered frameworks for now** - TFLite and MxNet

**Target network for now** - Inception V3 from TFLite. (I will create one for MxNet)

**Target platforms for now** - ARM and Intel (will create a separate issue as the project progresses)

---------

**List of required operators** - quantize, quantized_conv2d, quantized_relu, quantized_pool2d, quantized_fully_connected, quantized_concat, dequantize

------------

It would be good if we can agree on the Relay ops - their inputs/outputs and attributes. The initial proposal for the quantize, quantized_conv2d and dequantize ops is as follows (the other quantized_* operators will follow the same pattern as quantized_conv2d).

## Op quantize

```python
def quantize(data, scale, zero_point, out_dtype):
    """
    Quantize takes the scale and zero_point attributes and quantizes the
    FP32 input data to an int8/uint8 tensor.

    Parameters
    ----------
    data: FP32 tensor
        The input tensor in FP32.

    scale: FP32 scalar (An attribute of the op)
        The float scalar to scale the int8 values back to FP32.

    zero_point: Int32 zero point (An attribute of the op)
        The zero point of the distribution.

    out_dtype: String
        The dtype of the output. Can only be int8/uint8.

    Returns
    -------
    quantized_data: int8/uint8 tensor
        The quantized tensor.
    """
```

Key points to discuss
* The scale and zero_point calculations happen outside the Relay graph, i.e., the framework parsers will have to compute the scale and offset if only min and max are provided. [Reference implementation](https://github.com/tensorflow/tensorflow/blob/22e458382d3001a0cda4e594decf175f2387475e/tensorflow/lite/kernels/internal/quantization_util.h#L28-L99) in TFLite. This can also be thought of as a framework parser utility where we handle min/max, symmetric/asymmetric, etc. and generate the scale and zero_point in whatever way each framework handles them. A small sketch of this conversion is given below.
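To make the parser-side computation concrete, here is a minimal sketch of how a min/max range could be turned into the scale and zero_point attributes for asymmetric quantization, along the lines of the TFLite reference linked above. The helper name and the exact nudging/rounding choices are illustrative assumptions, not part of the proposed API.

```python
def minmax_to_scale_zero_point(rmin, rmax, out_dtype="uint8"):
    """Hypothetical parser utility: derive an affine (scale, zero_point) pair
    such that real_value ~= scale * (quantized_value - zero_point)."""
    qmin, qmax = (0, 255) if out_dtype == "uint8" else (-128, 127)
    # Extend the range to include 0.0 so that real zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmax == rmin:
        # Degenerate all-zero range; any positive scale works.
        return 1.0, qmin
    scale = (rmax - rmin) / float(qmax - qmin)
    # zero_point is the quantized value that maps back to real 0.0.
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))
```

For example, a tensor with an observed range of [-1.0, 6.0] would map to scale ~= 7/255 ~= 0.0275 and zero_point = 36 for uint8.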
## Op quantized_conv2d

```python
def quantized_conv2d(quantized_data, quantized_kernel,
                     input_scale, input_zero_point,
                     kernel_scale, kernel_zero_point,
                     output_scale, output_zero_point,
                     out_dtype,
                     # All the remaining attributes from conv2d
                     strides=(1, 1), padding=(0, 0), dilation=(1, 1),
                     groups=1, channels=None, kernel_size=None,
                     data_layout="NCHW", kernel_layout="OIHW", out_layout=""):
    """
    Quantized conv2d convolves the quantized data with the quantized kernel
    and produces a quantized output, using the scale and zero_point attributes
    of the input, kernel and output tensors. The scale and zero_point
    calculations happen outside the Relay graph, i.e., the framework parsers
    will have to compute the scale and offset if only min and max are provided.

    Parameters
    ----------
    quantized_data: int8/uint8 tensor
        The quantized input tensor in int8/uint8.

    quantized_kernel: int8/uint8 tensor
        The quantized kernel tensor in int8/uint8.

    input_scale: FP32 scalar (An attribute of the op)
        The float scalar to scale the quantized_data int8 values back to FP32.

    input_zero_point: Int32 zero point (An attribute of the op)
        The zero point of the quantized_data distribution.

    kernel_scale: FP32 scalar (An attribute of the op)
        The float scalar to scale the quantized_kernel int8 values back to FP32.

    kernel_zero_point: Int32 zero point (An attribute of the op)
        The zero point of the quantized_kernel distribution.

    output_scale: FP32 scalar (An attribute of the op)
        The output scale is set during the quantization process using
        training/calibration. The float scalar to scale the quantized_output
        int8 values back to FP32.

    output_zero_point: Int32 zero point (An attribute of the op)
        The output zero point is set during the quantization process using
        training/calibration. The zero point of the quantized_output
        distribution.

    out_dtype: String
        The dtype of the quantized_output. Can only be int8/uint8. The
        requantization from int32 to int8/uint8 is a part of the op compute.

    Other attributes are the same as in conv2d.

    Returns
    -------
    quantized_output: int8/uint8 tensor
        The quantized tensor.
    """
```

Key points to discuss further
* This op has a set of computations that could ideally be pre-computed, but that is difficult because constant folding only works across Relay ops and not within a Relay op. This has been discussed in more detail on the [discuss forum](https://discuss.tvm.ai/t/tf-lite-quantized-conv2d-operator-conversion/2651).
    * First pre-computable - The core computation has some compute with the kernel (Term 2 and Term 4 in the above link) that will be part of the TVM compute. This is very hard to avoid; we need a fused compute to get the best performance.
    * Second pre-computable - The output scale and zero_point are used to calculate an integer multiplier and shift to keep all the computations in the integer domain. This computation changes for each op (e.g., concat handles it differently from conv). So, this computation is also kept inside the quantized_conv2d op. It could be avoided by changing the API and replacing output_scale with output_multiplier and output_shift, but that seems very specific to TFLite, and one might want to handle the output_scale and output_offset in a different manner (a small sketch of this decomposition is included after the dequantize op below). **I am not sure about this part, so please comment.**
* The op already has the requantization portion accounted for. As far as I understand, the requantization portion is just a clamp for out_dtype. (The handling of output_multiplier and output_shift, as mentioned above, is for the calculation of the output quantized tensor and not for requantization.)

## Op dequantize

Dequantization is required when connecting a quantized operator to an FP32 operator. This might be a temporary stage where we do not have a quantized implementation of the second op. Dequantization might also be required at the end of the network to keep the output of the graph in FP32.

```python
def dequantize(quantized_data, scale, zero_point, out_dtype):
    """
    Dequantize takes the scale and zero_point attributes and dequantizes the
    int8/uint8 tensor to an FP32 tensor.

    Parameters
    ----------
    quantized_data: int8/uint8 quantized input tensor
        The input tensor in int8/uint8.

    scale: FP32 scalar (An attribute of the op)
        The float scalar to scale the int8 values back to FP32.

    zero_point: Int32 zero point (An attribute of the op)
        The zero point of the distribution.

    out_dtype: String
        The dtype of the output. Can only be float32.

    Returns
    -------
    data: FP32 tensor
        The dequantized tensor.
    """
```
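For reference, below is a minimal sketch of the multiplier/shift decomposition mentioned in the second pre-computable point above. In the gemmlowp/TFLite scheme, the effective requantization scale is M = (input_scale * kernel_scale) / output_scale, and it is rewritten as a fixed-point int32 multiplier plus a shift so that the int32 accumulator can be rescaled without floating point. The function name and exact rounding here are illustrative assumptions; see TFLite's QuantizeMultiplier for the reference behavior.

```python
import math

def decompose_scale(real_multiplier):
    """Split a float multiplier M into a fixed-point int32 multiplier and a
    right shift so that M ~= quantized_multiplier * 2**-(31 + right_shift).
    Assumes the common conv case where 0 < M < 1."""
    assert 0.0 < real_multiplier < 1.0
    # frexp returns (mantissa, exp) with 0.5 <= mantissa < 1 and M = mantissa * 2**exp.
    mantissa, exp = math.frexp(real_multiplier)
    quantized_multiplier = int(round(mantissa * (1 << 31)))
    # Rounding may push the mantissa up to exactly 2**31; renormalize if so.
    if quantized_multiplier == (1 << 31):
        quantized_multiplier //= 2
        exp += 1
    right_shift = -exp
    return quantized_multiplier, right_shift
```

At runtime the int32 accumulator would then be scaled roughly as `(acc * quantized_multiplier) >> (31 + right_shift)` (with rounding), after which the output zero_point is added and the result is clamped to the range of out_dtype.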
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3252