tvm-commits mailing list archives

From tqc...@apache.org
Subject [incubator-tvm-site] branch asf-site updated: Build at Thu May 14 10:59:43 PDT 2020
Date Thu, 14 May 2020 17:59:55 GMT
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-tvm-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new a8e1e78  Build at Thu May 14 10:59:43 PDT 2020
a8e1e78 is described below

commit a8e1e78ffb213a5564c99d94f564128da1d60875
Author: tqchen <tqchenml@gmail.com>
AuthorDate: Thu May 14 10:59:43 2020 -0700

    Build at Thu May 14 10:59:43 PDT 2020
---
 ...s-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html |  16 +-
 ...machine-learning-to-webassembly-and-webgpu.html | 283 +++++++++++++++++++++
 atom.xml                                           | 105 +++++++-
 blog.html                                          |  10 +
 images/webgpu/ml-compiler-flow.png                 | Bin 0 -> 197380 bytes
 images/webgpu/tvm-wasm-stack.png                   | Bin 0 -> 412428 bytes
 images/webgpu/webgpu-mobilenet-perf.png            | Bin 0 -> 90966 bytes
 rss.xml                                            | 107 +++++++-
 sitemap.txt                                        |   1 +
 9 files changed, 495 insertions(+), 27 deletions(-)

diff --git a/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html b/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
index 07f0cb6..7d0db87 100644
--- a/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
+++ b/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
@@ -262,13 +262,13 @@ We are starting to look at performance optimization and we expect more improveme
 <p>You should see something like this:</p>
 
 <figure class="highlight"><pre><code class="language-llvm" data-lang="llvm"><span class="c1">; ModuleID = 'myadd__kernel0'</span>
-<span class="err">source_filename</span> <span class="p">=</span> <span class="s">"myadd__kernel0"</span>
+<span class="err">sour</span><span class="k">c</span><span class="err">e_filename</span> <span class="p">=</span> <span class="s">"myadd__kernel0"</span>
 <span class="k">target</span> <span class="k">datalayout</span> <span class="p">=</span> <span class="s">"e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"</span>
 <span class="k">target</span> <span class="k">triple</span> <span class="p">=</span> <span class="s">"amdgcn-amd-amdhsa-hcc"</span>
 
 
 <span class="c1">; Function Attrs: nounwind</span>
-<span class="k">define</span> <span class="k">dllexport</span> <span class="err">amdgpu_kernel</span> <span class="kt">void</span> <span class="vg">@myadd__kernel0</span><span class="p">(</span><span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="k">noalias</span> <span class="k">nocapture</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class= [...]
+<span class="k">define</span> <span class="k">dllexport</span> <span class="err">amdgpu_ker</span><span class="k">ne</span><span class="err">l</span> <span class="kt">void</span> <span class="vg">@myadd__kernel0</span><span class="p">(</span><span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="k">noalias</span> <span clas [...]
 <span class="nl">entry:</span>
   <span class="nv">%4</span> <span class="p">=</span> <span class="k">tail</span> <span class="k">call</span> <span class="kt">i32</span> <span class="vg">@llvm.amdgcn.workgroup.id.x</span><span class="p">()</span>
   <span class="nv">%5</span> <span class="p">=</span> <span class="k">tail</span> <span class="k">call</span> <span class="kt">i32</span> <span class="vg">@llvm.amdgcn.workitem.id.x</span><span class="p">()</span>
@@ -288,14 +288,14 @@ We are starting to look at performance optimization and we expect more improveme
   <span class="nv">%10</span> <span class="p">=</span> <span class="k">add</span> <span class="k">nsw</span> <span class="kt">i32</span> <span class="nv">%.pre-phi</span><span class="p">,</span> <span class="nv">%5</span>
   <span class="nv">%11</span> <span class="p">=</span> <span class="k">add</span> <span class="k">nsw</span> <span class="kt">i32</span> <span class="nv">%.pre-phi</span><span class="p">,</span> <span class="nv">%5</span>
   <span class="nv">%12</span> <span class="p">=</span> <span class="k">sext</span> <span class="kt">i32</span> <span class="nv">%11</span> <span class="k">to</span> <span class="kt">i64</span>
-  <span class="nv">%13</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%2</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%12</span>
-  <span class="nv">%14</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%13</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv">!tbaa</span> <span class="nv">!2</span>
-  <span class="nv">%15</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%1</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%12</span>
-  <span class="nv">%16</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%15</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv">!tbaa</span> <span class="nv">!6</span>
+  <span class="nv">%13</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%2</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%12</span>
+  <span class="nv">%14</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%13</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv" [...]
+  <span class="nv">%15</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%1</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%12</span>
+  <span class="nv">%16</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%15</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv" [...]
   <span class="nv">%17</span> <span class="p">=</span> <span class="k">fadd</span> <span class="kt">float</span> <span class="nv">%14</span><span class="p">,</span> <span class="nv">%16</span>
   <span class="nv">%18</span> <span class="p">=</span> <span class="k">sext</span> <span class="kt">i32</span> <span class="nv">%10</span> <span class="k">to</span> <span class="kt">i64</span>
-  <span class="nv">%19</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%0</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%18</span>
-  <span class="k">store</span> <span class="kt">float</span> <span class="nv">%17</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%19</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv">!tbaa</span> <span class="nv">!9</span>
+  <span class="nv">%19</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%0</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%18</span>
+  <span class="k">store</span> <span class="kt">float</span> <span class="nv">%17</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%19</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv">!tbaa</span> <span clas [...]
   <span class="k">br</span> <span class="kt">label</span> <span class="nv">%if_end</span>
 
 
diff --git a/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html b/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
new file mode 100644
index 0000000..c51e923
--- /dev/null
+++ b/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
@@ -0,0 +1,283 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Compiling Machine Learning to WASM and WebGPU with Apache TVM</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Compiling Machine Learning to WASM and WebGPU with Apache TVM </h1>
+      <p class="post-meta">
+        <time datetime="2020-05-14T00:00:00-07:00" itemprop="datePublished">
+          May 14, 2020
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Tianqi Chen and Jared Roesch, OctoML</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    </br>
+    <p><strong>TLDR</strong></p>
+
+<p>We introduced support for WASM and WebGPU to the Apache TVM deep learning compiler. Our experiments show that TVM’s WebGPU backend can get <strong>close to native GPU performance</strong> when deploying models to the web.</p>
+
+<p style="text-align: center"><img src="/images/webgpu/webgpu-mobilenet-perf.png" alt="image" width="55%" /><br /></p>
+
+<h2 id="introduction">Introduction</h2>
+
+<p>Computing is one of the pillars of modern machine learning applications. The introduction of the GPU to accelerate deep learning workloads has increased the rate of progress dramatically. Given the growing requirement to deploy machine learning everywhere, the browser becomes a natural place to deploy intelligent applications.</p>
+
+<p>While TensorFlow.js and ONNX.js are existing efforts to bring machine learning to the browser, there still exist non-trivial gaps in performance between the web versions and native ones. One of the many reasons is the lack of standard and performant access to the GPU on the web. WebGL lacks important features such as compute shaders and generic storage buffers that are necessary for high performance deep learning.</p>
+
+<p>WebGPU is the upcoming standard for next-generation web graphics and has the potential to change this situation dramatically. Like the latest-generation graphics APIs such as Vulkan and Metal, WebGPU offers first-class compute shader support.</p>
+
+<p>To explore the potential of using WebGPU for machine learning deployment in the browser, we enhanced the deep learning compiler Apache (incubating) TVM to target WASM (for the host code that computes the launch parameters and calls into the device kernels) and WebGPU (for device execution). Our preliminary results are quite positive: for the first time, we can deploy machine learning applications on the web while still getting near-native performance on the GPU.</p>
+
+<h2 id="machine-learning-compiler">Machine Learning Compiler</h2>
+
+<p style="text-align: center"><img src="/images/webgpu/ml-compiler-flow.png" alt="image" width="65%" /><br /></p>
+
+<p>One natural reaction when trying out WebGPU is to write shaders for the primitive operators in deep neural networks (matrix multiplication and convolution) and then directly optimize their performance. This is the traditional workflow used by existing frameworks such as TensorFlow.js.</p>
+
+<p>Instead, we apply a compilation-based approach. TVM automatically ingests models from high-level frameworks such as TensorFlow, Keras, PyTorch, MXNet and ONNX and uses a machine-learning-driven approach to automatically generate low-level code, in this case compute shaders in SPIR-V format. The generated code can then be packaged as a deployable module.</p>
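+
+<p>As a minimal sketch of the ingestion step, the snippet below imports an ONNX model into TVM’s Relay IR. The model file name and input shape are placeholders, and the exact API can vary across TVM versions.</p>
+
+<pre><code class="language-python">
+import onnx
+from tvm import relay
+
+# Placeholder model file and input shape, for illustration only.
+onnx_model = onnx.load("mobilenet.onnx")
+shape_dict = {"data": (1, 3, 224, 224)}
+
+# Ingest the model into Relay, TVM's high-level IR:
+# mod holds the functions, params the trained weights.
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+</code></pre>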
+
+<p>One important advantage of the compilation-based approach is the reuse of infrastructure. We are able to effortlessly (relative to <a href="https://arxiv.org/abs/1901.05350">other approaches</a>) target the web by reusing the infrastructure built for optimizing GPU kernels on native platforms such as CUDA, Metal and OpenCL. If the mapping of the WebGPU API to native APIs is efficient, we can expect similar performance with very little work. More importantly, the <a href="https://tvm.apache. [...]
+
+<h2 id="building-a-wasm-and-webgpu-compiler">Building a WASM and WebGPU Compiler</h2>
+
+<p>In order to build a compiler that can target WASM and WebGPU, we need the following elements:</p>
+
+<ul>
+  <li>A SPIR-V generator for compute shaders.</li>
+  <li>A WASM generator for the host program.</li>
+  <li>A runtime to load and execute the generated program.</li>
+</ul>
+
+<p>Luckily, TVM already has a SPIR-V target for Vulkan and uses LLVM for host code generation, so we can repurpose the two to generate the device and host programs.</p>
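+
+<p>A minimal sketch of reusing the two code generators, assuming the model was imported into <code>mod</code> and <code>params</code> as above. The target strings follow TVM’s web examples of this era and may differ in other TVM versions.</p>
+
+<pre><code class="language-python">
+import tvm
+from tvm import relay
+
+# Device code is emitted as SPIR-V compute shaders (reusing the Vulkan path);
+# host code is compiled by LLVM to wasm32.
+target = "webgpu"
+target_host = "llvm -target=wasm32-unknown-unknown-wasm -system-lib"
+
+with tvm.transform.PassContext(opt_level=3):
+    graph, lib, params = relay.build(
+        mod, target=target, target_host=target_host, params=params)
+</code></pre>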
+
+<p>The main challenge is the runtime. We need a runtime to load the shader code and to enable the host code to communicate with the shader correctly. TVM has a minimal C++-based runtime. We built a minimal web runtime library and linked it with the generated shader and host driving code, producing a single WASM file. However, this WASM module still contains two unknown dependencies:</p>
+
+<ul>
+  <li>The runtime needs to call into system libraries (malloc, stderr).</li>
+  <li>The WASM runtime needs to interact with the WebGPU driver (in JavaScript, where the WebGPU API is a first-class citizen).</li>
+</ul>
+
+<p>WASI is a standard solution to the first problem. While there is not yet a mature WASI implementation on the web, we can use emscripten to generate a WASI-like library (see the discussion <a href="https://github.com/emscripten-core/emscripten/issues/11075">here</a>) to provide these system libraries.</p>
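+
+<p>With emscripten supplying the system libraries, the generated host code, the minimal web runtime, and the shaders can be linked into the single WASM file mentioned above. A sketch using TVM’s emscripten helper, assuming the <code>emcc</code> toolchain is installed and <code>lib</code> comes from the build step:</p>
+
+<pre><code class="language-python">
+from tvm.contrib import emcc
+
+# Link the LLVM-generated host code against the web runtime via emcc,
+# producing one deployable WASM file.
+lib.export_library("model.wasm", emcc.create_tvmjs_wasm)
+</code></pre>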
+
+<p>We solve the second problem by building a WebGPU runtime inside TVM’s JS runtime and calling back to these functions from the WASM module when invoking GPU code. Using the <a href="https://tvm.apache.org/docs/dev/runtime.html#packedfunc">PackedFunc</a> mechanism in TVM’s runtime system, we can directly expose high-level runtime primitives by passing JavaScript closures to the WASM interface. This approach keeps most of the runtime code in JavaScript; we could bring more JS code into  [...]
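+
+<p>The same PackedFunc mechanism can be seen from Python: any closure registered as a global PackedFunc becomes callable through TVM’s runtime, which is how the JavaScript WebGPU functions are exposed to the WASM module. A minimal sketch (the function name is a placeholder):</p>
+
+<pre><code class="language-python">
+import tvm
+
+# Register a closure as a global PackedFunc; the web runtime registers
+# JavaScript closures (e.g. the WebGPU driver calls) the same way.
+@tvm.register_func("demo.add_one")
+def add_one(x):
+    return x + 1
+
+fadd = tvm.get_global_func("demo.add_one")
+assert fadd(41) == 42
+</code></pre>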
+
+<p style="text-align: center"><img src="/images/webgpu/tvm-wasm-stack.png" alt="image" width="65%" /></p>
+
+<h2 id="performance">Performance</h2>
+
+<p style="text-align: center"><img src="/images/webgpu/webgpu-mobilenet-perf.png" alt="image" width="65%" /></p>
+
+<p>We ran a quick experiment comparing the execution of a full computational graph via TVM’s WebGPU backend against native targets that use native GPU runtimes (Metal and OpenCL). On the MobileNet model, we find that WebGPU comes close to matching the performance of Metal. Assuming Chrome’s WebGPU runtime targets Metal rather than OpenCL on macOS, we can safely assume there is little to no performance loss when targeting the GPU.</p>
+
+<p>This benchmark excludes the CPU-to-GPU data copy cost and only measures the GPU execution. Currently the data copy from CPU to GPU can still take 25% of the execution time; however, these costs can be further amortized via approaches like double buffering in a continuous execution setting.</p>
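+
+<p>A sketch of how such a GPU-only measurement can be taken with TVM’s graph runtime, shown here on a native GPU context; <code>graph</code>, <code>lib</code> and <code>params</code> come from the build step above, and the input name is a placeholder:</p>
+
+<pre><code class="language-python">
+import numpy as np
+import tvm
+from tvm.contrib import graph_runtime
+
+ctx = tvm.gpu(0)
+m = graph_runtime.create(graph, lib, ctx)
+m.set_input("data", tvm.nd.array(
+    np.random.uniform(size=(1, 3, 224, 224)).astype("float32")))
+m.set_input(**params)
+
+# time_evaluator reruns only the on-device graph execution; the
+# host-to-device copies happened once in set_input, so they are excluded.
+ftimer = m.module.time_evaluator("run", ctx, number=10, repeat=3)
+print("mean inference time: %.2f ms" % (np.mean(ftimer().results) * 1000.0))
+</code></pre>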
+
+<p>Our reported end-to-end running time of MobileNet is by no means optimal, since we simply reused tuned programs from a GTX 1080 Ti, which is very different from the Intel integrated GPU. We expect a further performance boost from running <a href="https://tvm.apache.org/2018/10/03/auto-opt-all">AutoTVM</a> on the target platform of interest.</p>
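+
+<p>A minimal AutoTVM sketch of what such platform-specific tuning looks like. The log file name and trial count are placeholders, and measurements for a web deployment would need to run on the target device (e.g. through an RPC runner) rather than the LocalRunner shown here:</p>
+
+<pre><code class="language-python">
+import tvm
+from tvm import autotvm, relay
+
+tasks = autotvm.task.extract_from_program(
+    mod["main"], target=target, params=params)
+
+measure_option = autotvm.measure_option(
+    builder=autotvm.LocalBuilder(),
+    runner=autotvm.LocalRunner(number=10))
+
+for task in tasks:
+    tuner = autotvm.tuner.XGBTuner(task)
+    tuner.tune(n_trial=200, measure_option=measure_option,
+               callbacks=[autotvm.callback.log_to_file("tuning.log")])
+
+# Rebuild using the best schedules found during tuning.
+with autotvm.apply_history_best("tuning.log"):
+    with tvm.transform.PassContext(opt_level=3):
+        graph, lib, params = relay.build(
+            mod, target=target, target_host=target_host, params=params)
+</code></pre>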
+
+<h2 id="looking-to-the-future">Looking to the Future</h2>
+
+<p>Our results suggest many interesting opportunities for machine learning on the web. Notably, WebGPU is an API that is still evolving, and its implications could go beyond web applications. For example, one could target the native APIs of WebGPU as it matures and becomes standardized through WASI, enabling standalone WASM applications that make use of WebGPU.</p>
+
+<p>The TVM community is also actively working on a <a href="https://github.com/apache/incubator-tvm/tree/master/rust">Rust-based runtime</a> that would enable much more robust WASM support and easier interaction with projects like <a href="https://github.com/gfx-rs/wgpu-rs">wgpu</a> and the <a href="https://rustwasm.github.io/docs/book/">Rust WASM</a> ecosystem. As an open source project, we are looking for contributors who can bring in new ideas and help push the project in thes [...]
+
+<p>The proposed approach provides effective machine learning support for most WASM application scenarios. The close-to-native performance could unlock better <a href="https://en.wikipedia.org/wiki/Federated_learning">federated learning</a> capabilities in the browser. The same compiled package should also be able to run on native WASM executors to provide a sandbox for applications.</p>
+
+<h2 id="show-me-the-code">Show me the Code</h2>
+
+<ul>
+  <li><a href="https://github.com/tqchen/tvm-webgpu-example">Example project for image classification</a></li>
+  <li><a href="https://github.com/apache/incubator-tvm/tree/master/web">Apache TVM on github</a></li>
+</ul>
+
+<h2 id="acknowledgement">Acknowledgement</h2>
+
+<p>We would like to thank the emscripten project for providing the WASM compilation infrastructure as well as the JS library support on the web. We would also like to thank the WebGPU community for various helpful discussions. Thanks to Fletcher Haynes for valuable feedback on this post.</p>
+
+    </div>
+  </div>
+</div>
+</div>
+
+    <div class="container">
+
+      <footer class="small">
+        Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF),
+        sponsored by the <i>Apache Incubator</i>. Incubation is required
+        of all newly accepted projects until a further review indicates that the infrastructure,
+        communications, and decision making process have stabilized in a manner consistent with other
+        successful ASF projects. While incubation status is not necessarily a reflection of the completeness
+        or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
+
+        Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache,
+        the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.
+
+        See also other useful <a href="/asf" class="footer-link">ASF links</a>:
+        <a href="https://www.apache.org/" class="footer-link">Apache Homepage</a>,
+        <a href="https://www.apache.org/licenses/" class="footer-link">License</a>
+        <a href="https://www.apache.org/foundation/sponsorship.html" class="footer-link">Sponsorship</a>,
+        <a href="https://www.apache.org/security/" class="footer-link">Security</a>
+        <a href="https://www.apache.org/foundation/thanks.html" class="footer-link">Thanks</a>,
+        <a href="https://www.apache.org/events/current-event.html" class="footer-link">Current Event</a>
+
+      </footer>
+    </div>
+  </body>
+</html>
+
+
diff --git a/atom.xml b/atom.xml
index 6284eab..204f5cf 100644
--- a/atom.xml
+++ b/atom.xml
@@ -4,7 +4,7 @@
  <title>TVM</title>
  <link href="https://tvm.apache.org" rel="self"/>
  <link href="https://tvm.apache.org"/>
- <updated>2020-05-04T07:34:25-07:00</updated>
+ <updated>2020-05-14T10:59:40-07:00</updated>
  <id>https://tvm.apache.org</id>
  <author>
    <name></name>
@@ -13,6 +13,93 @@
 
  
  <entry>
+   <title>Compiling Machine Learning to WASM and WebGPU with Apache TVM</title>
+   <link href="https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu"/>
+   <updated>2020-05-14T00:00:00-07:00</updated>
+   <id>https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu</id>
+   <content type="html">&lt;p&gt;&lt;strong&gt;TLDR&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;We introduced support for WASM and WebGPU to the Apache TVM deep learning compiler. Our experiments shows that  TVM’s WebGPU backend can get &lt;strong&gt;close to native&lt;/strong&gt; &lt;strong&gt;GPU performance&lt;/strong&gt; when deploying models to the web.&lt;/p&gt;
+
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/webgpu/webgpu-mobilenet-perf.png&quot; alt=&quot;image&quot; width=&quot;55%&quot; /&gt;&lt;br /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
+
+&lt;p&gt;Computing is one of the pillars of modern machine learning applications. The introduction of the GPU to accelerate deep learning workloads has increased the rate of progress dramatically. Given the growing requirement to deploy machine learning everywhere, the browser becomes a natural place to deploy intelligent applications.&lt;/p&gt;
+
+&lt;p&gt;While TensorFlow.js and ONNX.js are existing efforts to bring machine learning to the browser, there still exist non-trivial gaps in performance between the web versions and native ones. One of the many reasons is the lack of standard and performant access to the GPU on the web. WebGL lacks important features such as compute shaders and generic storage buffers that are necessary for high performance deep learning.&lt;/p&gt;
+
+&lt;p&gt;WebGPU is the upcoming standard for next generation web graphics which has the possibility to dramatically change this situation. Like the latest generation graphics APIs such as Vulkan and Metal, WebGPU offers first-class compute shader support.&lt;/p&gt;
+
+&lt;p&gt;To explore the potential of using WebGPU for machine learning deployment in the browser, we enhanced the deep learning compiler Apache(incubating) TVM to target WASM (for host code that computes the launching parameters and calls into the device launch) and WebGPU (for device execution). Our preliminary results are quite positive — for the first time, we can deploy machine learning applications on the web while still getting near native performance on the GPU.&lt;/p&gt;
+
+&lt;h2 id=&quot;machine-learning-compiler&quot;&gt;Machine Learning Compiler&lt;/h2&gt;
+
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/webgpu/ml-compiler-flow.png&quot; alt=&quot;image&quot; width=&quot;65%&quot; /&gt;&lt;br /&gt;&lt;/p&gt;
+
+&lt;p&gt;One natural reaction when trying out WebGPU is to write shaders for primitive operators in deep neural networks (matrix multiplication and convolution) and then directly optimize their performance. This is the traditional workflow used  by existing frameworks such as TensorFlow.js.&lt;/p&gt;
+
+&lt;p&gt;Instead, we apply a compilation based approach. TVM automatically ingests models from high-level frameworks such as TensorFlow, Keras, PyTorch, MXNet and ONNX and uses a machine learning driven approach to automatically generate low level code, in this case compute shaders in SPIR-V format. The generated code can then be packaged as a deployable module.&lt;/p&gt;
+
+&lt;p&gt;One important advantage of the compilation based approach is the reuse of infrastructure. We are able to effortlessly (relative to &lt;a href=&quot;https://arxiv.org/abs/1901.05350&quot;&gt;other approaches&lt;/a&gt;) target the web by reusing the infrastructure for optimizing GPU kernels for native platforms such as CUDA, Metal and OpenCL. If the mapping of the WebGPU API to native APIs is efficient we can expect similar performance with very little work. More importantly, the  [...]
+
+&lt;h2 id=&quot;building-a-wasm-and-webgpu-compiler&quot;&gt;Building a WASM and WebGPU Compiler&lt;/h2&gt;
+
+&lt;p&gt;In order to build a compiler that can target WASM and WebGPU, we need the following elements:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;A SPIR-V generator for compute shaders.&lt;/li&gt;
+  &lt;li&gt;A WASM generator for the host program.&lt;/li&gt;
+  &lt;li&gt;A runtime to load and execute the generated program.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Luckily, TVM already has a SPIR-V target for Vulkan, and uses LLVM for host code generation. So we can just repurpose the two to generate the device and host programs.&lt;/p&gt;
+
+&lt;p&gt;The main challenge is the runtime. We need a runtime to load the shader code, and to enable  the host code talk to communicate with the shader correctly. TVM has a minimum C++ based runtime. We build a minimum web runtime library and link it with the generated shader and host driving code, producing a single WASM file. However, this WASM module still contains two unknown dependencies:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;The runtime needs to call into system library calls (malloc, stderr).&lt;/li&gt;
+  &lt;li&gt;The wasm runtime needs to interact with the WebGPU driver (in javascript where the WebGPU API is the first-class citizen).&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;WASI is a standard solution to solve the first problem. While there is not yet a mature WASI on the web, we can use emscripten to generate a WASI-like library (see discussion &lt;a href=&quot;https://github.com/emscripten-core/emscripten/issues/11075&quot;&gt;here&lt;/a&gt;) to provide these system libraries.&lt;/p&gt;
+
+&lt;p&gt;We solve the second problem by building a WebGPU runtime inside TVM’s JS runtime, and calling back to these functions from the WASM module when invoking GPU code. Using the &lt;a href=&quot;https://tvm.apache.org/docs/dev/runtime.html#packedfunc&quot;&gt;PackedFunc&lt;/a&gt; mechanism in TVM’s runtime system, we can directly expose high-level runtime primitives by passing JavaScript closures to the WASM interface. This approach keeps most of the runtime code in JavaScript, we co [...]
+
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/webgpu/tvm-wasm-stack.png&quot; alt=&quot;image&quot; width=&quot;65%&quot; /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;
+
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/webgpu/webgpu-mobilenet-perf.png&quot; alt=&quot;image&quot; width=&quot;65%&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;We ran a quick experiment comparing the execution of a full computational graph via TVM’s WebGPU backend and native targets that use native GPU runtimes (Metal and OpenCL). On the MobileNet model, we can find that the WebGPU can get close to matching the performance of Metal. Assuming Chrome WebGPU’s runtime targets Metal instead of OpenCL on the MacOS, we can safely assume there is little to no performance loss when targeting the GPU.&lt;/p&gt;
+
+&lt;p&gt;This benchmark excludes the CPU to GPU data copy cost and only benchmarks the GPU execution. Currently the data copy from CPU to GPU can still take 25% of the execution time; however, these costs can further be amortized via approaches like double buffering in a continuous execution setting.&lt;/p&gt;
+
+&lt;p&gt;Our reported end-to-end running time of mobilenet is by no means optimal, since we simply reused a tuned programs from GTX 1080 Ti, which is very different from the Intel graphics GPU. We expect further performance boost by using &lt;a href=&quot;https://tvm.apache.org/2018/10/03/auto-opt-all&quot;&gt;AutoTVM&lt;/a&gt; on the target platform of interest.&lt;/p&gt;
+
+&lt;h2 id=&quot;looking-to-the-future&quot;&gt;Looking to the Future&lt;/h2&gt;
+
+&lt;p&gt;Our results suggest many interesting opportunities for machine learning on the web. Notably, WebGPU is an API that is still evolving and its implications could go beyond web applications. For example one could target native APIs of WebGPU as it matures and becomes standardized through WASI, enabling standalone WASM applications that make use of WebGPU.&lt;/p&gt;
+
+&lt;p&gt;The TVM community is also actively working on a &lt;a href=&quot;https://github.com/apache/incubator-tvm/tree/master/rust&quot;&gt;Rust based runtime&lt;/a&gt; that would enable much more robust WASM support and enable easier interaction with projects like &lt;a href=&quot;https://github.com/gfx-rs/wgpu-rs&quot;&gt;wgpu&lt;/a&gt;, and the &lt;a href=&quot;https://rustwasm.github.io/docs/book/&quot;&gt;Rust WASM&lt;/a&gt; ecosystem. As an open source project, we are looking for c [...]
+
+&lt;p&gt;The proposed approach provides effective machine learning support for most WASM’s application scenarios. The close to native performance could unlock better &lt;a href=&quot;https://en.wikipedia.org/wiki/Federated_learning&quot;&gt;federated learning&lt;/a&gt; capabilities on the browser. The same compiled package should also be able to run on native WASM executors to provide sandbox for the applications.&lt;/p&gt;
+
+&lt;h2 id=&quot;show-me-the-code&quot;&gt;Show me the Code&lt;/h2&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;&lt;a href=&quot;https://github.com/tqchen/tvm-webgpu-example&quot;&gt;Example project for image classification&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/incubator-tvm/tree/master/web&quot;&gt;Apache TVM on github&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h2 id=&quot;acknowledgement&quot;&gt;Acknowledgement&lt;/h2&gt;
+
+&lt;p&gt;We would like to thank the emscripten project for providing the WASM compilation infrastructures as well as the JS library support on the web. We would also like to thank the WebGPU community for various helpful discussions. Thanks to Fletcher Haynes for valuable feedbacks to the post.&lt;/p&gt;
+</content>
+ </entry>
+ 
+ <entry>
    <title>Integrating TVM into PyTorch</title>
    <link href="https://tvm.apache.org/2019/05/30/pytorch-frontend"/>
    <updated>2019-05-30T00:00:00-07:00</updated>
@@ -2696,13 +2783,13 @@
 [same syntax-highlighting fix as in the ROCm post diff above, applied to its HTML-escaped copy in atom.xml]
diff --git a/blog.html b/blog.html
index 46e4570..a0ba786 100644
--- a/blog.html
+++ b/blog.html
@@ -156,6 +156,16 @@
 
 <li>
   <span>
+    <a class="post-link" href="/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu">Compiling Machine Learning to WASM and WebGPU with Apache TVM</a>
+  </span>
+  <br />
+  <span>
+    May 14, 2020
+  </span>
+</li>
+
+<li>
+  <span>
     <a class="post-link" href="/2019/05/30/pytorch-frontend">Integrating TVM into PyTorch</a>
   </span>
   </br>
diff --git a/images/webgpu/ml-compiler-flow.png b/images/webgpu/ml-compiler-flow.png
new file mode 100644
index 0000000..93ee58f
Binary files /dev/null and b/images/webgpu/ml-compiler-flow.png differ
diff --git a/images/webgpu/tvm-wasm-stack.png b/images/webgpu/tvm-wasm-stack.png
new file mode 100644
index 0000000..a6033ec
Binary files /dev/null and b/images/webgpu/tvm-wasm-stack.png differ
diff --git a/images/webgpu/webgpu-mobilenet-perf.png b/images/webgpu/webgpu-mobilenet-perf.png
new file mode 100644
index 0000000..f402d09
Binary files /dev/null and b/images/webgpu/webgpu-mobilenet-perf.png differ
diff --git a/rss.xml b/rss.xml
index 507c076..2cca34c 100644
--- a/rss.xml
+++ b/rss.xml
@@ -5,12 +5,99 @@
         <description>TVM - </description>
         <link>https://tvm.apache.org</link>
         <atom:link href="https://tvm.apache.org" rel="self" type="application/rss+xml" />
-        <lastBuildDate>Mon, 04 May 2020 07:34:25 -0700</lastBuildDate>
-        <pubDate>Mon, 04 May 2020 07:34:25 -0700</pubDate>
+        <lastBuildDate>Thu, 14 May 2020 10:59:40 -0700</lastBuildDate>
+        <pubDate>Thu, 14 May 2020 10:59:40 -0700</pubDate>
         <ttl>60</ttl>
 
 
         <item>
+                <title>Compiling Machine Learning to WASM and WebGPU with Apache TVM</title>
+                <description>&lt;p&gt;&lt;strong&gt;TLDR&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;We introduced support for WASM and WebGPU to the Apache TVM deep learning compiler. Our experiments shows that  TVM’s WebGPU backend can get &lt;strong&gt;close to native&lt;/strong&gt; &lt;strong&gt;GPU performance&lt;/strong&gt; when deploying models to the web.&lt;/p&gt;
+
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/webgpu/webgpu-mobilenet-perf.png&quot; alt=&quot;image&quot; width=&quot;55%&quot; /&gt;&lt;br /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
+
+&lt;p&gt;Computing is one of the pillars of modern machine learning applications. The introduction of the GPU to accelerate deep learning workloads has increased the rate of progress dramatically. Given the growing requirement to deploy machine learning everywhere, the browser becomes a natural place to deploy intelligent applications.&lt;/p&gt;
+
+&lt;p&gt;While TensorFlow.js and ONNX.js are existing efforts to bring machine learning to the browser, there still exist non-trivial gaps in performance between the web versions and native ones. One of the many reasons is the lack of standard and performant access to the GPU on the web. WebGL lacks important features such as compute shaders and generic storage buffers that are necessary for high performance deep learning.&lt;/p&gt;
+
+&lt;p&gt;WebGPU is the upcoming standard for next generation web graphics which has the possibility to dramatically change this situation. Like the latest generation graphics APIs such as Vulkan and Metal, WebGPU offers first-class compute shader support.&lt;/p&gt;
+
+&lt;p&gt;To explore the potential of using WebGPU for machine learning deployment in the browser, we enhanced the deep learning compiler Apache(incubating) TVM to target WASM (for host code that computes the launching parameters and calls into the device launch) and WebGPU (for device execution). Our preliminary results are quite positive — for the first time, we can deploy machine learning applications on the web while still getting near native performance on the GPU.&lt;/p&gt;
+
+&lt;h2 id=&quot;machine-learning-compiler&quot;&gt;Machine Learning Compiler&lt;/h2&gt;
+
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/webgpu/ml-compiler-flow.png&quot; alt=&quot;image&quot; width=&quot;65%&quot; /&gt;&lt;br /&gt;&lt;/p&gt;
+
+&lt;p&gt;One natural reaction when trying out WebGPU is to write shaders for primitive operators in deep neural networks (matrix multiplication and convolution) and then directly optimize their performance. This is the traditional workflow used  by existing frameworks such as TensorFlow.js.&lt;/p&gt;
+
+&lt;p&gt;Instead, we apply a compilation based approach. TVM automatically ingests models from high-level frameworks such as TensorFlow, Keras, PyTorch, MXNet and ONNX and uses a machine learning driven approach to automatically generate low level code, in this case compute shaders in SPIR-V format. The generated code can then be packaged as a deployable module.&lt;/p&gt;
+
+&lt;p&gt;One important advantage of the compilation based approach is the reuse of infrastructure. We are able to effortlessly (relative to &lt;a href=&quot;https://arxiv.org/abs/1901.05350&quot;&gt;other approaches&lt;/a&gt;) target the web by reusing the infrastructure for optimizing GPU kernels for native platforms such as CUDA, Metal and OpenCL. If the mapping of the WebGPU API to native APIs is efficient we can expect similar performance with very little work. More importantly, the  [...]
+
+&lt;h2 id=&quot;building-a-wasm-and-webgpu-compiler&quot;&gt;Building a WASM and WebGPU Compiler&lt;/h2&gt;
+
+&lt;p&gt;In order to build a compiler that can target WASM and WebGPU, we need the following elements:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;A SPIR-V generator for compute shaders.&lt;/li&gt;
+  &lt;li&gt;A WASM generator for the host program.&lt;/li&gt;
+  &lt;li&gt;A runtime to load and execute the generated program.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Luckily, TVM already has a SPIR-V target for Vulkan, and uses LLVM for host code generation. So we can just repurpose the two to generate the device and host programs.&lt;/p&gt;
+
+&lt;p&gt;The main challenge is the runtime. We need a runtime to load the shader code, and to enable  the host code talk to communicate with the shader correctly. TVM has a minimum C++ based runtime. We build a minimum web runtime library and link it with the generated shader and host driving code, producing a single WASM file. However, this WASM module still contains two unknown dependencies:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;The runtime needs to call into system library calls (malloc, stderr).&lt;/li&gt;
+  &lt;li&gt;The wasm runtime needs to interact with the WebGPU driver (in javascript where the WebGPU API is the first-class citizen).&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;WASI is a standard solution to solve the first problem. While there is not yet a mature WASI on the web, we can use emscripten to generate a WASI-like library (see discussion &lt;a href=&quot;https://github.com/emscripten-core/emscripten/issues/11075&quot;&gt;here&lt;/a&gt;) to provide these system libraries.&lt;/p&gt;
+
+&lt;p&gt;We solve the second problem by building a WebGPU runtime inside TVM’s JS runtime, and calling back to these functions from the WASM module when invoking GPU code. Using the &lt;a href=&quot;https://tvm.apache.org/docs/dev/runtime.html#packedfunc&quot;&gt;PackedFunc&lt;/a&gt; mechanism in TVM’s runtime system, we can directly expose high-level runtime primitives by passing JavaScript closures to the WASM interface. This approach keeps most of the runtime code in JavaScript, we co [...]
+
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/webgpu/tvm-wasm-stack.png&quot; alt=&quot;image&quot; width=&quot;65%&quot; /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;
+
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/webgpu/webgpu-mobilenet-perf.png&quot; alt=&quot;image&quot; width=&quot;65%&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;We ran a quick experiment comparing the execution of a full computational graph via TVM’s WebGPU backend and native targets that use native GPU runtimes (Metal and OpenCL). On the MobileNet model, we can find that the WebGPU can get close to matching the performance of Metal. Assuming Chrome WebGPU’s runtime targets Metal instead of OpenCL on the MacOS, we can safely assume there is little to no performance loss when targeting the GPU.&lt;/p&gt;
+
+&lt;p&gt;This benchmark excludes the CPU to GPU data copy cost and only benchmarks the GPU execution. Currently the data copy from CPU to GPU can still take 25% of the execution time; however, these costs can further be amortized via approaches like double buffering in a continuous execution setting.&lt;/p&gt;
+
+&lt;p&gt;Our reported end-to-end running time of mobilenet is by no means optimal, since we simply reused a tuned programs from GTX 1080 Ti, which is very different from the Intel graphics GPU. We expect further performance boost by using &lt;a href=&quot;https://tvm.apache.org/2018/10/03/auto-opt-all&quot;&gt;AutoTVM&lt;/a&gt; on the target platform of interest.&lt;/p&gt;
+
+&lt;h2 id=&quot;looking-to-the-future&quot;&gt;Looking to the Future&lt;/h2&gt;
+
+&lt;p&gt;Our results suggest many interesting opportunities for machine learning on the web. Notably, WebGPU is an API that is still evolving and its implications could go beyond web applications. For example one could target native APIs of WebGPU as it matures and becomes standardized through WASI, enabling standalone WASM applications that make use of WebGPU.&lt;/p&gt;
+
+&lt;p&gt;The TVM community is also actively working on a &lt;a href=&quot;https://github.com/apache/incubator-tvm/tree/master/rust&quot;&gt;Rust based runtime&lt;/a&gt; that would enable much more robust WASM support and enable easier interaction with projects like &lt;a href=&quot;https://github.com/gfx-rs/wgpu-rs&quot;&gt;wgpu&lt;/a&gt;, and the &lt;a href=&quot;https://rustwasm.github.io/docs/book/&quot;&gt;Rust WASM&lt;/a&gt; ecosystem. As an open source project, we are looking for c [...]
+
+&lt;p&gt;The proposed approach provides effective machine learning support for most of WASM’s application scenarios. The close-to-native performance could unlock better &lt;a href=&quot;https://en.wikipedia.org/wiki/Federated_learning&quot;&gt;federated learning&lt;/a&gt; capabilities in the browser. The same compiled package should also be able to run on native WASM executors, providing sandboxing for applications.&lt;/p&gt;
+
+&lt;h2 id=&quot;show-me-the-code&quot;&gt;Show me the Code&lt;/h2&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;&lt;a href=&quot;https://github.com/tqchen/tvm-webgpu-example&quot;&gt;Example project for image classification&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/incubator-tvm/tree/master/web&quot;&gt;Apache TVM on github&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h2 id=&quot;acknowledgement&quot;&gt;Acknowledgement&lt;/h2&gt;
+
+&lt;p&gt;We would like to thank the emscripten project for providing the WASM compilation infrastructure as well as the JS library support on the web. We would also like to thank the WebGPU community for various helpful discussions. Thanks to Fletcher Haynes for valuable feedback on this post.&lt;/p&gt;
+</description>
+                <link>https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu</link>
+                <guid>https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu</guid>
+                <pubDate>Thu, 14 May 2020 00:00:00 -0700</pubDate>
+        </item>
+
+        <item>
                 <title>Integrating TVM into PyTorch</title>
                 <description>&lt;p&gt;As TVM continuously demonstrates improvements to the efficiency of deep learning execution,
 it has become clear that PyTorch stands to benefit from directly leveraging the compiler stack.
@@ -2691,13 +2778,13 @@ We are starting to look at performance optimization and we expect more improveme
 &lt;p&gt;You should see something like this:&lt;/p&gt;
 
 &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-llvm&quot; data-lang=&quot;llvm&quot;&gt;&lt;span class=&quot;c1&quot;&gt;; ModuleID = 'myadd__kernel0'&lt;/span&gt;
-&lt;span class=&quot;err&quot;&gt;source_filename&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;myadd__kernel0&quot;&lt;/span&gt;
+&lt;span class=&quot;err&quot;&gt;sour&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;e_filename&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;myadd__kernel0&quot;&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;datalayout&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64&quot;&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;triple&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;amdgcn-amd-amdhsa-hcc&quot;&lt;/span&gt;
 
 
 &lt;span class=&quot;c1&quot;&gt;; Function Attrs: nounwind&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;define&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;dllexport&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;amdgpu_kernel&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;vg&quot;&gt;@myadd__kernel0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class [...]
+&lt;span class=&quot;k&quot;&gt;define&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;dllexport&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;amdgpu_ker&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;ne&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;l&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;vg&quot;&gt;@myadd__kernel0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k [...]
 &lt;span class=&quot;nl&quot;&gt;entry:&lt;/span&gt;
   &lt;span class=&quot;nv&quot;&gt;%4&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;tail&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;call&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;vg&quot;&gt;@llvm.amdgcn.workgroup.id.x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
   &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;tail&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;call&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;vg&quot;&gt;@llvm.amdgcn.workitem.id.x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
@@ -2717,14 +2804,14 @@ We are starting to look at performance optimization and we expect more improveme
   &lt;span class=&quot;nv&quot;&gt;%10&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%.pre-phi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt;
   &lt;span class=&quot;nv&quot;&gt;%11&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%.pre-phi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt;
   &lt;span class=&quot;nv&quot;&gt;%12&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sext&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%11&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i64&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%13&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&g [...]
-  &lt;span class=&quot;nv&quot;&gt;%14&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;load&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)*&lt;/span&gt; [...]
-  &lt;span class=&quot;nv&quot;&gt;%15&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&g [...]
-  &lt;span class=&quot;nv&quot;&gt;%16&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;load&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)*&lt;/span&gt; [...]
+  &lt;span class=&quot;nv&quot;&gt;%13&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;rspa&lt;/span&gt;&lt;span class=&quot;k&quot;&gt [...]
+  &lt;span class=&quot;nv&quot;&gt;%14&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;load&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;rspa&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;e&lt;/span&gt; [...]
+  &lt;span class=&quot;nv&quot;&gt;%15&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;rspa&lt;/span&gt;&lt;span class=&quot;k&quot;&gt [...]
+  &lt;span class=&quot;nv&quot;&gt;%16&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;load&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;rspa&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;e&lt;/span&gt; [...]
   &lt;span class=&quot;nv&quot;&gt;%17&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;fadd&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%16&lt;/span&gt;
   &lt;span class=&quot;nv&quot;&gt;%18&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sext&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%10&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i64&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%19&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&g [...]
-  &lt;span class=&quot;k&quot;&gt;store&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%17&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)*&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%19&lt;/span [...]
+  &lt;span class=&quot;nv&quot;&gt;%19&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;rspa&lt;/span&gt;&lt;span class=&quot;k&quot;&gt [...]
+  &lt;span class=&quot;k&quot;&gt;store&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%17&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;rspa&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; [...]
   &lt;span class=&quot;k&quot;&gt;br&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_end&lt;/span&gt;
 
 
diff --git a/sitemap.txt b/sitemap.txt
index 11c6b6a..a15a7e1 100644
--- a/sitemap.txt
+++ b/sitemap.txt
@@ -12,6 +12,7 @@ https://tvm.apache.org/sitemap.txt
 https://tvm.apache.org/tags
 https://tvm.apache.org/vta
 
+https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu
 https://tvm.apache.org/2019/05/30/pytorch-frontend
 https://tvm.apache.org/2019/04/29/opt-cuda-quantized
 https://tvm.apache.org/2019/03/18/tvm-apache-announcement

