tvm-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
Date Thu, 30 Apr 2020 03:55:36 GMT

tmoreau89 commented on pull request #8:
URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621597493


   Thank you for the explanation. Could you break down what happens in the 32x32 configuration that causes correctness to fail (given that the waived dependence has no distance bound)?
   
   I agree that there should be a constant that hardware and software agree on, but I'm not convinced it should be set by the user. Instead, it should be derived automatically from the hardware target we choose in pkg_config. This ensures that we can achieve an initiation interval of `II=1` for the GEMM pipeline: the achievable II is constrained by the SRAM write-to-read latency, and that constraint is then enforced by the software.
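As a rough illustration of why the write-to-read latency bounds the II, here is a minimal Python sketch. It assumes a write becomes readable `latency` cycles after it is issued and that updates to independent accumulator addresses are issued round-robin. The function names are hypothetical, and `latency` stands in for a value that would come from the hardware target rather than from a user setting.

```python
def issue_cycles(num_addrs: int, updates_per_addr: int, latency: int) -> int:
    """Cycles needed to issue every accumulate op with round-robin interleaving.

    Each round updates every address once. An address may only be updated
    again `latency` cycles after its previous update, so a round lasts
    max(num_addrs, latency) cycles, padded with no-ops when num_addrs < latency.
    The final round needs no padding because nothing is issued after it.
    Assumes updates_per_addr >= 1.
    """
    round_len = max(num_addrs, latency)
    return (updates_per_addr - 1) * round_len + num_addrs


def sustains_ii_1(num_addrs: int, latency: int) -> bool:
    """True if round-robin interleaving issues one useful op every cycle."""
    return num_addrs >= latency


# Hypothetical stand-in for a latency derived from the hardware target.
LATENCY = 2
print(issue_cycles(2, 3, LATENCY))  # 6 cycles for 6 ops: II=1
print(issue_cycles(1, 3, LATENCY))  # 5 cycles for 3 ops: no-ops inserted
print(sustains_ii_1(2, LATENCY), sustains_ii_1(1, LATENCY))  # True False
```

With a latency of 2, interleaving two addresses keeps one useful op per cycle, while a single address wastes every other cycle on a no-op.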
   
   For example, say we want to accumulate into SRAM addresses A and B, updating each one 3 times. We could do it either (1) by updating each address in consecutive cycles, or (2) by interlacing the updates.
   
   ```
   // initial contents of SRAM at address A is 20, and B is 14
   // assume a write takes 2 cycles to become visible to reads; each comment shows the SRAM contents visible at that cycle
   t=0: SRAM[A] += 1 // SRAM[A] is 20, SRAM[B] is 14
   t=1: SRAM[A] += 1 // SRAM[A] is 20, SRAM[B] is 14
   t=2: SRAM[A] += 1 // SRAM[A] is 21, SRAM[B] is 14
   t=3: SRAM[B] += 1 // SRAM[A] is 21, SRAM[B] is 14
   t=4: SRAM[B] += 1 // SRAM[A] is 22, SRAM[B] is 14
   t=5: SRAM[B] += 1 // SRAM[A] is 22, SRAM[B] is 15
   t=6: noop // SRAM[A] is 22, SRAM[B] is 15
   t=7: noop // SRAM[A] is 22, SRAM[B] is 16
   
   // Results are incorrect: A's final value is 22 instead of 23, and B's is 16 instead of 17
   ```
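The hazard in the trace above can be detected mechanically. Below is a minimal Python sketch of such a check, assuming the schedule is a list with one SRAM address (or `None` for a no-op) per cycle and that a write is readable `latency` cycles after issue. `find_raw_hazards` is a hypothetical helper, not the VTA runtime's actual check.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def find_raw_hazards(
    schedule: Sequence[Optional[Hashable]], latency: int
) -> List[Tuple[int, int, Hashable]]:
    """Return (prev_cycle, cycle, addr) for every read-after-write hazard.

    schedule[t] is the SRAM address accumulated into at cycle t (None = no-op).
    A write issued at cycle t is assumed readable from cycle t + latency, so an
    update to the same address fewer than `latency` cycles later reads a stale value.
    """
    last_write: Dict[Hashable, int] = {}
    hazards: List[Tuple[int, int, Hashable]] = []
    for t, addr in enumerate(schedule):
        if addr is None:
            continue
        prev = last_write.get(addr)
        # Checking only the most recent write suffices: older writes are further away.
        if prev is not None and t - prev < latency:
            hazards.append((prev, t, addr))
        last_write[addr] = t
    return hazards


back_to_back = ["A", "A", "A", "B", "B", "B"]
interlaced = ["A", "B", "A", "B", "A", "B"]
print(find_raw_hazards(back_to_back, latency=2))
# [(0, 1, 'A'), (1, 2, 'A'), (3, 4, 'B'), (4, 5, 'B')]
print(find_raw_hazards(interlaced, latency=2))
# []
```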
   
   Now, if back-to-back writes like these were detected in the runtime, we would have an invalid schedule. So, assuming the TVM compiler produces a valid schedule, we would end up with interlaced writes to A and B:
   
   ```
   // initial contents of SRAM at address A is 20, and B is 14
   // assume a write takes 2 cycles to become visible to reads; each comment shows the SRAM contents visible at that cycle
   t=0: SRAM[A] += 1 // SRAM[A] is 20, SRAM[B] is 14
   t=1: SRAM[B] += 1 // SRAM[A] is 20, SRAM[B] is 14
   t=2: SRAM[A] += 1 // SRAM[A] is 21, SRAM[B] is 14
   t=3: SRAM[B] += 1 // SRAM[A] is 21, SRAM[B] is 15
   t=4: SRAM[A] += 1 // SRAM[A] is 22, SRAM[B] is 15
   t=5: SRAM[B] += 1 // SRAM[A] is 22, SRAM[B] is 16
   t=6: noop // SRAM[A] is 23, SRAM[B] is 16
   t=7: noop // SRAM[A] is 23, SRAM[B] is 17
   
   // Results are now correct because the compiler ensured enough wait time between consecutive writes and reads to the same SRAM address.
   ```
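Both traces can be reproduced with a small cycle-level model. This is a Python sketch under the same assumptions as the traces (one read-modify-write per cycle, each write visible 2 cycles after issue); `simulate` is a hypothetical helper, not part of VTA.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def simulate(
    schedule: Sequence[Optional[Hashable]],
    init: Dict[Hashable, int],
    latency: int = 2,
) -> Dict[Hashable, int]:
    """Cycle-level model of SRAM[addr] += 1 with a write-to-read latency.

    schedule[t] is the address incremented at cycle t (None = no-op). The op
    reads SRAM at cycle t, and its result becomes visible at cycle t + latency.
    Returns the SRAM contents once every write has landed.
    """
    sram = dict(init)
    in_flight: List[Tuple[int, Hashable, int]] = []  # (visible_at, addr, value)
    for t in range(len(schedule) + latency):  # extra cycles drain the pipeline
        # Commit writes whose latency has elapsed so reads at cycle t see them.
        for visible_at, addr, value in in_flight:
            if visible_at == t:
                sram[addr] = value
        in_flight = [w for w in in_flight if w[0] != t]
        op = schedule[t] if t < len(schedule) else None
        if op is not None:
            # Read-modify-write: the read sees only writes committed so far.
            in_flight.append((t + latency, op, sram[op] + 1))
    return sram


init = {"A": 20, "B": 14}
print(simulate(["A", "A", "A", "B", "B", "B"], init))  # {'A': 22, 'B': 16}
print(simulate(["A", "B", "A", "B", "A", "B"], init))  # {'A': 23, 'B': 17}
```

With `latency=1`, the back-to-back order also yields 23 and 17, so the required spacing tracks the memory's write-to-read latency.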
   
   So the point I'm hoping to make is that this 2-cycle wait time is imposed by the SRAM, and should not be exposed to the user as a parameter in VTA_config.

