TSMC @ Conferences
A 22nm 4Mb 8b-Precision ReRAM Computing-in-Memory Macro with 11.91 to 195.7TOPS/W for Tiny AI Edge Devices
C-X. Xue, J-M. Hung, H-Y. Kao, Y-H. Huang, S-P. Huang, F-C. Chang, P. Chen, T-W. Liu, C-J. Jhang, C-I. Su, W-S. Khwa, C-C. Lo, R-S. Liu, C-C. Hsieh, K-T. Tang, Y-D. Chih, T-Y. J. Chang, M-F. Chang
This paper proposes a ReRAM-CIM macro that uses an Asymmetric Group-Modulated Input (AGMI) scheme, a Weighted Current-to-Voltage Signal Stacking (WCVSS) converter, and a Hybrid-Precision Voltage-Mode Readout scheme to reduce power consumption and computing latency while maintaining sufficient signal margin for multi-bit MAC computation. The macro supports MAC operations with 4 accumulations of 8b-inputs and 8b-weights and a 14-bit output. It achieves a 14.8ns computing latency and an 11.91TOPS/W energy efficiency for 8b-MAC operations.
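As a unit check, an efficiency quoted in TOPS/W can be read as energy per operation: 1 TOPS/W is 10^12 operations per joule, so the energy per operation in picojoules is the reciprocal of the TOPS/W figure. The minimal sketch below applies this to the two endpoints quoted in the title. The function name is illustrative and not from the paper, and the abstract does not state how an "operation" is counted.

```python
def energy_per_op_pj(tops_per_watt: float) -> float:
    """Energy per operation in picojoules for a given TOPS/W figure."""
    # 1 TOPS/W = 1e12 operations per joule,
    # so one operation costs 1e-12 / (TOPS/W) joules = 1 / (TOPS/W) pJ.
    return 1.0 / tops_per_watt


# Endpoints of the efficiency range quoted in the paper title.
for eff in (11.91, 195.7):
    femtojoules = energy_per_op_pj(eff) * 1000
    print(f"{eff} TOPS/W -> {femtojoules:.1f} fJ per operation")
```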
A 28nm 384kb 6T-SRAM Computation-in-Memory Macro with 8b of Precision for AI Edge Chips
J-W. Su, Y-C. Chou, R. Liu, T-W. Liu, P-J. Lu, P-C. Wu, Y-L. Chung, L-Y. Hung, J-S. Ren, T. Pan, S-H. Li, S-C. Chang, S-S. Sheu, W-C. Lo, C-I. Wu, X. Si, C-C. Lo, R-S. Liu, C-C. Hsieh, K-T. Tang, M-F. Chang
This paper presents an SRAM-CIM structure using segmented-BL charge-sharing (SBCS), a source-injection local multiplication cell (SILMC), and a prioritized-hybrid-ADC (Ph-ADC) to achieve small area, low power consumption, and tolerance to process variation for analog multi-bit MAC computation readout. A 28nm 384kb SRAM-CIM macro was fabricated using a foundry compact-6T cell, supporting MAC operations with 16 accumulations of 8b-inputs and 8b-weights and a near-full-precision output (20b). This macro achieves a 7.2ns tAC and a 22.75TOPS/W energy efficiency for 8b-MAC operations, with an FoM (IN-precision × W-precision × output-ratio × output-channel × EF/tAC) 6× higher than prior work.
Y. D. Chih
An 89TOPS/W and 16.3TOPS/mm2 All-Digital SRAM-Based Full-Precision Compute-In Memory Macro in 22nm for Machine-Learning Edge Applications
Y-D. Chih, P-H. Lee, H. Fujiwara, Y-C. Shih, C-F. Lee, R. Naous, Y-L. Chen, C-P. Lo, C-H. Lu, H. Mori, W-C. Zhao, D. Sun, M. E. Sinangil, Y-H. Chen, T-L. Chou, K. Akarvardar, H-J. Liao, Y. Wang, M-F. Chang, T-Y. J. Chang
An 89 TOPS/W and 16.3 TOPS/mm2 all-digital SRAM-based CIM macro with full precision has been demonstrated on a 22nm logic process. The modular approach, with programmable bit widths for input activations (1 to 8 bits) and weights (4/8/12/16 bits), either unsigned or 2's complement signed, can support versatile neural networks. Compared with the listed references, the proposed digital CIM achieves a high FoM (TOPS/W × TOPS/mm2) of 1450 without accuracy loss in MAC operation.
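The FoM quoted above is the product of the two headline figures, which can be checked directly. This is a simple arithmetic sketch; the variable names are illustrative.

```python
# Headline figures from the abstract.
tops_per_watt = 89.0   # energy efficiency, TOPS/W
tops_per_mm2 = 16.3    # area efficiency, TOPS/mm2

# FoM as defined in the abstract: TOPS/W x TOPS/mm2.
fom = tops_per_watt * tops_per_mm2
print(f"FoM = {fom:.1f}")
```

The product is about 1450.7, which matches the quoted FoM of 1450.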
A 5nm 5.7GHz@1.0V and 1.3GHz@0.5V 4kb Standard-Cell-Based Two-Port Register File with a 16T Bitcell with No Half-Selection Issue
H. Fujiwara, Y-H. Nien, C-Y. Lin, H-Y. Pan, H-W. Hsu, S-R. Wu, Y-Y. Liu, Y-H. Chen, H-J. Liao, J. Chang
We demonstrate a 5nm standard-cell-based 2-port register file with a 16T SRAM cell, targeted at small-capacity SRAM applications, with improved integration with memory peripheral circuits. A 4kb array implementation operates at 5.7GHz at 1.0V and 1.3GHz at 0.5V. Silicon measurement results also show a minimum operating voltage of 0.35V.
Kenny Cheng-Hsiang Hsieh
Design of Communication Circuits for Side-by-Side and Stacked Chiplets
As CMOS scales deeply, conventional single-die package systems face increasing challenges in performance and cost. As a result, System-in-Package solutions, where multiple chiplets are integrated on various substrates or on top of each other, are quickly becoming mainstream for applications from mobile phones to supercomputers. To take full advantage of this trend, both co-optimization of system demand and package technology and new interface circuits bridging the chiplets are necessary. From a system viewpoint, the interface circuits need to be transparent to software and to minimize power, area, and latency. Here, we will give an overview of side-by-side and stacked chiplet technologies and outline their potential and limitations. We will explore circuit techniques for these ultrashort-haul links to overcome challenges in synchronization, interconnect routing, design reuse, and design for testability. We will also discuss a recent design example that achieves high bandwidth and high energy efficiency at the same time.
A 40nm 64Kb 56.67TOPS/W Read-Disturb-Tolerant Compute-in-Memory/Digital RRAM Macro with Active-Feedback-Based Read and In-Situ Write Verification
J-H. Yoon, M. Chang, W-S. Khwa, Y-D. Chih, M-F. Chang, A. Raychowdhury
As memory-centric workloads continue to gain momentum, technology solutions that provide higher on-die memory capacity/bandwidth can provide scalability beyond SRAM. Resistive RAM (RRAM), owing to its higher bit-density, CMOS process/voltage compatibility, nanosecond read, and non-volatility, has emerged as a promising candidate. In spite of early prototypes, several technology challenges remain and need to be addressed through circuit-technology co-design. This paper presents a 64Kb RRAM macro supporting (1) a programmable number of row-accesses to enable vector-matrix multiplication for a target algorithm-level inference accuracy; (2) voltage-based read with active feedback, advancing the state-of-the-art current-based read and targeted at the low ratio between the high-resistance state and the low-resistance state in typical RRAM; (3) read-disturb tolerance under RRAM drift through an embedded read-disturb monitor and write-back; and (4) in-situ write verification to enable a tight resistance distribution.
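The idea of a programmable number of row-accesses can be illustrated with a purely functional sketch: a vector-matrix multiplication is split into groups of rows that are activated together, and the per-group partial sums are accumulated. This is an idealized model under stated assumptions, not the macro's circuit behavior. The function name, the binary test data, and the noise-free readout are all assumptions made for illustration.

```python
def vmm_by_row_groups(inputs, weights, rows_per_access):
    """Vector-matrix multiply computed `rows_per_access` rows at a time.

    inputs:  per-row input values (length = number of rows)
    weights: list of rows, each a list of per-column weights
    Returns (column_sums, number_of_accesses).
    """
    n_rows, n_cols = len(weights), len(weights[0])
    totals = [0] * n_cols
    accesses = 0
    for start in range(0, n_rows, rows_per_access):
        stop = min(start + rows_per_access, n_rows)
        # One access: the selected rows contribute to every column at once.
        for col in range(n_cols):
            partial = sum(inputs[r] * weights[r][col] for r in range(start, stop))
            # Idealized readout (assumption): no quantization, noise, or drift.
            totals[col] += partial
        accesses += 1
    return totals, accesses


# Hypothetical 8-row by 4-column binary example.
inputs = [1, 0, 1, 1, 0, 1, 1, 0]
weights = [[(r + c) % 2 for c in range(4)] for r in range(8)]
for k in (1, 2, 4, 8):
    sums, n = vmm_by_row_groups(inputs, weights, k)
    print(f"rows per access = {k}: {n} accesses, column sums = {sums}")
```

In this ideal model the column sums are identical for every group size and only the number of accesses changes. In a real array, activating more rows per access means the readout must resolve more levels, which is presumably why the number is tied to a target inference accuracy.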