CHIMERA: A 0.92 TOPS, 2.2 TOPS/W Edge AI Accelerator with 2 MByte On-Chip Foundry Resistive RAM for Efficient Training and Inference
CHIMERA is the first non-volatile deep neural network (DNN) chip for edge AI training and inference using foundry on-chip resistive RAM (RRAM) macros and no off-chip memory. CHIMERA achieves 0.92 TOPS peak performance and 2.2 TOPS/W. We scale inference to 6x larger DNNs by connecting 6 CHIMERAs with just 4% execution time and 5% energy costs, enabled by communication-sparse DNN mappings that exploit RRAM non-volatility through quick chip wakeup/shutdown (33 μs). We demonstrate the first incremental edge AI training which overcomes RRAM write energy, speed, and endurance challenges. Our training achieves the same accuracy as traditional algorithms with up to 283x fewer RRAM weight update steps and 340x better energy-delay product. We thus demonstrate 10 years of 20 samples/minute incremental edge AI training on CHIMERA.A 40nm 2Mb ReRAM Macro with 85% Reduction in FORMING Time and 99% Reduction in Page-Write Time Using Auto-FORMING and Auto-Write Schemes
This work proposes (1) an auto-forming (AF) scheme to shorten the macro forming time (TFM-M) and testing costs; (2) an auto-RESET (ARST) scheme to shorten page-RESET time (TW-PAGE-RST) for expanding the applications of hidden-RESET operation in standby mode, and (3) an auto-SET (ASET) scheme to shorten page-write time (TW-PAGE) combined with hidden-RESET scheme. A fabricated 40nm 2Mb ReRAM macro achieved 85+% reduction in T FM - M , and 99+% reduction in TW-PAGE for a page. For the first time, AF, ARST, and ASET schemes are demonstrated in silicon for ReRAM.
Memory
RRAM
The combination of AI (Artificial Intelligence) and IoT (the Internet of Things) referred as AIoT is a powerful duo that may fuel the growth of the semiconductor industry for years to come. High-capacity on-chip memories with low power consumption are required for energy-efficient machine learning. It can support both 1T1R (1 transistor + 1RRAM) and 1S1R (1 selector + 1RRAM) array architectures. Compared to the conventional 1T1R architecture, the 1S1R architecture can achieve higher density and enable 3D integration. TSMC, in collaboration with a technology partner, has developed RRAM memory technology on a 40nm CMOS logic backbone to support application-specific needs. TSMC continues to explore novel RRAM material stacks and their density-driven integration, along with variability-aware circuit design and programing constructs to realize high-density embedded RRAM-based solution options for AIoT applications.