Memory

Data is the most valuable resource in today’s digital economy. Currently over 2.5 quintillion (1018) bytes of data are generated daily and the pace is accelerating. More data than ever needs to be processed. Memory plays a key role in the flow of data. The gap between logic and memory is a bottle neck to system performance. To optimize the trade-off between cost and performance, a hierarchical memory system has been adopted. At the top of the hierarchy are static random access memories (SRAM) and dynamic random access memory (DRAM), both inherently volatile. SRAM is integrated right on the logic chips as cache memory to provide fastest access. DRAM is physically smaller than SRAM and consequently supports higher capacity. DRAM is generally an off-chip memory solution and ~10x slower than SRAM due to the need for constant refresh. Non-volatile memories (NVM) such as Flash are next in the hierarchy providing much higher memory capacity and density while also preserving information in the absence of power.

Recent new technologies are emerging rapidly to bring processing tasks near to or inside the memory to improve computing efficiency and enable new functionalities. Emerging NVMs use new types of materials and mechanisms to store data. They are promising for blending the memory hierarchy to boost the overall performance. Furthermore, their unique characteristics offer great potential to enable new applications (e.g. neuromorphic computing) and novel architectures (e.g. 3D integration).

TSMC’s non-volatile memory solutions include Flash, Spin-transfer torque magnetic random access memory (STT-MRAM), and resistive random access memory (RRAM). TSMC is also actively exploring phase change random access memory (PCRAM), and spin-orbit torque MRAM (SOT-MRAM) elements, as well as selector devices which are essential to support higher density cross-point array architectures.

  • CHIMERA: A 0.92 TOPS, 2.2 TOPS/W Edge AI Accelerator with 2 MByte On-Chip Foundry Resistive RAM for Efficient Training and Inference

    2021
    CHIMERA is the first non-volatile deep neural network (DNN) chip for edge AI training and inference using foundry on-chip resistive RAM (RRAM) macros and no off-chip memory. CHIMERA achieves 0.92 TOPS peak performance and 2.2 TOPS/W. We scale inference to 6x larger DNNs by connecting 6 CHIMERAs with just 4% execution time and 5% energy costs, enabled by communication-sparse DNN mappings that exploit RRAM non-volatility through quick chip wakeup/shutdown (33 μs). We demonstrate the first incremental edge AI training which overcomes RRAM write energy, speed, and endurance challenges. Our training achieves the same accuracy as traditional algorithms with up to 283x fewer RRAM weight update steps and 340x better energy-delay product. We thus demonstrate 10 years of 20 samples/minute incremental edge AI training on CHIMERA. authors: M. Giordano, K. Prabhu, K. Koul, R. M. Radway, A. Gural, R. Doshi, Z. F. Khan, J. W. Kustin, T. Liu, G. B. Lopes, V. Turbiner, W.-S. Khwa, Y.-D. Chih, M.-F. Chang, G. Lallement, B. Murmann, S. Mitra, P. Raina