CHIMERA: A 0.92 TOPS, 2.2 TOPS/W Edge AI Accelerator with 2 MByte On-Chip Foundry Resistive RAM for Efficient Training and InferenceCHIMERA is the first non-volatile deep neural network (DNN) chip for edge AI training and inference using foundry on-chip resistive RAM (RRAM) macros and no off-chip memory. CHIMERA achieves 0.92 TOPS peak performance and 2.2 TOPS/W. We scale inference to 6x larger DNNs by connecting 6 CHIMERAs with just 4% execution time and 5% energy costs, enabled by communication-sparse DNN mappings that exploit RRAM non-volatility through quick chip wakeup/shutdown (33 μs). We demonstrate the first incremental edge AI training which overcomes RRAM write energy, speed, and endurance challenges. Our training achieves the same accuracy as traditional algorithms with up to 283x fewer RRAM weight update steps and 340x better energy-delay product. We thus demonstrate 10 years of 20 samples/minute incremental edge AI training on CHIMERA.
Characterization of Fatigue and Its Recovery Behavior in Ferroelectric HfZrOIn this study, polarization fatigue of HfZrO ferroelectric is investigated with SILC (stress-induced-leakage-current) measurement under different E-field stresses. Under high-field, we observed strong correlation between polarization wake-up and SILC increase. This is attributed to oxygen vacancy redistribution and percolation path formation, especially at high frequency cycling. However, polarization fatigue at low field is found to occur without SILC increase. P-E loop measurements revealed that charge trapping is the main contributor under the low-bias. We demonstrated that the fatigue caused by low-field stress could be effectively recovered through an interspersed periodical, short-term cycles at high-field to manage charge trapping and oxygen vacancy redistribution, thus resulting in prolonged endurance to >1E12 cycles without SILC degradation at room temperature. We also validated that a negligible fatigue switching in HfZrO can be achieved at -40°C as low-temperature operation further reduces charge trapping.
Cold MRAM as a Density Booster for Embedded NVM in Advanced TechnologyConsidering the improved performance of MTJs and access transistors for MRAM at low temperatures, we proposed a novel design for embedded Cold MRAM to boost the cell density to 5.3x of a conventional 6T-SRAM. Together with the CMOS operated at cryogenic conditions, they can provide a potential solution for the high demanding HPC applications.
Low-Voltage (~1.3V), Arsenic Free Threshold Type Selector with Ultra High Endurance (> 1011) for High Density 1S1R Memory ArrayLow voltage selectors are critical for low power operation of high density non-volatile memories. In this work, selectors based on arsenic free chalcogenide materials are demonstrated with record high endurance over 1011 cycles together with threshold voltage ~1.3V and leakage current ~5nA. The enhanced endurance is attributed to suppression of phase separation with more stable amorphous network by proper dopants.
MLC PCM Techniques to Improve Nerual Network Inference Retention Time by 105X and Reduce Accuracy Degradation by 10.8XWe present three novel MLC PCM techniques - (1) device requirement balancing, (2) prediction-based MSB-biased referencing, and (3) bit-prioritized placement to address the MLC device challenges in neural network applications. Using measured MLC bit error rates, the proposed techniques can improve the MLC PCM retention time by 105 times while keeping the ResNet-20 inference accuracy degradation within 3% and reduce the accuracy degradation by 91% (10.8X) for CIFAR-100 dataset in the presence of temporal resistance drift.
Reliability and Magnetic Immunity of Reflow-Capable Embedded STT-MRAM in 16nm FinFET CMOS ProcessWe demonstrate the reliability and magnetic immunity of STT-MRAM embedded in 16nm FinFET CMOS process. The technology supports endurance cycles up to 105 for wide temperature range from -40°C to 125°C with low bit error rate and passes 106 cycles at the worst temperature case of -40°C. Data retention sustains three solder-reflow cycles and up to 10 years with less than 1ppm error rate at 234°C. Read disturb error rate is less than 10-20 per read. Magnetic immunity of standby and active mode can reach 550Oe for 10 years 1ppm error rate and 800Oe for 0.1ppm error rate per write at 125 °C, respectively.
Materials Requirements of High-Speed and Low-Power Spin-Orbit-Torque Magnetic Random-Access MemoryAs spin-orbit-torque magnetic random-access memory (SOT-MRAM) is gathering great interest as the next-generation low-power and high-speed on-chip cache memory applications, it is critical to analyze the magnetic tunnel junction (MTJ) properties needed to achieve sub-ns, and fJ write operation when integrated with CMOS access transistors. In this paper, a 2T-1MTJ cell-level modeling framework for in-plane type Y SOT-MRAM suggests that high spin Hall conductivity and moderate SOT material sheet resistance are preferred. We benchmark write energy and speed performances of type Y SOT cells based on various SOT materials experimentally reported in the literature, including heavy metals, topological insulators and semimetals. We then carry out detailed benchmarking of SOT material Pt, β-W, and BixSe(1-x) with different thickness and resistivity. We further discuss how our 2T-1MTJ model can be expanded to analyze other variations of SOT-MRAM, including perpendicular (type Z) and type X SOT-MRAM, two-terminal SOT-MRAM, as well as spin-transfer-torque (STT) and voltage-controlled magnetic anisotropy (VCMA)-assisted SOT-MRAM. This work will provide essential guidelines for SOT-MRAM materials, devices, and circuits research in the future.
Interfacial engineering of SOT-MRAM to modulate atomic diffusion and enable PMA stability >400 ◦CWe report our work on the optimization of W/CoFeB/MgO structures to fulfill perpendicular magnetic anisotropy (PMA) requirements in the production of SOT-MRAM. By optimizing the natural oxidization process of deposited Mg layer and introducing different dust layers at W/CoFeB and CoFeB/MgO interfaces, PMA of W/CoFeB/MgO structures can be enhanced by about 100%, which is much higher than that in Ta-based structures. The origin of this PMA enhancement was further confirmed by transmission electron microscopy investigations. The corresponding SOT switching efficiency and current-induced effective fields were also investigated.
A 40nm 2Mb ReRAM Macro with 85% Reduction in FORMING Time and 99% Reduction in Page-Write Time Using Auto-FORMING and Auto-Write SchemesThis work proposes (1) an auto-forming (AF) scheme to shorten the macro forming time (TFM-M) and testing costs; (2) an auto-RESET (ARST) scheme to shorten page-RESET time (TW-PAGE-RST) for expanding the applications of hidden-RESET operation in standby mode, and (3) an auto-SET (ASET) scheme to shorten page-write time (TW-PAGE) combined with hidden-RESET scheme. A fabricated 40nm 2Mb ReRAM macro achieved 85+% reduction in T FM - M , and 99+% reduction in TW-PAGE for a page. For the first time, AF, ARST, and ASET schemes are demonstrated in silicon for ReRAM.
22nm STT-MRAM for Reflow and Automotive Uses with High Yield, Reliability, and Magnetic Immunity and with Performance and Shielding OptionsWe demonstrate high yield results from a solder-reflow-capable spin-transfer-torque MRAM embedded in 22nm ultra-low leakage (ULL) CMOS technology. The technology supports -40 to 150°C operation and data retention though six solder reflow cycles and far exceeding 10 years at 150°C. Ten year native magnetic field immunity is >1100 Oe at 25°C at the 1ppm bit upset level. A shield-in-package solution demonstrates <; 1ppm bit upset rates from a disc magnet providing 3.5 kOe disturb field exposure for ~80 hours at 25°C. Trading off reflow capability, using smaller CD magnetic tunnel junctions, higher performance is achieved, for example read signal development times of 6ns at 125°C and average write pulse times slightly over 30ns at -40°C in a 20Mb design.
Data is the most valuable resource in today’s digital economy. Currently over 2.5 quintillion (1018) bytes of data are generated daily and the pace is accelerating. More data than ever needs to be processed. Memory plays a key role in the flow of data. The gap between logic and memory is a bottle neck to system performance. To optimize the trade-off between cost and performance, a hierarchical memory system has been adopted. At the top of the hierarchy are static random access memories (SRAM) and dynamic random access memory (DRAM), both inherently volatile. SRAM is integrated right on the logic chips as cache memory to provide fastest access. DRAM is physically smaller than SRAM and consequently supports higher capacity. DRAM is generally an off-chip memory solution and ~10x slower than SRAM due to the need for constant refresh. Non-volatile memories (NVM) such as Flash are next in the hierarchy providing much higher memory capacity and density while also preserving information in the absence of power.
Recent new technologies are emerging rapidly to bring processing tasks near to or inside the memory to improve computing efficiency and enable new functionalities. Emerging NVMs use new types of materials and mechanisms to store data. They are promising for blending the memory hierarchy to boost the overall performance. Furthermore, their unique characteristics offer great potential to enable new applications (e.g. neuromorphic computing) and novel architectures (e.g. 3D integration).
TSMC’s non-volatile memory solutions include Flash, Spin-transfer torque magnetic random access memory (STT-MRAM), and resistive random access memory (RRAM). TSMC is also actively exploring phase change random access memory (PCRAM), and spin-orbit torque MRAM (SOT-MRAM) elements, as well as selector devices which are essential to support higher density cross-point array architectures.