A 16Kb Antifuse One-Time-Programmable Memory in 5nm High-K Metal-Gate FinFET CMOS Featuring Bootstrap High Voltage Scheme, Read Endpoint Detection and Pseudo-Differential Sensing
A 16Kb one-time-programmable (OTP) antifuse memory is fabricated in a 5nm high-K, metal-gate FinFET CMOS for the first time. The bootstrap high voltage scheme (BHVS), read endpoint detection (REPD) and pseudo-differential sensing (PDS) are implemented to achieve intrinsic bit error rate (BER) below 1ppb for in-field programming in 5nm SoC and 10 years of data retention at 125°C.
CHIMERA: A 0.92 TOPS, 2.2 TOPS/W Edge AI Accelerator with 2 MByte On-Chip Foundry Resistive RAM for Efficient Training and Inference
CHIMERA is the first non-volatile deep neural network (DNN) chip for edge AI training and inference using foundry on-chip resistive RAM (RRAM) macros and no off-chip memory. CHIMERA achieves 0.92 TOPS peak performance and 2.2 TOPS/W. We scale inference to 6x larger DNNs by connecting 6 CHIMERAs with just 4% execution time and 5% energy costs, enabled by communication-sparse DNN mappings that exploit RRAM non-volatility through quick chip wakeup/shutdown (33 μs). We demonstrate the first incremental edge AI training which overcomes RRAM write energy, speed, and endurance challenges. Our training achieves the same accuracy as traditional algorithms with up to 283x fewer RRAM weight update steps and 340x better energy-delay product. We thus demonstrate 10 years of 20 samples/minute incremental edge AI training on CHIMERA.
Characterization of Fatigue and Its Recovery Behavior in Ferroelectric HfZrO
In this study, polarization fatigue of the HfZrO ferroelectric is investigated with stress-induced leakage current (SILC) measurements under different E-field stresses. Under high field, we observed a strong correlation between polarization wake-up and SILC increase. This is attributed to oxygen vacancy redistribution and percolation path formation, especially at high-frequency cycling. However, polarization fatigue at low field is found to occur without SILC increase. P-E loop measurements revealed that charge trapping is the main contributor under low bias. We demonstrated that the fatigue caused by low-field stress can be effectively recovered by interspersing periodic, short-term cycles at high field to manage charge trapping and oxygen vacancy redistribution, prolonging endurance to >1E12 cycles without SILC degradation at room temperature. We also validated that switching with negligible fatigue can be achieved in HfZrO at -40°C, as low-temperature operation further reduces charge trapping.
Cold MRAM as a Density Booster for Embedded NVM in Advanced Technology
Considering the improved performance of MTJs and access transistors for MRAM at low temperatures, we propose a novel design for embedded Cold MRAM that boosts the cell density to 5.3x that of a conventional 6T-SRAM. Together with CMOS operated at cryogenic conditions, it can provide a potential solution for highly demanding HPC applications.
Low-Voltage (~1.3V), Arsenic Free Threshold Type Selector with Ultra High Endurance (> 10^11) for High Density 1S1R Memory Array
Low voltage selectors are critical for low power operation of high density non-volatile memories. In this work, selectors based on arsenic free chalcogenide materials are demonstrated with record high endurance over 10^11 cycles together with threshold voltage ~1.3V and leakage current ~5nA. The enhanced endurance is attributed to suppression of phase separation, with proper dopants yielding a more stable amorphous network.
MLC PCM Techniques to Improve Neural Network Inference Retention Time by 10^5X and Reduce Accuracy Degradation by 10.8X
We present three novel MLC PCM techniques to address the MLC device challenges in neural network applications: (1) device requirement balancing, (2) prediction-based MSB-biased referencing, and (3) bit-prioritized placement. Using measured MLC bit error rates, the proposed techniques can improve the MLC PCM retention time by 10^5 times while keeping the ResNet-20 inference accuracy degradation within 3%, and reduce the accuracy degradation by 91% (10.8X) on the CIFAR-100 dataset in the presence of temporal resistance drift.
Reliability and Magnetic Immunity of Reflow-Capable Embedded STT-MRAM in 16nm FinFET CMOS Process
We demonstrate the reliability and magnetic immunity of STT-MRAM embedded in a 16nm FinFET CMOS process. The technology supports endurance cycles up to 10^5 over a wide temperature range from -40°C to 125°C with low bit error rate and passes 10^6 cycles at the worst-case temperature of -40°C. Data retention sustains three solder-reflow cycles and up to 10 years with less than 1ppm error rate at 234°C. Read disturb error rate is less than 10^-20 per read. At 125°C, magnetic immunity reaches 550Oe in standby mode for a 1ppm error rate over 10 years and 800Oe in active mode for a 0.1ppm error rate per write.
Materials Requirements of High-Speed and Low-Power Spin-Orbit-Torque Magnetic Random-Access Memory
As spin-orbit-torque magnetic random-access memory (SOT-MRAM) is gathering great interest for next-generation low-power and high-speed on-chip cache memory applications, it is critical to analyze the magnetic tunnel junction (MTJ) properties needed to achieve sub-ns, fJ write operation when integrated with CMOS access transistors. In this paper, a 2T-1MTJ cell-level modeling framework for in-plane type Y SOT-MRAM suggests that high spin Hall conductivity and moderate SOT material sheet resistance are preferred. We benchmark the write energy and speed of type Y SOT cells based on various SOT materials experimentally reported in the literature, including heavy metals, topological insulators and semimetals. We then carry out detailed benchmarking of the SOT materials Pt, β-W, and BixSe(1-x) with different thicknesses and resistivities. We further discuss how our 2T-1MTJ model can be expanded to analyze other variations of SOT-MRAM, including perpendicular (type Z) and type X SOT-MRAM, two-terminal SOT-MRAM, as well as spin-transfer-torque (STT) and voltage-controlled magnetic anisotropy (VCMA)-assisted SOT-MRAM. This work will provide essential guidelines for SOT-MRAM materials, devices, and circuits research in the future.
Interfacial engineering of SOT-MRAM to modulate atomic diffusion and enable PMA stability >400°C
We report our work on the optimization of W/CoFeB/MgO structures to fulfill perpendicular magnetic anisotropy (PMA) requirements in the production of SOT-MRAM. By optimizing the natural oxidation process of the deposited Mg layer and introducing different dust layers at the W/CoFeB and CoFeB/MgO interfaces, the PMA of W/CoFeB/MgO structures can be enhanced by about 100%, which is much higher than that in Ta-based structures. The origin of this PMA enhancement was further confirmed by transmission electron microscopy investigations. The corresponding SOT switching efficiency and current-induced effective fields were also investigated.
A 40nm 2Mb ReRAM Macro with 85% Reduction in FORMING Time and 99% Reduction in Page-Write Time Using Auto-FORMING and Auto-Write Schemes
This work proposes (1) an auto-forming (AF) scheme to shorten the macro forming time (TFM-M) and reduce testing costs; (2) an auto-RESET (ARST) scheme to shorten page-RESET time (TW-PAGE-RST) for expanding the applications of hidden-RESET operation in standby mode; and (3) an auto-SET (ASET) scheme to shorten page-write time (TW-PAGE) when combined with the hidden-RESET scheme. A fabricated 40nm 2Mb ReRAM macro achieved an 85+% reduction in TFM-M and a 99+% reduction in TW-PAGE for a page. For the first time, AF, ARST, and ASET schemes are demonstrated in silicon for ReRAM.
Memory
Data is the most valuable resource in today’s digital economy. Currently over 2.5 quintillion (2.5 × 10^18) bytes of data are generated daily and the pace is accelerating. More data than ever needs to be processed. Memory plays a key role in the flow of data. The gap between logic and memory is a bottleneck to system performance. To optimize the trade-off between cost and performance, a hierarchical memory system has been adopted. At the top of the hierarchy are static random access memory (SRAM) and dynamic random access memory (DRAM), both inherently volatile. SRAM is integrated directly on the logic chips as cache memory to provide the fastest access. A DRAM cell is physically smaller than an SRAM cell and consequently supports higher capacity. DRAM is generally an off-chip memory solution and ~10x slower than SRAM due to the need for constant refresh. Non-volatile memories (NVM) such as Flash are next in the hierarchy, providing much higher memory capacity and density while also preserving information in the absence of power.
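To illustrate why a hierarchy helps, the following minimal Python sketch applies the standard average-memory-access-time formula to a two-level system with SRAM in front of DRAM. Only the ~10x latency ratio comes from the text above. The 1 ns SRAM latency, the hit rates, and the function name `average_access_time` are hypothetical values chosen for illustration, not TSMC figures.

```python
def average_access_time(hit_rate: float, cache_ns: float, backing_ns: float) -> float:
    """Average access time of a two-level hierarchy (cache in front of a backing memory).

    Every access pays the cache latency; a miss additionally pays the backing latency.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_ns + (1.0 - hit_rate) * backing_ns


# Hypothetical latencies: only the ~10x SRAM-to-DRAM ratio is taken from the text.
SRAM_NS = 1.0
DRAM_NS = 10.0 * SRAM_NS

if __name__ == "__main__":
    for hit_rate in (0.0, 0.5, 0.9, 0.99):
        t = average_access_time(hit_rate, SRAM_NS, DRAM_NS)
        print(f"hit rate {hit_rate:.2f}: average access time {t:.2f} ns")
```

Under these assumptions, the average access time approaches the SRAM latency as the hit rate rises, which is the effect the hierarchy is meant to exploit. The model ignores multiple cache levels, bandwidth limits, and refresh behavior.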
New technologies are emerging rapidly to bring processing tasks near to or inside the memory, improving computing efficiency and enabling new functionalities. Emerging NVMs use new types of materials and mechanisms to store data. They are promising for blending levels of the memory hierarchy to boost overall performance. Furthermore, their unique characteristics offer great potential to enable new applications (e.g. neuromorphic computing) and novel architectures (e.g. 3D integration).
TSMC’s non-volatile memory solutions include Flash, spin-transfer torque magnetic random access memory (STT-MRAM), and resistive random access memory (RRAM). TSMC is also actively exploring phase change random access memory (PCRAM) and spin-orbit torque MRAM (SOT-MRAM) elements, as well as selector devices, which are essential to support higher density cross-point array architectures.