A 351TOPS/W and 372.4GOPS Computing-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications
Systolic building block for low-latency, energy-efficient logic-on-logic 3D-IC implementation of convolutional neural network
We present a building block architecture for systolic array 3D-IC implementations of convolutional neural network (CNN) inference. The building block can be part of a library offered by a chip design service provider to support efficient CNN implementations. We describe how the building block can form systolic arrays for implementing low-latency, energy-efficient CNN inference for models of any size, while incorporating advanced packaging features such as "logic-on-logic" 3D-IC (micro-bump/TSV, monolithic 3D or other 3D technology). We present delay and power analysis for 2D and 3D implementations, and argue that as systolic arrays scale in size, 3D implementations based on, e.g., micro-bump/TSV lead to significant performance improvements over 2D implementations.
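As a rough illustration of the dataflow such a building block supports, the sketch below simulates an output-stationary systolic array multiplying two matrices, the core operation of an im2col-lowered convolution layer. It models only the cycle-by-cycle operand movement, not the paper's 3D partitioning, timing, or power; the function name and the output-stationary choice are illustrative assumptions rather than details from the abstract.

import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.
    PE (i, j) accumulates C[i, j]; row i of A enters from the left with a
    skew of i cycles, column j of B enters from the top with a skew of j
    cycles, and every PE forwards its operands to its right/down neighbour."""
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    a_reg = [[None] * N for _ in range(M)]   # operand held by each PE this cycle
    b_reg = [[None] * N for _ in range(M)]
    for t in range(M + N + K - 2):           # enough cycles for the array to drain
        # Shift operands one PE to the right / down (reverse order keeps one hop per cycle).
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Inject skewed inputs at the array edges.
        for i in range(M):
            a_reg[i][0] = A[i, t - i] if 0 <= t - i < K else None
        for j in range(N):
            b_reg[0][j] = B[t - j, j] if 0 <= t - j < K else None
        # Each PE performs one multiply-accumulate on the operands it currently holds.
        for i in range(M):
            for j in range(N):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    C[i, j] += a_reg[i][j] * b_reg[i][j]
    return C

A = np.random.randn(4, 6)
B = np.random.randn(6, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)

Because every PE is identical and only talks to its nearest neighbours, the same building block tiles into arrays of arbitrary size, which is what makes the 2D-vs-3D placement of the array the dominant performance question as it scales.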
Memory-Logic Hybrid Gate with 3D-Stackable Complementary Latches for FinFET-based Neural Networks
A memory-logic hybrid gate with complementary resistive switching pairs on vias in BEOL FinFET technologies, with an area-efficient, 3D-stackable structure, is proposed. Stable output logic states enabled by the complementary states on the RRAM pair have been demonstrated. Through stacked-via architectures, logic operations based on multiple non-volatile states are achieved.
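The following toy model is purely illustrative, with assumed resistance and supply values that are not taken from the paper. It shows why complementary programming yields a stable readout: treating the pair as a resistive divider between supply and ground, the two valid complementary states pin the shared output node close to one of the rails.

# Illustrative model (assumed values): a complementary RRAM pair as a
# resistive divider between VDD and ground. Programming the devices into
# opposite states (one low-resistance, one high-resistance) pins the
# shared output node near VDD or near GND, so the stored state reads out
# as a stable logic level.
R_LRS = 10e3      # assumed low-resistance state, ohms
R_HRS = 1e6       # assumed high-resistance state, ohms
VDD = 0.8         # assumed supply voltage, volts

def output_voltage(top_state, bottom_state):
    """Voltage at the node between the top device (to VDD) and the
    bottom device (to GND), given each device's state ('LRS' or 'HRS')."""
    r_top = R_LRS if top_state == "LRS" else R_HRS
    r_bot = R_LRS if bottom_state == "LRS" else R_HRS
    return VDD * r_bot / (r_top + r_bot)

# Complementary programming: the two valid states give near-rail outputs.
print(output_voltage("LRS", "HRS"))   # ~0.79 V -> logic '1'
print(output_voltage("HRS", "LRS"))   # ~0.008 V -> logic '0'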
A 65nm 4Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3ns and 55.8TOPS/W fully parallel product-sum operation for binary DNN edge processors
For deep-neural-network (DNN) processors [1-4], the product-sum (PS) operation predominates the computational workload of both convolution (CNVL) and fully-connected (FCNL) neural-network (NN) layers. This hinders the adoption of DNN processors in edge artificial-intelligence (AI) devices, which require low-power, low-cost and fast inference. Binary DNNs [5-6] are used to reduce computation and hardware costs for AI edge devices; however, a memory bottleneck still remains. In Fig. 31.5.1, conventional PE arrays exploit parallelized computation, but suffer from inefficient single-row SRAM access to weights and intermediate data. Computing-in-memory (CIM) improves efficiency by enabling parallel computing, reducing memory accesses, and suppressing intermediate data. Nonetheless, three critical challenges remain (Fig. 31.5.2), particularly for FCNL. We overcome these problems by co-optimizing the circuits and the system. Recent research has focused on XNOR-based binary-DNN structures [6]. Although they achieve slightly higher accuracy than other binary structures, they incur significant hardware cost (i.e., 8T-12T SRAM) to implement a CIM system. To further reduce hardware cost by implementing the CIM system with 6T SRAM, we employ the binary DNN with 0/1-neurons and ±1-weights proposed in [7]. We implemented a 65nm 4Kb algorithm-dependent CIM-SRAM unit-macro and an in-house binary DNN structure (focusing on FCNL with a simplified PE array) for cost-aware DNN AI edge processors. This resulted in the first binary-based CIM-SRAM macro, with the fastest (2.3ns) PS operation and the highest energy efficiency (55.8TOPS/W) among reported CIM macros [3-4].
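For context, the 0/1-neuron, ±1-weight scheme of [7] makes each product term either zero or the weight itself, so the product-sum collapses to a signed count over the active inputs. The sketch below is only a software illustration of that arithmetic, since the macro evaluates it in parallel on the SRAM bitlines; the function name and the example values are assumptions, not taken from the paper.

import numpy as np

def binary_product_sum(neurons, weights):
    """Product-sum of 0/1 neuron activations with +/-1 weights.
    A neuron value of 0 contributes nothing, so every product term is
    just the weight wherever the neuron is 1, and the whole PS reduces
    to a signed count over the active inputs."""
    neurons = np.asarray(neurons)          # entries in {0, 1}
    weights = np.asarray(weights)          # entries in {-1, +1}
    return int(np.sum(weights[neurons == 1]))

# Example: 8 inputs feeding one output neuron of a fully-connected layer.
x = [1, 0, 1, 1, 0, 0, 1, 0]
w = [+1, -1, -1, +1, +1, -1, +1, -1]
print(binary_product_sum(x, w))            # (+1) + (-1) + (+1) + (+1) = 2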
Artificial Intelligence
Over the last decade, we have witnessed a steep rise of Artificial Intelligence (AI) as an alternative computing paradigm. Although the idea has been around since the 1950s, AI needed progress in algorithms, capable hardware, and sufficiently large training data to become a practical and powerful tool. Progress in computing hardware has been a key ingredient of the AI renaissance and will become increasingly critical to realizing future AI applications.
We are particularly well-positioned to supply the most advanced AI hardware to our customers thanks to our leading-edge logic, memory, and packaging technologies. We have established a research pipeline for technology to enable leading-edge AI devices, circuits, and systems for decades to come. Near- and in-memory computing, embedded non-volatile memory technologies, 3D integration, and error-resilient computing are amongst our specific AI hardware research areas. Our in-house research is complemented by strong academic and governmental partnerships, which allow us to interact with and influence leading AI researchers around the world.