
Artificial Intelligence

Over the last decade, we have witnessed a steep rise of Artificial Intelligence (AI) as an alternative computing paradigm. Although the idea has been around since the 1950s, AI needed progress in algorithms, capable hardware, and sufficiently large training data to become a practical and powerful tool. Progress in computing hardware has been a key ingredient of the AI renaissance and will only grow more critical to realizing future AI applications.

We are particularly well-positioned to supply the most advanced AI hardware to our customers thanks to our leading-edge logic, memory, and packaging technologies. We have established a technology research pipeline to enable leading-edge AI devices, circuits, and systems for decades to come. Near- and in-memory computing, embedded non-volatile memory technologies, 3D integration, and error-resilient computing are amongst our specific AI hardware research areas. Our in-house research is complemented by strong academic and governmental partnerships, which allow us to interact with and influence leading AI researchers around the world.

11-20 of 24
  • A 2.38 MCells/mm2 9.81-350 TOPS/W RRAM Compute-in-Memory Macro in 40nm CMOS with Hybrid Offset/IOFF Cancellation and ICELL RBLSL Drop Mitigation

    2023
    A dense compute-in-memory (CIM) macro using resistive random-access memory (RRAM) is presented, with solutions to read-channel mismatch, high IOFF, ADC offset, IR drop, and cell resistance variation. By combining a hybrid analog/mixed-signal offset-cancellation scheme and ICELL RBLSL drop mitigation with a low cell-bias target voltage, the proposed macro demonstrates robust operation (post-ECC bit error rate (BER) < 5×10−8 for 8WL CIM) while maintaining an effective cell density 1.03-33.1× higher than prior art and achieving 1.74-13.35× improved average MAC efficiency relative to the previous highest-density RRAM CIM macro.
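The compute-in-memory principle behind macros like this one - performing a multiply-accumulate as summed cell currents on a shared bitline - can be sketched numerically. A minimal, idealized model (all names and conductance values are illustrative, not from the paper; the non-idealities the macro actually solves, such as IOFF, ADC offset, and IR drop, are deliberately ignored here):

```python
def cim_dot_product(inputs, weights, g_on=50e-6, g_off=1e-6):
    """Binary inputs drive wordlines; binary weights set each cell's
    conductance (g_on for '1', g_off for '0'). The total bitline
    current approximates the digital dot product."""
    i_bl = sum(x * (g_on if w else g_off) for x, w in zip(inputs, weights))
    # An ADC would digitize i_bl; here we simply scale by the LSB current.
    return round(i_bl / g_on)

x = [1, 0, 1, 1, 0, 1, 1, 0]
w = [1, 1, 1, 0, 0, 1, 0, 1]
print(cim_dot_product(x, w))             # analog-style accumulation -> 3
print(sum(a * b for a, b in zip(x, w)))  # exact digital reference   -> 3
```

The small residual current through "off" cells (g_off) is exactly the kind of error source that the paper's offset/IOFF cancellation targets.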
  • A 28nm Nonvolatile AI Edge Processor using 4Mb Analog-Based Near-Memory-Compute ReRAM with 27.2 TOPS/W for Tiny AI Edge Devices

    2023
    Tiny AI edge processors prefer nvCIM to achieve low standby power, high energy efficiency (EF), and short wakeup-to-response latency (TWR). Most nvCIMs use in-memory computing for MAC operations; however, this imposes a tradeoff between EF and accuracy due to the MAC accumulation number (NACU) versus signal margin and readout quantization. To achieve high EF and high accuracy, we developed a system-level nvCIM-friendly control scheme and an nvCIM macro with two analog near-memory computing schemes. The proposed 28nm nonvolatile AI edge processor with a 4Mb ReRAM nvCIM achieved high EF (27.2 TOPS/W), short TWR (3.19 ms), and low accuracy loss (<0.5%). The EF of the ReRAM-nvCIM macro was 38.6 TOPS/W.
  • A 22nm 4Mb STT-MRAM Data-Encrypted Near-Memory Computation Macro with a 192GB/s Read-and-Decryption Bandwidth and 25.1-55.1TOPS/W 8b MAC for AI Operations

    2022
    Nonvolatile computing-in-memory (nvCIM) [1]–[4] is ideal for battery-powered tiny artificial intelligence (AI) edge devices that require nonvolatile data storage and low system-level power consumption. Data encryption/decryption (data-ED) is also required to prevent access to the neural network (NN) model weights and the personalized data used to improve inference accuracy. This paper presents an AI nvCIM data-ED-capable macro with high energy efficiency (EFMAC), low macro-level read latency (tAC-M), high read bandwidth (R-BW), and high-precision inputs (IN), weights (W), and outputs (OUT) for multiply-and-accumulate (MAC) operations. Prior nvCIM macros designed for MAC operations [1]–[3] do not support data-ED or a high number of accumulations (ACU). A single NN layer also requires multiple cycles for full-channel MAC (MACFC-L) operations. A low-computing-latency (tAC-FC-L), high-precision nvCIM macro with data-ED faces the following challenges: (1) long tAC-FC-L and low EFMAC for MACFC-L operations, which require multiple memory accesses with a limited R-BW; (2) long tAC-M due to BL pre-charge (tPRE), signal development (tSD), sensing (tSA), and data decryption (tOE); (3) high power consumption for BL pre-charge, particularly when using a high BL read voltage (VRD) to increase sensing yield.
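The read-and-decrypt pipeline such a macro implies - weights stored encrypted in nonvolatile memory, decrypted as they stream out, then fed to MAC units - can be sketched abstractly. The XOR keystream cipher below is purely illustrative (the paper does not specify this scheme), and all names are hypothetical:

```python
def xor_stream(data, key):
    """Toy stream cipher: XOR each byte with a repeating key.
    Illustrative only - a real design would use a standard cipher."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def read_and_mac(encrypted_weights, key, inputs):
    """Model of a read-and-decrypt path feeding MAC units:
    weights are decrypted on read, never stored in the clear."""
    weights = xor_stream(encrypted_weights, key)
    return sum(x * w for x, w in zip(inputs, weights))

key = b"\x5a\xa3"
plain = bytes([3, 1, 4, 1, 5])
stored = xor_stream(plain, key)          # what the macro would hold at rest
assert stored != plain                   # weights at rest are encrypted
print(read_and_mac(stored, key, [1, 2, 3, 4, 5]))  # 3+2+12+4+25 = 46
```

The point of fusing decryption into the read path, as the abstract's tOE term suggests, is that decryption latency overlaps the memory access instead of adding a separate pass over the weights.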
  • A 40-nm, 2M-Cell, 8b-Precision, Hybrid SLC-MLC PCM Computing-in-Memory Macro with 20.5-65.0 TOPS/W for Tiny-AI Edge Devices

    2022
    Efficient edge computing with sufficiently large on-chip memory capacity is essential in the internet-of-everything era. Nonvolatile computing-in-memory (nvCIM) reduces data-transfer overhead by bringing computation in proximity to the memory [1]–[4], while the multi-level cell (MLC) offers higher storage density than the single-level cell (SLC). A few MLC or analog nvCIM designs have been proposed, but they either target simpler neural-net models [5] or are implemented using a less area-efficient differential cell [6]. Furthermore, representing the entire weight vector using one storage type does not exploit the drastic accuracy difference between the upper and the lower bits.
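The hybrid idea - keeping the accuracy-critical upper bits of each weight in robust SLC while packing the lower bits into dense MLC - can be sketched as a bit-slicing scheme. This is a hypothetical illustration of the concept, not the paper's implementation:

```python
def split_weight(w8):
    """Split an 8b weight: upper 4 bits go to SLC (1 bit/cell,
    robust), lower 4 bits go to 2b/cell MLC (dense, error-prone)."""
    upper = (w8 >> 4) & 0xF   # 4 SLC cells
    lower = w8 & 0xF          # 2 MLC cells, 2 bits each
    return upper, lower

def reassemble(upper, lower):
    """Recombine the two slices into the original 8b weight."""
    return (upper << 4) | lower

u, l = split_weight(0b10110101)
assert reassemble(u, l) == 0b10110101
# An error confined to the MLC slice perturbs the weight by at most
# 15/255 of full scale, while SLC protects the bits that dominate
# inference accuracy.
print(u, l)  # 11 5
```

This is exactly the asymmetry the abstract's last sentence points at: a single storage type wastes either density (all-SLC) or accuracy margin (all-MLC).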
  • A 40nm 60.64TOPS/W ECC-Capable Compute-in-Memory/Digital 2.25MB/768KB RRAM/SRAM System with Embedded Cortex M3 Microprocessor for Edge Recommendation Systems

    2022
    Resistive RAM (RRAM) is an exciting technology that exhibits properties long absent from traditional charge-based memories: high bit density, nonvolatile storage, accurate compute-in-memory (CIM), and both process and voltage compatibility. Each of these properties makes RRAM a compelling candidate for AI applications, particularly at the edge. To demonstrate their utility, we target real-world event-driven and memory-constrained applications such as recommendation systems and natural language processing (NLP). To enable these applications at the edge, higher memory capacity and bandwidth must be achieved despite irregular data-access patterns that prevent effective caching and data reuse. Furthermore, we find that these applications are rarely (if ever) run continuously; instead, execution is triggered by events. The combination of these two challenges makes RRAM an ideal candidate, since its high density and nonvolatility enable near-zero leakage power and complete power-down. To address these challenges, this paper presents a 2.25MB RRAM-based CIM accelerator with 768KB of SRAM and an embedded Cortex M3 processor for edge devices.
  • A 40nm 64kb 26.56TOPS/W 2.37Mb/mm2 RRAM Binary/Compute-in-Memory Macro with 4.23x Improvement in Density and >75% Use of Sensing Dynamic Range

    2022
    Compute-in-Memory (CIM) using emerging nonvolatile (eNVM) memory technologies, such as resistive random-access memory (RRAM), has been shown by several implemented macros to be an energy-efficient alternative to traditional von Neumann architectures [1]–[6]. Since moving data on- and off-chip has a high energy cost, area efficiency is important to the practical utility of CIM with RRAM. Many systems demonstrated so far have not reported area efficiency or addressed the challenges CIM with RRAM presents with respect to practical area-constrained integrated circuits.
  • An 8-Mb DC-Current-Free Binary-to-8b Precision ReRAM Nonvolatile Computing-in-Memory Macro using Time-Space-Readout with 1286.4-21.6 TOPS/W for Edge-AI Devices

    2022
    Battery-powered edge-AI devices require nonvolatile computing-in-memory (nvCIM) macros for nonvolatile data storage and multiply-and-accumulate (MAC) operations. High inference accuracy requires MAC operations with high input (IN), weight (W), and output (OUT) precisions; high energy efficiency (EFMAC) and short computing latency (tAC) are also required. Most existing silicon-verified nvCIM macros use current-mode signal generation, with current [1]–[3] or hybrid current-voltage readout schemes [4], [5] for multibit MAC operations, to compensate for the small BL-voltage swing and signal margin resulting from the low read-disturb-free voltage (VRD).
  • SoIS - An Ultra-Large-Size Integrated Substrate Technology Platform for HPC Applications

    2021
    Along with the evolution of HPC electrical performance, larger and higher-layer-count ABF substrates play a key role in its success; however, the ABF substrate has recently become a major bottleneck in yield and transmission-loss control, causing computing-component shortages. An innovative SoIS (System on Integrated Substrate) technology is proposed to satisfy higher-performance applications cost-effectively. SoIS technology leverages wafer processing and new materials. This integrated substrate demonstrated significantly higher yield than conventional substrate solutions on test vehicles (TVs) with a 91x91mm2 substrate size. The electrical TV showed insertion loss 25% lower than that of the latest GL102 organic substrate at 28GHz for 112Gbps SerDes applications. The mechanical/electrical TV passed package-level reliability tests including MSL4+ (TCG2000, uHAST360) and HTS1500, and microstructure sanity checks after these reliability tests also met quality and reliability criteria. Furthermore, by leveraging wafer-fab processing, SoIS provides powerful yet flexible combinations of interconnect and dielectric layers with more aggressive design rules than conventional organic substrates allow. In particular, for high-bandwidth routing-density applications, SoIS can improve routability by 2-5x over conventional organic substrates, saving layer count while maintaining impedance-matching performance without extra cost, as demonstrated by both simulation and silicon data.
  • CHIMERA: A 0.92 TOPS, 2.2 TOPS/W Edge AI Accelerator with 2 MByte On-Chip Foundry Resistive RAM for Efficient Training and Inference

    2021
    CHIMERA is the first non-volatile deep neural network (DNN) chip for edge AI training and inference using foundry on-chip resistive RAM (RRAM) macros and no off-chip memory. CHIMERA achieves 0.92 TOPS peak performance and 2.2 TOPS/W. We scale inference to 6x larger DNNs by connecting 6 CHIMERAs with just 4% execution time and 5% energy costs, enabled by communication-sparse DNN mappings that exploit RRAM non-volatility through quick chip wakeup/shutdown (33 μs). We demonstrate the first incremental edge AI training which overcomes RRAM write energy, speed, and endurance challenges. Our training achieves the same accuracy as traditional algorithms with up to 283x fewer RRAM weight update steps and 340x better energy-delay product. We thus demonstrate 10 years of 20 samples/minute incremental edge AI training on CHIMERA.
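CHIMERA's large reduction in RRAM weight-update steps suggests update schemes that commit a weight to nonvolatile memory only when its pending change is large enough to matter, since RRAM writes are slow, energy-hungry, and endurance-limited. A generic sketch of such a thresholded update (hypothetical illustration; not CHIMERA's actual training algorithm):

```python
def thresholded_update(weights, grads, lr=0.01, threshold=0.05, accum=None):
    """Accumulate gradient steps in a volatile buffer (e.g. SRAM)
    and commit a weight to nonvolatile RRAM only when its pending
    change exceeds a threshold, cutting costly RRAM writes."""
    if accum is None:
        accum = [0.0] * len(weights)
    writes = 0
    for i, g in enumerate(grads):
        accum[i] += lr * g
        if abs(accum[i]) >= threshold:
            weights[i] -= accum[i]   # one nonvolatile write
            accum[i] = 0.0
            writes += 1
    return weights, accum, writes

w = [0.5, -0.2, 0.1]
w, acc, n = thresholded_update(w, [10.0, 0.1, -0.2])
print(n)  # only the first weight crosses the threshold -> 1 write
```

Small gradient contributions are not lost: they keep accumulating in the volatile buffer until they cross the threshold, which is what lets such schemes preserve accuracy while drastically reducing write counts.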
  • MLC PCM Techniques to Improve Neural Network Inference Retention Time by 105X and Reduce Accuracy Degradation by 10.8X

    2021
    We present three novel MLC PCM techniques - (1) device requirement balancing, (2) prediction-based MSB-biased referencing, and (3) bit-prioritized placement - to address MLC device challenges in neural network applications. Using measured MLC bit error rates, the proposed techniques improve MLC PCM retention time by 105 times while keeping ResNet-20 inference accuracy degradation within 3%, and reduce the accuracy degradation by 91% (10.8X) for the CIFAR-100 dataset in the presence of temporal resistance drift.
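The drift problem these techniques target can be sketched with the well-known PCM drift law R(t) = R0 · (t/t0)^ν: cell resistance creeps upward over time, so fixed read references eventually misclassify MLC levels. A toy model with illustrative (not measured) values, showing why prediction-based referencing is needed:

```python
# PCM resistance drifts over time as R(t) = R0 * (t / t0) ** nu.
# All resistance levels, references, and the drift exponent below
# are illustrative, not measured data from the paper.

def resistance(r0, t, t0=1.0, nu=0.05):
    """Drifted resistance at time t for a cell programmed to r0."""
    return r0 * (t / t0) ** nu

levels = [1e3, 1e4, 1e5, 1e6]   # four nominal 2b/cell resistance levels
refs = [3e3, 3e4, 3e5]          # fixed mid-point read references

def read_level(r):
    """Classify a resistance against the fixed references."""
    return sum(r > ref for ref in refs)

# Immediately after programming, every level reads back correctly.
assert [read_level(resistance(r, 1.0)) for r in levels] == [0, 1, 2, 3]

# After long drift, a level can cross a fixed reference and misread:
drifted = resistance(1e3, 1e10)   # lowest level after ~1e10 time units
print(read_level(drifted))        # reads as level 1, not level 0
```

A reference that is predicted forward along the same drift law (rather than held fixed) would keep tracking the shifted level distributions, which is the intuition behind the paper's prediction-based MSB-biased referencing.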