Invited Talk by Prof. Shi-Yu Huang (National Tsing Hua University and National Institutes on Applied Research, Taiwan)

July 28, 2025

Speed Learning Scheme To Mitigate The Silent Data Corruption in a Multi-Core Design; August 20, 11:30, U38 2.013

GS-IMTR is happy to announce the following presentation by Professor Shi-Yu Huang of National Tsing Hua University and National Institutes on Applied Research, Taiwan.

The presentation will take place on Wednesday, 20 August, at 11:30 in the Faculty Meeting room of the Computer Science department, Universitätsstr. 38, room 2.013 on the second floor.

Title: Speed Learning Scheme To Mitigate The Silent Data Corruption in a Multi-Core Design

Abstract:

Silent Data Corruption (SDC) has become a growing threat in large-scale infrastructures such as data centers, where massive fleets of multi-core systems operate under high utilization. As a fundamental element in computing platforms, multi-core System-on-Chip (SoC) architectures are particularly vulnerable to timing-related errors that silently propagate across workloads, especially in long-duration, mission-critical applications. In this work, we propose a Speed Margining Scheme to proactively mitigate SDC in multi-core systems. At the hardware level, we introduce an enhanced architecture based on Dual Module Redundancy (Enhanced DMR), in which each core supports two execution gears: a standard performance gear (Standard-Gear) and a shadow performance gear (Shadow-Gear). This arrangement facilitates the establishment of a Safe Performance Margin with built-in SDC alerting capability. To demonstrate the effectiveness of the proposed approach, we implemented it on both a ZYNQ™-7010 FPGA platform and an ASIC design using the TSMC 90nm process, running seven representative learning programs to emulate diverse system conditions. Experimental results show that the system can effectively detect early timing-related errors in real time and dynamically adjust frequency without interrupting system execution. This significantly reduces the incidence of silent data errors and enables multi-core systems to achieve high performance while ensuring computational accuracy.

Speaker's Biography:

Shi-Yu Huang received his B.S. and M.S. degrees in Electrical Engineering from National Taiwan University and his Ph.D. degree in Electrical and Computer Engineering from the University of California, Santa Barbara. He joined the faculty of the Electrical Engineering Department, National Tsing Hua University, Taiwan, in 1999. Since August 2024, he has been on leave from the university to serve as Director General of the STPI Center (Science and Technology Policy Research and Information Center) of the NIAR (National Institutes on Applied Research) in Taiwan.

Dr. Huang has published more than 180 refereed technical papers (including 70 journal papers). His research interests broadly cover VLSI design, automation, and testing, with prior experience in formal verification, power estimation, fault diagnosis, and resilient nanometer SRAM design. More recently, his research has concentrated on all-digital timing circuit designs, such as all-digital phase-locked loops (PLLs), all-digital delay-locked loops (DLLs), and time-to-digital converters (TDCs), and their applications to parametric fault testing and reliability enhancement for 3D-ICs.

