77. Approximation-Aware and Quantization-Aware Training for Graph Neural Networks. Rodion Novkin; Florian Klemme and Hussam Amrouch. IEEE Transactions on Computers (November 2023).
76. Robust Pattern Generation for Small Delay Faults under Process Variations. Hanieh Jafarzadeh; Florian Klemme; Jan Dennis Reimer; Zahra Paria Najafi-Haghi; Hussam Amrouch; Sybille Hellebrand and Hans-Joachim Wunderlich. In Proceedings of the IEEE International Test Conference (ITC’23), Disneyland, Anaheim, USA, 2023.
Abstract
Small Delay Faults (SDFs) introduce additional delays
smaller than the capture time and require timing-aware test pattern
generation. Since process variations can invalidate the effectiveness of
such patterns, different circuit instances may show a different fault
coverage. This paper presents a method to generate test pattern sets for
SDFs which are valid for all circuit timings. The method overcomes the
complexity limitations of known timing-aware Automatic Test Pattern
Generation (ATPG) which has to use fault sampling under process
variations. A statistical learning scheme maximises the coverage of SDFs
in an entire circuit population following the variation parameters of a
calibrated industrial FinFET transistor model. The method combines
efficient ATPG for Transition Faults with fast timing-aware fault
simulation on GPUs. Experiments show that the size of the pattern set is
significantly reduced in comparison to standard N-detect while the fault
coverage even increases compared to both N-detect and timing-aware ATPG.
75. Exploiting the Error Resilience of the Preconditioned Conjugate Gradient Method for Energy and Delay Optimization. Natalia Lylina; Stefan Holst; Hanieh Jafarzadeh; Alexandra Kourfali and Hans-Joachim Wunderlich. In IEEE 29th International On-Line Testing Symposium (IOLTS’23), 2023, pp. 1–6.
Abstract
The Preconditioned Conjugate Gradient (PCG)
method is well established for solving systems of linear equations.
Running the PCG method on a hardware accelerator ensures
fast and efficient computation. At the same time, each hardware
accelerator may be slightly different due to process variability or
aging. To handle the variability, a rather pessimistic frequency
selection for the whole population of accelerators is often utilized.
Increasing the frequency may improve the performance but
may also increase the risk of computational errors, affect the
convergence of PCG or even corrupt the PCG results.
In this paper, we present a method to determine the frequency
for each hardware accelerator instance which optimizes the
execution time and the energy efficiency of the PCG method.
First, a technique is presented to analyze the error resilience of
a PCG algorithm to overclocking. Based on the analysis results,
we increase the frequency to speed up the convergence while
keeping the error rate below the required threshold.
74. Guardband Optimization for the Preconditioned Conjugate Gradient Algorithm. Natalia Lylina; Stefan Holst; Hanieh Jafarzadeh; Alexandra Kourfali and Hans-Joachim Wunderlich. In International Conference on Dependable Systems and Networks (DSN’23), 2023.
Abstract
Many applications from Artificial Intelligence (AI)
and Scientific Computing rely on efficient algorithms for solving
large systems of linear equations. The Preconditioned Conjugate
Gradient (PCG) algorithm is a promising option and it is
a perfect candidate to be executed on specialized hardware
accelerators widely used in AI. Hardware accelerators, like other
modern devices, are prone to process variations. A conventional
approach to handle the variability is to use pessimistic guardbands for
all the devices within the population, which implies
that the best and even the average accelerators are slowed down
significantly. Since the PCG algorithm is inherently error resilient
to some extent, it may also tolerate an error rate increase due
to overclocking. On the other hand, increasing the frequency may
increase the total execution time if more arithmetic operations are
needed until the convergence. This paper presents a method to
ensure efficient computing on each hardware accelerator instance
running the PCG algorithm. A cross-layer approach identifies an
optimized frequency that minimizes the total time to complete
the PCG algorithm. Simple high-level checks ensure the quality
of the solution. Experimental results validate the feasibility of
the developed approach for large systems of linear equations.
73. Automating Greybox System-Level Test Generation. Denis Schwachhofer; Maik Betka; Steffen Becker; Stefan Wagner; Matthias Sauer and Ilia Polian. In
2023 IEEE European Test Symposium (ETS), 2023. DOI:
https://doi.org/10.1109/ets56758.2023.10173985
72. Synthesis of IJTAG Networks for Multi-Power Domain Systems on Chips. Payam Habiby; Natalia Lylina; Chih-Hao Wang; Hans-Joachim Wunderlich; Sebastian Huhn and Rolf Drechsler. In Proceedings of the 28th IEEE European Test Symposium 2023 (ETS’23), Venice, Italy, 2023, pp. 6.
Abstract
The high-volume manufacturing test ensures the
production of defect-free devices, which is of utmost importance
when dealing with safety-critical systems. Such a high-quality
test requires a deliberately designed scan network to provide a
time- and cost-effective access to many on-chip components, as
included in state-of-the-art chip designs. The IEEE 1687 Std.
(IJTAG) has been introduced to tackle this challenge by adding
programmable components that enable the design of reconfigurable
scan networks. Although these networks reduce the test
time by shortening the scan chains’ lengths, the reconfiguration
process itself incurs an additional time overhead. This paper
proposes a heuristic method for designing customized multi-power
domain reconfigurable scan networks with a minimized overall
reconfiguration time. More precisely, the proposed method exploits
a-priori given non-functional properties of the system, such as the
power characteristics and the instruments’ access requirements.
For the first time, these non-functional properties are considered
to synthesize a well-adjusted and highly efficient multi-power
domain network. The experimental results show a considerable
improvement over the reported benchmark networks.
71. A Survey of Recent Developments in Testability, Safety and Security of RISC-V Processors. Jens Anders; Pablo Andreu; Bernd Becker; Steffen Becker; Riccardo Cantoro; Nikolaos I. Deligiannis; Nourhan Elhamawy; Tobias Faller; Carles Hernandez; Nele Mentens; Mahnaz Namazi Rizi; Ilia Polian; Abolfazl Sajadi; Matthias Sauer; Denis Schwachhofer; Matteo Sonza Reorda; Todor Stefanov; Ilya Tuzov; Stefan Wagner and Nusa Zidaric. In
2023 IEEE European Test Symposium (ETS), 2023. DOI:
https://doi.org/10.1109/ets56758.2023.10174099
70. HDGIM: Hyperdimensional Genome Sequence Matching on Unreliable Highly-Scaled FeFET. Hamza Errahmouni Barkam; Sanggeon Yun; Paul R. Genssler; Zhuowen Zou; Che-Kai Liu; Hussam Amrouch and Mohsen Imani. In Proceedings of the Conference on Design, Automation & Test in Europe (DATE’23), Antwerp, Belgium, 2023.
69. Learning-Oriented Reliability Improvement of Computing Systems From Transistor to Application Level. Behnaz Ranjbar; Florian Klemme; Paul R. Genssler; Hussam Amrouch; Jinhyo Jung; Shail Dave; Hwisoo So; Kyongwoo Lee; Aviral Shrivastava; Ji-Yung Lin; Pieter Weckx; Subrat Mishra; Francky Catthoor; Dwaipayan Biswas and Akash Kumar. In Proceedings of the Conference on Design, Automation & Test in Europe (DATE’23), Antwerp, Belgium, 2023.
68. Upheaving Self-Heating Effects from Transistor to Circuit Level using Conventional EDA Tool Flows. Florian Klemme; Sami Salamin and Hussam Amrouch. In Proceedings of the Conference on Design, Automation & Test in Europe (DATE’23), Antwerp, Belgium, 2023.
67. Robust Resistive Open Defect Identification Using Machine Learning with Efficient Feature Selection. Zahra Paria Najafi-Haghi; Florian Klemme; Hanieh Jafarzadeh; Hussam Amrouch and Hans-Joachim Wunderlich. In Proceedings of the IEEE Conference on Design, Automation & Test in Europe (DATE’23), Antwerp, Belgium, 2023.
66. A Complete Design-for-Test Scheme for Reconfigurable Scan Networks. Natalia Lylina; Chih-Hao Wang and Hans-Joachim Wunderlich.
Journal of Electronic Testing: Theory and Applications (JETTA) (January 2023), pp. 1–19. DOI:
https://doi.org/10.1007/s10836-022-06038-3
Abstract
Reconfigurable Scan Networks (RSNs) are widely used for accessing instruments offline during debug, test and validation, as well as for performing system-level test and online system health monitoring. The correct operation of RSNs is essential, and RSNs have to be thoroughly tested. However, due to their inherently sequential structure and complex control dependencies, large parts of RSNs have limited observability and controllability. As a result, certain faults at the interfaces to the instruments, control primitives and scan segments remain undetected by existing test methods. In the paper at hand, Design-for-Test (DfT) schemes are developed to overcome the testability problems, e.g., by resynthesizing the initial design. A DfT scheme for RSNs is presented, which allows detecting all single stuck-at faults in RSNs by using existing test generation techniques. The developed scheme analyzes and ensures the testability of all parts of RSNs, which include scan segments, control primitives, and interfaces to the instruments. Therefore, the developed scheme is referred to as a complete DfT scheme. It allows for a test integration to cover multiple fault locations with a single efficient test sequence and to reduce overall test cost.
65. Beyond von Neumann Era: Brain-inspired Hyperdimensional Computing to the Rescue. Hussam Amrouch; Paul R. Genssler; Mohsen Imani; Mariam Issa; Xun Jiao; Wegdan Ali Mohammadin; Gloria Sepanta and Ruixuan Wang. In 28th Asia and South Pacific Design Automation Conference (ASP-DAC), 2023.
64. ML to the Rescue: Reliability Estimation from Self-Heating and Aging in Transistors all the Way up Processors. Hussam Amrouch and Florian Klemme. In 28th Asia and South Pacific Design Automation Conference (ASP-DAC), 2023.
63. Modeling and Predicting Transistor Aging under Workload Dependency using Machine Learning. Paul R. Genssler; Hamza E. Barkam; Karthik Pandaram; Mohsen Imani and Hussam Amrouch. (2023). DOI:
https://doi.org/10.1109/TCSI.2023.3289325
62. Stress-resiliency of AI implementations on FPGAs. Jonas Krautter; Paul R. Genssler; Gloria Sepanta; Hussam Amrouch and Mehdi Tahoori. In International Conference on Field Programmable Logic and Applications (FPL), 2023.
61. Blood Glucose Prediction for Type-1 Diabetics using Deep Reinforcement Learning. Peter Domanski; Aritra Ray; Farshad Firouzi; Kyle Lafata; Krishnendu Chakrabarty and Dirk Pflüger. In 2023 IEEE International Conference on Digital Health (ICDH), 2023, pp. 339–347.
60. Identifying Resistive Open Defects in Embedded Cells under Variations. Zahra Paria Najafi-Haghi and Hans-Joachim Wunderlich.
Journal of Electronic Testing: Theory and Applications (JETTA) (2023), pp. 1–27. DOI:
https://doi.org/10.1007/s10836-023-06044-z
Abstract
Small Delay Faults (SDFs) due to weak defects and
marginalities have to be distinguished from extra delays due to process
variations, since they may form a reliability threat even if the
resulting timing is within the specification. In this paper, it is shown
that these faults can still be identified, even if the corresponding
defective cell is deeply embedded into a combinational circuit and its
observability is restricted.
The results of a few delay tests at different voltages and frequencies
serve as the input to machine learning procedures which can classify a
circuit as marginal due to defects or just slow due to variations.
Several machine learning techniques are investigated and compared with
respect to accuracy, precision, and recall for different circuit sizes
and defect scales. The classification strategies are powerful enough to
sort out defective devices without a major impact on yield.
59. Challenges in Machine Learning Techniques to Estimate Reliability from Transistors to Circuits. Victor van Santen; Florian Klemme; Paul R. Genssler and Hussam Amrouch. In IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2023.
58. Convolutions in Overdrive: Maliciously Secure Convolutions for MPC. Marc Rivinius; Pascal Reisert; Sebastian Hasler and Ralf Küsters. Cryptology ePrint Archive, 2023.
Abstract
Machine learning (ML) has seen a strong rise in popularity in recent years and has become an essential tool for research and industrial applications. Given the large amount of high-quality data needed and the often sensitive nature of ML data, privacy-preserving collaborative ML is of increasing importance. In this paper, we introduce new actively secure multiparty computation (MPC) protocols which are specially optimized for privacy-preserving machine learning applications. We concentrate on the optimization of (tensor) convolutions which belong to the most commonly used components in ML architectures, especially in convolutional neural networks but also in recurrent neural networks or transformers, and therefore have a major impact on the overall performance. Our approach is based on a generalized form of structured randomness that speeds up convolutions in a fast online phase. The structured randomness is generated with homomorphic encryption using adapted and newly constructed packing methods for convolutions, which might be of independent interest. Overall, our protocols extend the state-of-the-art Overdrive family of protocols (Keller et al., EUROCRYPT 2018). We implemented our protocols on top of MP-SPDZ (Keller, CCS 2020) resulting in a full-featured implementation with support for faster convolutions. Our evaluation shows that our protocols outperform state-of-the-art actively secure MPC protocols on ML tasks like evaluating ResNet50 by a factor of 3 or more. Benchmarks for depthwise convolutions show order-of-magnitude speed-ups compared to existing approaches.
57. Reliable Hyperdimensional Reasoning on Unreliable Emerging Technologies. Hamza Errahmouni Barkam; Sanggeon Yun; Paul R. Genssler; Hanning Chen; Albi Mema; Andrew Ding; Hussam Amrouch and Mohsen Imani. In 2023 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 2023.
56. Learn to Tune: Robust Performance Tuning in Post-Silicon Validation. Peter Domanski; Dirk Pflüger and Raphaël Latty. In 2023 IEEE European Test Symposium (ETS), 2023, pp. 1–4.
55. Overdrive LowGear 2.0: Reduced-Bandwidth MPC without Sacrifice. Sebastian Hasler; Toomas Krips; Ralf Küsters; Pascal Reisert and Marc Rivinius. Cryptology ePrint Archive, 2023.
Abstract
Some of the most efficient protocols for Multi-Party Computation (MPC) follow a two-phase approach where correlated randomness, in particular Beaver triples, is generated in the offline phase and then used to speed up the online phase. Recently, more complex correlations have been introduced to optimize certain operations even further, such as matrix triples for matrix multiplications. In this paper, our goal is to improve the efficiency of the triple generation in general and in particular for classical field values as well as matrix operations. To this end, we modify the Overdrive LowGear protocol to remove the costly sacrificing step and thereby reduce the round complexity and the bandwidth. We extend the state-of-the-art MP-SPDZ implementation with our new protocols and show that the new offline phase outperforms state-of-the-art protocols for the generation of Beaver triples and matrix triples. For example, we save 33% in bandwidth compared to Overdrive LowGear.
54. SyncTREE: Fast Timing Analysis for Integrated Circuit Design through a Physics-informed Tree-based Graph Neural Network. Yuting Hu; Jiajie Li; Florian Klemme; Gi-Joon Nam; Tengfei Ma; Hussam Amrouch and Jinjun Xiong. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
53. Tutorial: The Synergy of Hyperdimensional and In-memory Computing. Paul R. Genssler; Simon Thomann and Hussam Amrouch. In International Conference on Hardware/Software Codesign and System Synthesis (CODES/ISSS ’23 Companion), 2023.
52. Cryogenic Embedded System to Support Quantum Computing: From 5nm FinFET to Full Processor. Paul R. Genssler; Florian Klemme; Shivendra Singh Parihar; Sebastian Brandhofer; Girish Pahwa; Ilia Polian; Yogesh Singh Chauhan and Hussam Amrouch.
IEEE Transactions on Quantum Engineering (2023). DOI:
https://doi.org/10.1109/TQE.2023.3300833
51. FDSOI-based Analog Computing for Ultra-efficient Hamming Distance Similarity Calculation. Albi Mema; Simon Thomann; Paul R. Genssler and Hussam Amrouch.
IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I) (2023). DOI:
https://doi.org/10.1109/TCSI.2023.3267837
50. Multipars: Reduced-Communication MPC over Z_{2^k}. Sebastian Hasler; Pascal Reisert; Marc Rivinius and Ralf Küsters. Cryptology ePrint Archive, 2023.
Abstract
In recent years, actively secure SPDZ-like protocols for dishonest majority, like SPDZ2k, Overdrive2k, and MHz2k, over base rings Z_{2^k} have become more and more efficient. In this paper, we present a new actively secure MPC protocol Multipars that outperforms these state-of-the-art protocols over Z_{2^k} by more than a factor of 2 in the two-party setup in terms of communication. Multipars is the first actively secure N-party protocol over Z_{2^k} that is based on linear homomorphic encryption (LHE) in the offline phase (instead of oblivious transfer or somewhat homomorphic encryption in previous works). The strong performance of Multipars relies on a new adaptive packing for BGV ciphertexts that allows us to reduce the parameter size of the encryption scheme and the overall communication cost. Additionally, we use modulus switching for further size reduction, a new type of enhanced CPA security over Z_{2^k}, a truncation protocol for Beaver triples, and a new LHE-based offline protocol without sacrificing over Z_{2^k}.
We have implemented Multipars and thereby provide the fastest preprocessing phase over Z_{2^k}. Our evaluation shows that Multipars offers at least a factor of 8 lower communication costs and up to a factor of 10.2 faster runtime in the WAN setting compared to the currently best available MPC implementation over Z_{2^k}.
49. Convolutions in Overdrive: Maliciously Secure Convolutions for MPC. Marc Rivinius; Pascal Reisert; Sebastian Hasler and Ralf Küsters.
Proceedings on Privacy Enhancing Technologies 2023, 3 (2023), pp. 321–353. DOI:
https://doi.org/10.56553/popets-2023-0084
Abstract
Machine learning (ML) has seen a strong rise in popularity in recent years and has become an essential tool for research and industrial applications. Given the large amount of high-quality data needed and the often sensitive nature of ML data, privacy-preserving collaborative ML is of increasing importance. In this paper, we introduce new actively secure multiparty computation (MPC) protocols which are specially optimized for privacy-preserving machine learning applications. We concentrate on the optimization of (tensor) convolutions which belong to the most commonly used components in ML architectures, especially in convolutional neural networks but also in recurrent neural networks or transformers, and therefore have a major impact on the overall performance. Our approach is based on a generalized form of structured randomness that speeds up convolutions in a fast online phase. The structured randomness is generated with homomorphic encryption using adapted and newly constructed packing methods for convolutions, which might be of independent interest. Overall, our protocols extend the state-of-the-art Overdrive family of protocols (Keller et al., EUROCRYPT 2018). We implemented our protocols on top of MP-SPDZ (Keller, CCS 2020) resulting in a full-featured implementation with support for faster convolutions. Our evaluation shows that our protocols outperform state-of-the-art actively secure MPC protocols on ML tasks like evaluating ResNet50 by a factor of 3 or more. Benchmarks for depthwise convolutions show order-of-magnitude speed-ups compared to existing approaches.
48. Overdrive LowGear 2.0: Reduced-Bandwidth MPC without Sacrifice. Pascal Reisert; Marc Rivinius; Toomas Krips and Ralf Küsters. In
ACM ASIA Conference on Computer and Communications Security (ASIA CCS 2023), 2023, pp. 372–386. DOI:
https://doi.org/10.1145/3579856.3582809
Abstract
Some of the most efficient protocols for Multi-Party Computation (MPC) follow a two-phase approach where correlated randomness, in particular Beaver triples, is generated in the offline phase and then used to speed up the online phase. Recently, more complex correlations have been introduced to optimize certain operations even further, such as matrix triples for matrix multiplications. In this paper, our goal is to improve the efficiency of the triple generation in general and in particular for classical field values as well as matrix operations. To this end, we modify the Overdrive LowGear protocol to remove the costly sacrificing step and thereby reduce the round complexity and the bandwidth. We extend the state-of-the-art MP-SPDZ implementation with our new protocols and show that the new offline phase outperforms state-of-the-art protocols for the generation of Beaver triples and matrix triples. For example, we save 33% in bandwidth compared to Overdrive LowGear.