Publications

You can also find my publications on my Google Scholar profile.

Book Chapters

Efficient Depth Optimization in Quantum Addition and Modular Arithmetic with Ling Structure

Published in IFIP/IEEE International Conference on Very Large Scale Integration - System on a Chip, 2024

Improving the performance of quantum adder is an important technical challenge with major impact on the implementation of efficient, large-scale quantum computing. Continuing along this research direction, we propose a novel parallel-prefix quantum adder based on Ling expansion. We systematically explored classical structures for parallel-prefix adders assessing their suitability to be realized in quantum domain. Furthermore, Ling adder enforces Logical OR and large fan-out, which require innovative solutions. We addressed these challenges to realize the quantum Ling adder, which results in a T-depth of only O(log(n/2)). This represents a substantial improvement over the previous quantum adders based on parallel prefix structure, which require O(log n) T-depth. Based on the proposed adder, an efficient quantum modular adder is also demonstrated in this paper, further extending the applicability of our approach. We present extensive theoretical and simulation-based studies to establish our claims.

Recommended citation: Wang, S., Chattopadhyay, A. (2024). Efficient Depth Optimization in Quantum Addition and Modular Arithmetic with Ling Structure. In: Elfadel, I.M., Albasha, L. (eds) VLSI-SoC 2023: Innovations for Trustworthy Artificial Intelligence. VLSI-SoC 2023. IFIP Advances in Information and Communication Technology, vol 680. Springer, Cham. https://doi.org/10.1007/978-3-031-70947-0_4
Download Paper

Journal Articles

Optimal Toffoli-depth quantum adder

Published in ACM Transactions on Quantum Computing, 2025

Efficient quantum arithmetic circuits are commonly found in numerous quantum algorithms of practical significance. To date, the logarithmic-depth quantum adders include a constant coefficient k ≥ 2 while achieving the Toffoli-Depth of k log n + 𝒪(1). In this work, 160 alternative compositions of the carry-propagation structure are comprehensively explored to determine the optimal depth structure for a quantum adder. By extensively studying these structures, it is shown that an exact Toffoli-Depth of log n + 𝒪(1) is achievable. This presents a reduction of Toffoli-Depth by almost 50% compared to the best known quantum adder circuits presented to date. We demonstrate a further possible design by incorporating a different expansion of propagate and generate forms, as well as an extension of the modular framework. Our article elaborates on these designs, supported by detailed theoretical analyses and simulation-based studies, firmly substantiating our claims of optimality within all possible configurations outlined in this work. The results also mirror similar improvements, recently reported in classical adder circuit complexity.

Recommended citation: Siyi Wang, Ankit Mondal, and Anupam Chattopadhyay. 2025. Optimal Toffoli-Depth Quantum Adder. ACM Transactions on Quantum Computing 6, 3, Article 25 (September 2025), 16 pages. https://doi.org/10.1145/3743691
Download Paper

Exact space-depth trade-offs in multicontrolled Toffoli decomposition

Published in Physical Review A, 2025

In this paper we consider the optimized implementation of multicontrolled Toffoli decomposition using the Clifford+𝑇 gate set. While there are several recent works in this direction, here we explicitly quantify the trade-off (with concrete formulas) between the Toffoli depth (this means the depth using the classical 2-controlled Toffoli gate) of the 𝑛-controlled Toffoli decomposition and the number of clean ancilla qubits. Additionally, we achieve a reduced Toffoli (consequently 𝑇) depth, which is an extension of the technique introduced by Khattar and Gidney [T. Khattar and C. Gidney, arXiv:2407.17966]. In terms of a negative result, we first show that by using such conditionally clean ancilla techniques, the Toffoli depth can never achieve exactly ⌈log₂ 𝑛⌉, though it remains of the same order. This highlights the limitation of the techniques exploiting conditionally clean ancillas [J. Nie et al., arXiv:2402.05053; T. Khattar and C. Gidney, arXiv:2407.17966]. Then we prove that, in a more general setup, the 𝑇 depth in the Clifford+𝑇 decomposition, via Toffoli gates, is lower bounded by ⌈log₂ 𝑛⌉, and this bound is achieved following the complete binary tree structure. Since the (2-controlled) Toffoli gate can further be decomposed using Clifford+𝑇 gate set, various methodologies are also explored in this regard for trade-off-related implications.

Recommended citation: Dutta, Suman, Siyi Wang, Anubhab Baksi, Anupam Chattopadhyay, and Subhamoy Maitra. "Exact space-depth trade-offs in multicontrolled Toffoli decomposition." Physical Review A 111, no. 5 (2025): 052611. doi: 10.1103/PhysRevA.111.052611
Download Paper

A comprehensive study of quantum arithmetic circuits

Published in Philosophical Transactions of the Royal Society A, 2025

In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared with their classical counterparts, with Shor’s algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention. Despite extensive exploration of various designs in the existing literature, researchers remain keen to develop novel designs and improve existing ones. In this review article, we aim to provide a systematically organized and easily comprehensible overview of the current state of the art in quantum arithmetic circuits. Specifically, this study covers fundamental operations such as addition, subtraction, multiplication, division and modular exponentiation. We delve into the detailed quantum implementations of these prominent designs and evaluate their efficiency considering various objectives. We also discuss potential applications of the presented arithmetic circuits and suggest future research directions.

Recommended citation: Wang Siyi, Li Xiufan, Lee Wei Jie Bryan, Deb Suman, Lim Eugene and Chattopadhyay Anupam. 2025. A comprehensive study of quantum arithmetic circuits. Phil. Trans. R. Soc. A. 383: 20230392. http://doi.org/10.1098/rsta.2023.0392
Download Paper

Efficient quantum circuits for machine learning activation functions including constant T-depth ReLU

Published in Physical Review Research, 2024

In recent years, Quantum Machine Learning (QML) has increasingly captured the interest of researchers. Among the components in this domain, activation functions hold a fundamental and indispensable role. Our research focuses on the development of activation functions quantum circuits for integration into fault-tolerant quantum computing architectures, with an emphasis on minimizing 𝑇-depth. Specifically, we present novel implementations of ReLU and leaky ReLU activation functions, achieving constant 𝑇-depths of 4 and 8, respectively. Leveraging quantum lookup tables, we extend our exploration to other activation functions such as the sigmoid. This approach enables us to customize precision and 𝑇-depth by adjusting the number of qubits, making our results more adaptable to various application scenarios. This study represents a significant advancement towards enhancing the practicality and application of quantum machine learning.

Recommended citation: Zi, Wei, Siyi Wang, Hyunji Kim, Xiaoming Sun, Anupam Chattopadhyay, and Patrick Rebentrost. "Efficient quantum circuits for machine learning activation functions including constant T-depth ReLU." Physical Review Research 6, no. 4 (2024): 043048. http://doi.org/10.1103/PhysRevResearch.6.043048
Download Paper

A higher radix architecture for quantum carry-lookahead adder

Published in Scientific Reports, 2023

In this paper, we propose an efficient quantum carry-lookahead adder based on the higher radix structure. For the addition of two n-bit numbers, our adder uses O(n) − O(n/r) qubits and O(n) + O(n/r) T gates to get the correct answer in O(r) + O(log(n/r)) T-depth, where r is the radix. Quantum carry-lookahead adder has already attracted some attention because of its low T-depth. Our work further reduces the overall cost by introducing a higher radix layer. By analyzing the performance in T-depth, T-count, and qubit count, it is shown that the proposed adder is superior to existing quantum carry-lookahead adders. Even compared to the Draper out-of-place adder which is very compact and efficient, our adder is still better in terms of T-count.

Recommended citation: Wang, S., Baksi, A. & Chattopadhyay, A. A Higher radix architecture for quantum carry-lookahead adder. Sci Rep 13, 16338 (2023). https://doi.org/10.1038/s41598-023-41122-4
Download Paper

Conference Papers

Reducing T-Depth and T-Count in Quantum Multiplication Using Compressor Primitives

Published in Proceedings of the Great Lakes Symposium on VLSI 2025, 2025

Optimization of quantum multiplication is a critical area of study due to its pivotal role in quantum algorithms such as Shor’s factorization. Every time a quantum multiplier is used, it repeatedly executes several key components to perform the multiplication. Most current works have focused on using components such as the basic half and full-adder designs, which have limited efficiency. In this paper, we demonstrate that by using a generalized (m:k) compressor-based Wallace Tree one can significantly improve efficiency; this method achieves reductions of up to 92.8% in T-Depth and 55.6% in T-Count while maintaining a competitive Qubit-Count through brute-force and dynamic programming optimization.

Recommended citation: Siyi Wang, Suman Dutta, Wei Jie Bryan Lee, Jerrie Feng, Xiang Fang, and Anupam Chattopadhyay. 2025. Reducing T-Depth and T-Count in Quantum Multiplication Using Compressor Primitives. In Proceedings of the Great Lakes Symposium on VLSI 2025 (GLSVLSI ’25). Association for Computing Machinery, New York, NY, USA, 35–40. https://doi.org/10.1145/3716368.3735184
Download Paper

A Novel Current Comparator Enabling Large RRAM Crossbars for BNNs and PUFs

Published in 2024 IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC), 2024

Emerging non-volatile memory (NVM) device technologies are advancing in-memory computing (IMC) applications by providing faster computation speeds and reducing resource overhead. Crossbar structures are commonly used in IMC with NVM devices such as memristors to perform matrix-vector multiplication for deep learning and security primitive applications. Large crossbars are required to implement today’s deep learning models, especially for implementing specialized Binarized Neural Networks (BNNs) architectures and to construct security primitives such as physical unclonable functions (PUFs). To digitize crossbar currents, current-sense comparators based on current mirrors are widely used in BNN crossbars. However, conventional comparators have a limited current range due to the decrease of the bit-line voltage as the number of active crossbar elements connected to the bit-line increases. This paper presents a current comparator that employs a regulated cascode sensing stage which boosts the input current range by stabilizing the bit-line voltages. By increasing the current range, a larger crossbar size can be supported. Extensive simulations were carried out to verify the proposed technique, demonstrating that a significantly larger crossbar size can be achieved with the proposed comparator compared to a traditional current-sense comparator for the same area and resolution. Designed in a 180 nm technology, the proposed comparator achieves a resolution of 50nA and 100μA for PUF and BNN applications, respectively, and dissipates 198 μW.

Recommended citation: G. Rajendran, D. Basak, S. Deb, S. Wang and A. Chattopadhyay, "A Novel Current Comparator Enabling Large RRAM Crossbars for BNNs and PUFs," 2024 IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC), Tanger, Morocco, 2024, pp. 1-6, doi: 10.1109/VLSI-SoC62099.2024.10767833.
Download Paper

Minimum Depth Quantum Modular Addition Through Carry-Save Architecture

Published in 2024 IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC), 2024

Shor’s factorization algorithm, as one of the most significant achievements in quantum computing, exhibits an exponential speedup compared to the corresponding classical algorithm. In Shor’s factorization algorithm, modular exponentiation is one of the most computationally intensive components, which relies on the modular addition building block. This work aims to explore novel designs for enhancing the efficiency of quantum modular addition. In particular, we introduce a novel quantum modular addition framework based on carry-save architecture, which facilitates the conversion of multiple 2-addend quantum operations within modular addition into a single 3-addend operation, thereby reducing the computational depth. Compared to the most efficient existing quantum modular addition, our design has achieved an impressive result: a reduction in Toffoli Depth by up to 33.33%, while maintaining comparable Toffoli Count and Qubit Count. This research underscores the potential of carry-save architecture as a promising technique for accelerating quantum modular arithmetic as well as advancing the development of quantum computing in general.

Recommended citation: S. Wang, E. Lim, X. Li, J. Feng and A. Chattopadhyay, "Minimum Depth Quantum Modular Addition Through Carry-Save Architecture," 2024 IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC), Tanger, Morocco, 2024, pp. 1-6, doi: 10.1109/VLSI-SoC62099.2024.10767796.
Download Paper

Quantum Implementation of Linear and Non-Linear Layers

Published in 2024 IEEE 37th International System-on-Chip Conference (SOCC), 2024

In this paper, we consider the problem of quantum implementation of the symmetric-key ciphers. The typical ciphers in this category have two main components, the linear and the non-linear layers; we consider both. The linear layer can be described as a non-singular matrix over binary operations. The in-place implementation is one of the main considerations for implementing such a layer to minimize the number of qubits. We discuss the research works that have been done to this point on this subject matter and report the improved implementation of some of the matrices. Our modifications include making it more efficient (especially for the larger matrices), consideration for quantum depth (along with a randomized algorithm for optimization), etc. We report benchmarks for the ASCON and SHA-2 linear matrices. As for the non-linear layer, the constituent block is called a substitution box (an S-box, for short). Our “DORCIS” tool, presented in this paper, finds a quantum circuit with an optimized depth for given S-boxes of size 3- and 4-bit. It is a follow-up work on LIGHTER-R (which is applicable for 4-bit S-boxes only) with multiple extensions. Unlike LIGHTER-R, our DORCIS takes a quantum decomposition based on Clifford and T gates. Also, both the full quantum depth and the T depth can be optimized by DORCIS. We compare our implementation with other optimized quantum circuits shown in the other research works and show that we find an implementation with the same cost metric, or find an implementation with lower cost metric, compared to other tools proposed in the literature, apart from being simpler and more efficient.

Recommended citation: Baksi, A., Chakraborty, S., Chattopadhyay, A., Chun, M., Islam, S.H., Jang, K., Kim, H., Oh, Y., Roy, S., Seo, H. and Wang, S., 2024, September. Quantum Implementation of Linear and Non-Linear Layers. In 2024 IEEE 37th International System-on-Chip Conference (SOCC) (pp. 1-6). IEEE. doi: 10.1109/SOCC62300.2024.10737862.
Download Paper

Boosting the efficiency of quantum divider through effective design space exploration

Published in 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 2024

Rapid progress in the design of scalable, robust quantum computing necessitates efficient quantum circuit implementation for algorithms with practical relevance. For several algorithms, arithmetic kernels, division in particular, play an important role. In this manuscript, we focus on enhancing the performance of quantum slow dividers by exploring the design choices of their sub-blocks, such as adders. Through comprehensive design space exploration of state-of-the-art quantum addition building blocks, our work has resulted in an impressive achievement: a reduction in Toffoli Depth of up to 93.90%, accompanied by substantial reductions in both Toffoli and Qubit Count of up to 92.12% and 99.38%, respectively. This paper offers crucial perspectives on efficient design of quantum dividers, and emphasizes the importance of adopting a systematic design space exploration approach.

Recommended citation: S. Wang, E. Lim and A. Chattopadhyay, "Boosting the Efficiency of Quantum Divider through Effective Design Space Exploration," 2024 IEEE International Symposium on Circuits and Systems (ISCAS), Singapore, Singapore, 2024, pp. 1-5, doi: 10.1109/ISCAS58744.2024.10557991.
Download Paper

POSTER: MalaQ-A Malware Against Quantum Computer

Published in 19th ACM Asia Conference on Computer and Communications Security, 2024

Quantum computers are set to revolutionize multiple application domains, including financial portfolio optimization, drug discovery, supply chain optimization, and cryptography, by offering algorithmic speed-up over the best-known classical algorithms. This large-scale adoption will also make quantum computers a lucrative target for cyber-criminals. However, to date, there has been minimal experimental study on the possible attack surfaces and the severity of attacks on a real quantum computer. In this work, we introduce MalaQ, a malware specifically developed for quantum computers. MalaQ exploits the classical-computer frontend of a quantum system and causes extensive damages like performance degradation and even complete failure of the quantum circuits. In this paper, we discuss the design, implementation, and experiments using MalaQ in great detail, including drawing parallels with prior works on both classical and quantum cyber-attacks.

Recommended citation: Wang, Siyi, Alex Jin, Suman Deb, Tarun Dutta, Manas Mukherjee, and Anupam Chattopadhyay. "POSTER: MalaQ-A Malware Against Quantum Computer." In Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, pp. 1946-1948. 2024. https://doi.org/10.1145/3634737.3659432
Download Paper

Reducing depth of quantum adder using Ling structure

Published in 2023 IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC), 2023

Improving the performance of quantum adder is an important technical challenge with major impact on the implementation of efficient, large-scale quantum computing. Continuing along this research direction, we propose a novel parallel-prefix quantum adder based on Ling expansion. We systematically explored classical structures for parallel-prefix adders assessing their suitability to be realized in quantum domain. Furthermore, Ling adder enforces Logical OR and large fan-out, which require innovative solutions. We addressed these challenges to realize the quantum Ling adder, which results in a T-depth of only O(log (n/2)). This represents a substantial improvement over the previous quantum adders based on parallel prefix structure, which require O(log n) T-depth. We present extensive theoretical and simulation-based studies to establish our claims.

Recommended citation: S. Wang and A. Chattopadhyay, "Reducing Depth of Quantum Adder using Ling Structure," 2023 IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC), Dubai, United Arab Emirates, 2023, pp. 1-6, doi: 10.1109/VLSI-SoC57769.2023.10321948.
Download Paper

Optimized quantum circuit implementation of payoff function

Published in 2023 IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC), 2023

Large-scale quantum computers that can execute practical quantum algorithms have the potential to solve complex problems that are currently challenging for classical computers. This involves converting these problems into a form that can be processed by quantum circuits, a crucial process that requires minimizing quantum resources like qubit count, gate count, and circuit depth. Our work focuses on implementing and optimizing the foundational task of quantum finance, known as option pricing, as a quantum circuit. This enables the utilization of quantum computing benefits, within the financial domain. Specifically, we implement and optimize the function fK(S) = max(S−K, 0). Taking into consideration the significant trade-offs between qubit count and circuit depth, we have developed quantum circuits for the optimized implementation of the fK(S). Our work incorporates various optimization techniques for the circuit, such as selecting the optimal adder, optimizing the S−K operation, parallelization, and qubit reuse. Furthermore, we offer various versions of our quantum circuits for the fK(S), each featuring different adders and Toffoli decompositions, thereby providing flexibility for a wide range of use cases.

Recommended citation: S. Lim et al., "Optimized Quantum Circuit Implementation of Payoff Function," 2023 IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC), Dubai, United Arab Emirates, 2023, pp. 1-6, doi: 10.1109/VLSI-SoC57769.2023.10321843.
Download Paper

Hardware Trojan detection at LUT: Where structural features meet behavioral characteristics

Published in 2022 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 2022

This work proposes a novel hardware Trojan detection method that leverages static structural features and behavioral characteristics in field programmable gate array (FPGA) netlists. Mapping of hardware design sources to look-up-table (LUT) networks makes these features explicit, allowing automated feature extraction and further effective Trojan detection through machine learning. Four-dimensional features are extracted for each signal and a random forest classifier is trained for Trojan net classification. Experiments using Trust-Hub benchmarks show promising Trojan detection results with accuracy, precision, and F1-measure of 99.986%, 100%, and 99.769% respectively on average.

Recommended citation: L. Wu, X. Zhang, S. Wang and W. Hu, "Hardware Trojan Detection at LUT: Where Structural Features Meet Behavioral Characteristics," 2022 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), McLean, VA, USA, 2022, pp. 121-124, doi: 10.1109/HOST54066.2022.9840276.
Download Paper