As quantum computers become more affordable and commonplace, existing security systems that are based
on classical public-key primitives, such as RSA and Elliptic Curve Cryptography (ECC), will no longer
be secure. Hence, there has been growing interest in designing post-quantum cryptographic (PQC) schemes,
such as those built on lattice-based cryptography (LBC). The potential of LBC schemes is evidenced
by the number of such schemes that advanced to Round 3 of the NIST PQC Standardization Process. One
such scheme is the Crystals-Dilithium signature scheme, whose security rests on the hardness of module-lattice problems. However, an efficient implementation of the Crystals-Dilithium signature scheme is still lacking. Hence, in
this article, we present a compact hardware architecture containing elaborate modular multiplication units
based on the Karatsuba algorithm, together with dedicated generators of address sequences and twiddle factors for
the number theoretic transform (NTT). This architecture can perform polynomial addition and multiplication under the parameter settings of Dilithium within
a short clock period. We also propose a fast software/hardware co-design implementation of the Dilithium scheme on a Field Programmable Gate Array (FPGA) that trades off speed against resource utilization. Our co-design implementation outperforms a pure C implementation running on the Nios-II processor of
an Altera DE2-115 platform: it is 11.2 and 7.4 times faster for signature and