In this work we present the first full and complete evaluation of a very large multiplication scheme in custom hardware. We designed a novel architecture to realize a million-bit multiplication architecture based on the Schönhage-Strassen Algorithm and the Number Theoretical Transform (NTT). The construction makes use of an innovative cache architecture along with processing elements customized to match the computation and access patterns of the FFT-based recursive multiplication algorithm. When synthesized using a 90nm TSMC library operating at a frequency of 666 MHz, our architecture is able to compute the product of integers in excess of a million bits in 7.74 milliseconds. Estimates show that the performance of our design matches that of previously reported software implementations on a high-end 3 Ghz Intel Xeon processor, while requiring only a tiny fraction of the area.