Platform FPGA devices are an attractive option for implementing parallel systems on a single chip that can serve as a coprocessor. However, the substantial communication overhead between the host workstation and the FPGAs is a major performance bottleneck, and mapping an application onto FPGAs remains a daunting task. To address these problems, this paper describes the design of a Hierarchical SIMD (H-SIMD) machine together with a co-designed Hierarchical Instruction Set Architecture (HISA). The proposed H-SIMD design uses an FPGA controller to ease program development, and it employs a memory switching scheme that overlaps communication with computation as much as possible. The two-dimensional Discrete Cosine Transform (DCT2, or 2D-DCT) is used to demonstrate the effectiveness of the H-SIMD machine.
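Why overlapping communication with computation helps can be seen in a simplified timing model. The sketch below assumes two memory banks that swap roles after every data block, a single host link, and one compute unit. While the FPGA computes on the block in the active bank, the host loads the next block into the idle bank. The function names and per-block times are hypothetical illustrations, not the paper's controller or measured figures.

```python
def sequential_time(transfer, compute):
    """Total time when each block is transferred and then computed, with no overlap."""
    return sum(t + c for t, c in zip(transfer, compute))


def switched_time(transfer, compute):
    """Total time with two memory banks that swap roles after every block.

    Hypothetical model: block i+1 is loaded into the idle bank while block i
    is computed in the active bank, so each stage costs the slower of the two.
    """
    if not transfer:
        return 0
    total = transfer[0]  # the first block must be loaded before any computation
    for i in range(len(compute)):
        # Transfer of the next block (if any) runs concurrently with compute i.
        next_transfer = transfer[i + 1] if i + 1 < len(transfer) else 0
        total += max(compute[i], next_transfer)
    return total


print(sequential_time([4, 4, 4], [5, 5, 5]))  # 27
print(switched_time([4, 4, 4], [5, 5, 5]))    # 19
```

With three blocks that each take 4 time units to transfer and 5 to compute, switching banks hides all but the first transfer, so the total drops from 27 to 19 units. When transfers dominate, the bound becomes the link instead: `switched_time([6, 6, 6], [2, 2, 2])` gives 20 rather than 24, which is why the overlap helps only "as much as possible."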