FPGAs (Field-Programmable Gate Arrays) are often used as coprocessors to boost the performance of data-intensive applications [1, 2], However, mapping algorithms onto multimillion-gate FPGAs is time consuming and remains a challenge in configurable system design. The communication overhead between the host workstation and the FPGAs is also significant. To address these problems, we propose in this paper the FPGA-based Hierarchical-SIMD (H-SIMD) machine with its codesign of the Hierarchical Instruction Set Architecture (HISA). At each level, HISA instructions are classified into communication instructions or computation instructions. The former are executed by the local controller while the latter are issued to the lower level for execution. Additionally, by using a memory switching scheme and the high-level HISA set to partition the application into coarse-grain tasks, the host-FPGA communication overhead can be hidden. We enlist matrix multiplication (MM) to test the effectiveness of H-SIMD. The test results show sustained high performance.