Data-intensive applications frequently rely on multicore computer systems, in which Non-Uniform Memory Access (NUMA) is a dominant architecture. Transferring data into and out of these high-performance computers becomes a bottleneck, so it is crucial to understand their I/O performance characteristics. However, the complexity of the NUMA architecture presents a new challenge in modeling I/O access cost, and thus leads to difficulties in configuring proper processor and memory affinity. In this paper, we show that existing NUMA experimental methods and metrics are inappropriate for contemporary high-end systems. We characterize a state-of-the-art NUMA host and propose, to the best of our knowledge, the first methodology that simulates I/O operations using memory semantics and models I/O bandwidth performance. Our methodology is thoroughly tested and validated by mapping multiple parallel I/O streams to different sets of hardware components (CPU, memory, network cards, and SSDs) and by measuring the performance of each mapping. The experimental results and analysis reveal that our methodology can dramatically reduce characterization workload, accurately estimate the overall I/O performance, and effectively mitigate resource contention among I/O tasks.