In scanning x-ray microscopy, Differential Phase Contrast (DPC) imaging is a technique for recovering the phase contrast of a sample. It rests on the idea that, locally, the sample acts as a prism that deflects the incident x-ray beam by a small angle. Considerable effort has gone into DPC imaging, and a number of representative cases at moderate spatial resolution have demonstrated the success of the method. However, the inherent limitations of these methods prevent their use in ultra-high spatial resolution imaging. A highly robust approach to DPC imaging based on Fourier-shift fitting was proposed recently and is effective in reconstructing buried nanoscale interfacial structures. Because the algorithm involves non-linear fitting and Fourier transformation, the computation at each scanning point is intensive. One challenge is therefore to make the method fast enough to keep up with pixel-wise scanning, so that real-time data processing can be achieved. Here we provide three implementations, in Matlab, Python and C++, and compare their speed. Experiments show that the C++ version is about one order of magnitude faster than the Matlab version and nearly two orders of magnitude faster than the Python version. In addition, we designed a parallel algorithm that divides the task into a number of independently running routines executed on a batch-queue based multi-core server cluster, achieving almost another two orders of magnitude improvement.
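To make the per-pixel computation concrete, the following is a minimal one-dimensional sketch of the idea behind Fourier-shift estimation, not the implementation benchmarked here. It relies only on the Fourier shift theorem: a pattern displaced by s pixels has a Fourier transform multiplied by a linear phase ramp exp(-2πi f s). For brevity the sketch replaces the non-linear fit with a weighted linear least-squares fit of that phase slope, which is adequate only for small shifts where the phase does not wrap. The function names `estimate_shift` and `make_pattern`, the Gaussian test pattern, and all parameter values are illustrative assumptions.

```python
import numpy as np


def estimate_shift(reference, pattern):
    """Return (shift_in_pixels, amplitude_ratio) of `pattern` relative to `reference`.

    Assumption: the shift is small enough that the cross-spectrum phase
    does not wrap at the frequencies that carry significant signal.
    """
    f_ref = np.fft.fft(reference)
    f_pat = np.fft.fft(pattern)
    freq = np.fft.fftfreq(reference.size)  # spatial frequency, cycles per pixel

    # Cross-spectrum: magnitude a*|F_ref|^2, phase -2*pi*f*s (shift theorem).
    cross = f_pat * np.conj(f_ref)
    weight = np.abs(cross)   # down-weights frequencies with no signal
    phase = np.angle(cross)

    # Weighted least-squares slope of phase versus frequency.
    shift = -np.sum(weight * freq * phase) / (2 * np.pi * np.sum(weight * freq**2))
    amplitude = np.sum(weight) / np.sum(np.abs(f_ref) ** 2)
    return shift, amplitude


def make_pattern(shift, amplitude, n=256, sigma=8.0):
    """Hypothetical detector profile: a Gaussian displaced by `shift` pixels."""
    x = np.arange(n)
    return amplitude * np.exp(-0.5 * ((x - n / 2 - shift) / sigma) ** 2)


reference = make_pattern(0.0, 1.0)

# Three hypothetical scan points given as (true shift, true amplitude).
scan = [(0.0, 1.0), (2.3, 0.8), (-1.7, 0.6)]
results = [estimate_shift(reference, make_pattern(s, a)) for s, a in scan]

for (s_true, a_true), (s_fit, a_fit) in zip(scan, results):
    print(f"shift: true {s_true:+.2f}, fitted {s_fit:+.4f} | "
          f"amplitude: true {a_true:.2f}, fitted {a_fit:.4f}")
```

Each call to `estimate_shift` depends only on the reference and on the pattern recorded at its own scan point, which is why such a computation can be split into independently running routines and distributed over many cores.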