In this work, we first propose a deep depthwise Convolutional Neural Network (CNN) structure, called Add-Net, which uses binarized depthwise separable convolution to replace conventional spatial-convolution. In Add-Net, the computationally expensive convolution operations (i.e. Multiplication and Accumulation) are converted into hardware-friendly Addition operations. We meticulously investigate and analyze the Add-Net's performance (i.e. accuracy, parameter size and computational cost) in object recognition application compared to traditional baseline CNN using the most popular large scale ImageNet dataset. Accordingly, we propose a Depthwise CNN In-Memory Accelerator (DIMA) based on SOT-MRAM computational sub-arrays to efficiently accelerate Add-Net within non-volatile MRAM. Our device-to-architecture co-simulation results show that, with almost the same inference accuracy to the baseline CNN on different data-sets, DIMA can obtain ∼1.4× better energy-efficiency and 15.7× speedup compared to ASICs, and, ∼1.6× better energy-efficiency and 5.6× speedup over the best processing-in-DRAM accelerators.