For memoryless sources, delayed side information at the decoder does not improve the rate-distortion function. This is not the case for more general sources with memory, as shown by a number of works on the special case of (delayed) feedforward. In this paper, a setting is studied in which the side information is delayed and the encoder is informed about the side information sequence. Assuming a hidden Markov model for the sources, a single-letter characterization is first given for the setup in which the side information delay is arbitrary and known at the encoder, and the reconstruction at the destination is required to be (near) lossless. Then, a single-letter characterization of the rate-distortion function is given for the case in which the side information may or may not be delayed, unbeknownst to the encoder, with the delay equal to zero or one source symbol. Finally, an example with a binary source is provided.