We derive an optimization framework for joint view and rate scalable coding of multi-view video content represented in the texture-plus-depth format. The optimization enables the sender to select the subset of coded views and their encoding rates such that the aggregate distortion over a continuum of synthesized views is minimized. We construct the view-rate scalable bitstream such that it delivers optimal performance simultaneously over a discrete set of transmission rates. In conjunction, we develop a user interaction model that characterizes the view selection actions of the client as a Markov chain over a discrete state space. Our scheme outperforms the state-of-the-art H.264 SVC codec as well as a multi-view wavelet-based coder equipped with a uniform rate allocation strategy, across all scenarios studied. Finally, we observe that interactivity-aware coding outperforms conventional allocation techniques that do not anticipate the client's view selection actions.
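To make the interaction model concrete, the following is a minimal sketch of how a Markov chain over discrete view states could be used to weight per-view distortion. It is an illustration under stated assumptions, not the formulation used in this work: the three-state transition matrix, the distortion values, and the function names `stationary_distribution` and `expected_distortion` are all hypothetical.

```python
def stationary_distribution(P, iters=1000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic
    matrix P (list of lists) by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of the chain: nxt[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def expected_distortion(P, distortion):
    """Long-run average distortion when the client's view selection
    follows the chain P and view k incurs distortion[k]."""
    pi = stationary_distribution(P)
    return sum(p * d for p, d in zip(pi, distortion))


# Hypothetical example: three view states; the client either stays
# on the current view or moves to an adjacent one.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
# Hypothetical per-view distortions (arbitrary units).
distortion = [4.0, 2.0, 4.0]

pi = stationary_distribution(P)
avg = expected_distortion(P, distortion)
```

In this toy setting the middle view is visited most often, so an allocation that lowers its distortion reduces the long-run average more than one that treats all views equally, which is the intuition behind anticipating the client's view selection.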