We consider UAV-based IoT aerial sensing that delivers multiple VR/AR immersive communication sessions to remote users. The UAV swarm is spatially distributed over a wide area of interest, and each UAV captures a viewpoint of the scene below it. The remote users are interested in immersive visual navigation of specific subareas/scenes of interest, reconstructed on their respective VR/AR devices from the captured data. The reconstruction quality of the immersive scene representation at each user depends on the sampling/sensing rate of each UAV. Physical and transmission capacity constraints limit the aggregate amount of data that the UAV swarm can sample and send towards the users. In addition, each VR/AR application has a minimum reconstruction quality requirement for its own session. We propose an optimization framework that makes three contributions in this context. First, we select the optimal sampling rate for each UAV, such that the system and application constraints are not exceeded, while the priority-weighted reconstruction quality across all VR/AR sessions is maximized. Second, we design an optimal scalable source-channel signal representation that gives the captured data inherent rate adaptivity, unequal error protection, and minimum required redundancy. Third, we enhance the UAV transmission efficiency through small-form-factor multi-beam directional antennas and optimal power/link scheduling across the layers of the scalable signal representation. Our experiments demonstrate competitive advantages over conventional methods for visual sensing. This is a first-of-its-kind study of an emerging application with potentially broad societal impact.
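
To make the first contribution concrete, the following is a minimal illustrative sketch of priority-weighted rate allocation under an aggregate rate budget. It is not the formulation used in this work. It assumes a hypothetical concave quality model, w_i * log(1 + r_i), one UAV per session, and per-UAV minimum rates standing in for the minimum quality requirements. The function name `allocate_rates` and all parameters are illustrative assumptions. Under these assumptions the problem has a water-filling solution, r_i = max(m_i, w_i * level - 1), where the common level is found by bisection.

```python
def allocate_rates(weights, min_rates, capacity, iters=100):
    """Water-filling sketch (hypothetical model, not the paper's method).

    Maximizes sum_i weights[i] * log(1 + r_i)
    subject to sum_i r_i <= capacity and r_i >= min_rates[i].
    Assumes weights[i] > 0 and min_rates[i] >= 0.
    """
    if sum(min_rates) > capacity:
        raise ValueError("minimum rates alone exceed the aggregate capacity")

    def rates_at(level):
        # KKT solution for a given water level (level = 1 / Lagrange multiplier).
        return [max(m, w * level - 1.0) for w, m in zip(weights, min_rates)]

    # At level 0 every UAV sits at its minimum rate (feasible by the check above).
    # At the upper level the highest-priority UAV alone would use the full budget.
    lo, hi = 0.0, (capacity + 1.0) / max(weights)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(rates_at(mid)) > capacity:
            hi = mid
        else:
            lo = mid
    # rates_at(lo) never exceeds the budget by construction.
    return rates_at(lo)


if __name__ == "__main__":
    # Two sessions, the first with twice the priority of the second.
    print(allocate_rates([2.0, 1.0], [0.0, 0.0], 10.0))
```

In this sketch, higher-priority sessions receive more of the shared budget, and a session whose unconstrained share would fall below its minimum is pinned at that minimum while the remainder is redistributed among the others.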