We consider a drone-based vision sensor network that captures collocated viewpoints of the scene underneath and sends them to a remote user for volumetric 360-degree navigable visual immersion on a virtual reality head-mounted display. The reconstruction quality of the immersive scene representation on the device, and thus the quality of user experience, depends on the location and signal sampling rate of each drone. Moreover, transmission constraints limit the aggregate amount of data the network can sample and relay towards the user. Finally, the user's navigation actions dynamically place different priorities on specific viewpoints of the captured scene. We make multiple contributions in this context. First, we formulate the viewpoint-priority-aware scene reconstruction error as a function of the assigned sampling rates and compute the optimal rates that minimize this error, for given drone positions and system constraints. Second, we design an online view sampling policy that takes actions while exploring new drone locations to discover the best drone network configuration over the scene. We characterize its approximation versus convergence behavior using novel spectral graph analysis and show considerable advances relative to the state of the art. Finally, to enable the drone sensors to efficiently communicate their data back to the aggregation point, we formulate computationally efficient rate-distortion-power optimized transmission scheduling policies that meet the low-latency application requirements while conserving the available energy. Our experimental results demonstrate the competitive advantages of our approach across multiple performance factors. This is a first-of-its-kind study of an emerging application with potentially broad societal impact.