In the energy-constrained medium of video sensor networks, the objective of much research has been to statistically minimize the number of nodes needed to achieve a sufficient degree of coverage. We instead consider increasing the number of nodes beyond the threshold of full coverage, and cooperatively filtering out the high level of redundant data in the video streams to minimize per-node capacity requirements. The scenario we study is that of a swarm of robots, all with wireless communication capabilities. Some of the robots are equipped with video cameras and are thus considered sensors. A few select robots have sufficient battery and computational power to perform machine vision processing of the video stream. The goal in this scenario is to get the video from the sensors to the video-processing robots, which can then extract high-level surveillance information about the observed environment. We present an optimization framework for minimizing redundant visual data transmissions while maximizing the throughput from sensors to processing nodes. We also characterize through simulation the performance gain in the sensor network as the video coverage increases.