TY - GEN
T1 - Generative adversarial networks for video prediction with action control
AU - Hu, Zhihang
AU - Wang, Jason T.L.
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
N2 - The ability to predict future frames in video sequences, known as video prediction, is an appealing yet challenging task in computer vision. The task requires an in-depth representation of video sequences and a deep understanding of real-world causal rules. Existing approaches to video prediction fall into two categories: deterministic and stochastic methods. Deterministic methods cannot generate multiple possible future frames and often yield blurry predictions. Current stochastic approaches, on the other hand, can predict possible future frames but lack action control: they cannot generate desired future frames conditioned on a specific action. In this paper, we propose new generative adversarial networks (GANs) for stochastic video prediction. Our framework, called VPGAN, employs an adversarial inference model and a cycle-consistency loss function to obtain more accurate predictions. In addition, we incorporate a conformal mapping network structure into VPGAN to enable action control for generating desirable future frames. In this way, VPGAN can produce fake videos of an object moving along a specific direction. Experimental results show that combining VPGAN with pre-trained image segmentation models outperforms existing stochastic video prediction methods.
AB - The ability to predict future frames in video sequences, known as video prediction, is an appealing yet challenging task in computer vision. The task requires an in-depth representation of video sequences and a deep understanding of real-world causal rules. Existing approaches to video prediction fall into two categories: deterministic and stochastic methods. Deterministic methods cannot generate multiple possible future frames and often yield blurry predictions. Current stochastic approaches, on the other hand, can predict possible future frames but lack action control: they cannot generate desired future frames conditioned on a specific action. In this paper, we propose new generative adversarial networks (GANs) for stochastic video prediction. Our framework, called VPGAN, employs an adversarial inference model and a cycle-consistency loss function to obtain more accurate predictions. In addition, we incorporate a conformal mapping network structure into VPGAN to enable action control for generating desirable future frames. In this way, VPGAN can produce fake videos of an object moving along a specific direction. Experimental results show that combining VPGAN with pre-trained image segmentation models outperforms existing stochastic video prediction methods.
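N1 - Illustrative sketch: the cycle-consistency loss named in the abstract re-encodes a generated frame and penalizes the round-trip error in latent space. The minimal PyTorch sketch below is an assumption-laden illustration, not the authors' implementation; the Encoder and Generator modules, names, and shapes are hypothetical stand-ins.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    # Toy stand-in that maps a 64x64 grayscale frame to a latent code.
    def __init__(self, z_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, z_dim))
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    # Toy stand-in that predicts the next frame from the previous frame
    # and a latent code (modeled as a residual on the previous frame).
    def __init__(self, z_dim=16):
        super().__init__()
        self.fc = nn.Linear(z_dim, 64 * 64)
    def forward(self, x_prev, z):
        return x_prev + torch.tanh(self.fc(z)).view(-1, 1, 64, 64)

def cycle_consistency_loss(enc, gen, x_prev, z):
    # Generate a frame from z, re-infer the latent code from the fake
    # frame, and penalize the latent round-trip error with an L1 norm.
    x_fake = gen(x_prev, z)
    z_rec = enc(x_fake)
    return F.l1_loss(z_rec, z)

# Usage: loss = cycle_consistency_loss(Encoder(), Generator(),
#                torch.randn(4, 1, 64, 64), torch.randn(4, 16))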
KW - Cycle-consistency
KW - Deep learning
KW - Video prediction
UR - http://www.scopus.com/inward/record.url?scp=85090095722&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090095722&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-56150-5_5
DO - 10.1007/978-3-030-56150-5_5
M3 - Conference contribution
AN - SCOPUS:85090095722
SN - 9783030561499
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 87
EP - 105
BT - Artificial Intelligence. IJCAI 2019 International Workshops - Revised Selected Best Papers
A2 - El Fallah Seghrouchni, Amal
A2 - Sarne, David
PB - Springer
T2 - 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Y2 - 10 August 2019 through 12 August 2019
ER -