Abstract
Human drivers consider past and future driving environments to maintain stable control of a vehicle. To emulate this behavior, we propose a vision-based autonomous driving model, called the Future Actions and States Network (FASNet), which uses predicted future actions and generated future states in a multi-task learning manner. Future states are generated using an enhanced deep predictive-coding network together with motion equations defined by the kinematic vehicle model. The final control values are determined as a weighted average of the predicted actions, yielding stable decisions. With these methods, the proposed FASNet achieves a high generalization ability in unseen environments. To validate FASNet, we conducted several experiments, including ablation studies, in realistic three-dimensional simulations. FASNet achieves a higher Success Rate (SR) than state-of-the-art models on the recent CARLA benchmarks under several conditions.
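The abstract mentions motion equations from the kinematic vehicle model and a weighted average over predicted actions. As a rough illustrative sketch only (not the authors' implementation: the variable names, the Euler integration scheme, the bicycle-model parameterization, and the fusion weighting are our assumptions), one state-update step of the commonly used kinematic bicycle model and a simple weighted action fusion could look like:

```python
import math

def kinematic_bicycle_step(x, y, yaw, v, steer, accel, dt, lf=1.25, lr=1.25):
    """One Euler step of the kinematic bicycle model.

    (x, y): position [m]; yaw: heading [rad]; v: speed [m/s];
    steer: front-wheel steering angle [rad]; accel: longitudinal
    acceleration [m/s^2]; dt: time step [s]; lf/lr: distances from the
    center of mass to the front/rear axles [m].
    """
    # Slip angle of the velocity vector at the center of mass.
    beta = math.atan(lr / (lf + lr) * math.tan(steer))
    x += v * math.cos(yaw + beta) * dt
    y += v * math.sin(yaw + beta) * dt
    yaw += (v / lr) * math.sin(beta) * dt
    v += accel * dt
    return x, y, yaw, v

def fuse_actions(actions, weights):
    """Weighted average of a list of predicted action vectors.

    actions: list of action vectors (e.g. [steer, throttle]) predicted
    for the current and future time steps; weights: one scalar weight
    per predicted action vector.
    """
    total = sum(weights)
    dim = len(actions[0])
    return [sum(w * a[i] for w, a in zip(weights, actions)) / total
            for i in range(dim)]
```

For example, driving straight (steer = 0, accel = 0) at 10 m/s for 0.1 s advances the vehicle 1 m along its heading, and fusing two equally weighted action predictions returns their mean.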
Acknowledgments
This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2014-0-00059, Development of Predictive Visual Intelligence Technology), (No. 2017-0-00897, Development of Object Detection and Recognition for Intelligent Vehicles) and (No. 2018-0-01290, Development of an Open Dataset and Cognitive Processing Technology for the Recognition of Features Derived From Unstructured Human Motions Used in Self-driving Cars).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Kim, I., Lee, H., Lee, J., Lee, E., Kim, D. (2021). Multi-task Learning with Future States for Vision-Based Autonomous Driving. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. Lecture Notes in Computer Science, vol. 12624. Springer, Cham. https://doi.org/10.1007/978-3-030-69535-4_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69534-7
Online ISBN: 978-3-030-69535-4
eBook Packages: Computer Science, Computer Science (R0)