Abstract
Human drivers consider past and future driving environments to maintain stable control of a vehicle. To emulate this behavior, we propose a vision-based autonomous driving model, called the Future Actions and States Network (FASNet), which uses predicted future actions and generated future states in a multi-task learning manner. Future states are generated using an enhanced deep predictive-coding network together with motion equations defined by the kinematic vehicle model. The final control values are determined as a weighted average of the predicted actions, yielding stable decisions. With these methods, the proposed FASNet achieves a high generalization ability in unseen environments. To validate FASNet, we conducted several experiments, including ablation studies, in realistic three-dimensional simulations. FASNet achieves a higher Success Rate (SR) than state-of-the-art models on the recent CARLA benchmarks under several conditions.
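The abstract mentions motion equations from the kinematic vehicle model and a weighted average over predicted actions. As a rough illustrative sketch only (not the authors' implementation: the variable names, the Euler integration scheme, the bicycle-model parameterization, and the fusion weighting are our assumptions), one state-update step of the commonly used kinematic bicycle model and a simple weighted action fusion could look like:

```python
import math

def kinematic_bicycle_step(x, y, yaw, v, steer, accel, dt, lf=1.25, lr=1.25):
    """One Euler step of the kinematic bicycle model.

    (x, y): position [m]; yaw: heading [rad]; v: speed [m/s];
    steer: front-wheel steering angle [rad]; accel: longitudinal
    acceleration [m/s^2]; dt: time step [s]; lf/lr: distances from the
    center of mass to the front/rear axles [m].
    """
    # Slip angle of the velocity vector at the center of mass.
    beta = math.atan(lr / (lf + lr) * math.tan(steer))
    x += v * math.cos(yaw + beta) * dt
    y += v * math.sin(yaw + beta) * dt
    yaw += (v / lr) * math.sin(beta) * dt
    v += accel * dt
    return x, y, yaw, v

def fuse_actions(actions, weights):
    """Weighted average of a list of predicted action vectors.

    actions: list of action vectors (e.g. [steer, throttle]) predicted
    for the current and future time steps; weights: one scalar weight
    per predicted action vector.
    """
    total = sum(weights)
    dim = len(actions[0])
    return [sum(w * a[i] for w, a in zip(weights, actions)) / total
            for i in range(dim)]
```

For example, driving straight (steer = 0, accel = 0) at 10 m/s for 0.1 s advances the vehicle 1 m along its heading, and fusing two equally weighted action predictions returns their mean.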
Acknowledgments
This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2014-0-00059, Development of Predictive Visual Intelligence Technology), (No. 2017-0-00897, Development of Object Detection and Recognition for Intelligent Vehicles) and (No. 2018-0-01290, Development of an Open Dataset and Cognitive Processing Technology for the Recognition of Features Derived From Unstructured Human Motions Used in Self-driving Cars).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Kim, I., Lee, H., Lee, J., Lee, E., Kim, D. (2021). Multi-task Learning with Future States for Vision-Based Autonomous Driving. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. Lecture Notes in Computer Science, vol. 12624. Springer, Cham. https://doi.org/10.1007/978-3-030-69535-4_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69534-7
Online ISBN: 978-3-030-69535-4
eBook Packages: Computer Science, Computer Science (R0)