Reward Delay Attacks on Deep Reinforcement Learning

  • Conference paper
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13727)
  • Included in the conference series: Decision and Game Theory for Security (GameSec 2022)


Abstract

Most reinforcement learning algorithms implicitly assume strong synchrony. We present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption: delaying the reward signal for a limited time period. We consider two types of attack goals: targeted attacks, which aim to cause a target policy to be learned, and untargeted attacks, which simply aim to induce a policy with low reward. We evaluate the efficacy of the proposed attacks through a series of experiments. Our first observation is that reward-delay attacks are extremely effective when the goal is simply to minimize reward; indeed, even naive baseline reward-delay attacks are highly successful at this. Targeted attacks, on the other hand, are more challenging, although we nevertheless demonstrate that the proposed approaches remain highly effective at achieving the attacker’s targets. In addition, we introduce a second threat model that captures a minimal mitigation ensuring that rewards cannot be used out of sequence. We find that this mitigation remains insufficient to ensure robustness to attacks that delay rewards but preserve their order.
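The paper's concrete attack algorithms appear in the appendix (Algorithms 1 and 2); as a rough, hypothetical illustration of the threat model only, the sketch below shows an attacker sitting between a Gym-style environment and a Q-learning agent, withholding rewards in a bounded buffer and releasing them possibly out of sequence. The wrapper class, the buffer bound `max_delay`, the classic four-tuple `step` API, and the minimum-first release rule are illustrative assumptions, not the authors' attack.

```python
class DelayedRewardAttacker:
    """Hypothetical man-in-the-middle that delays (and may reorder) rewards.

    The learner still receives one reward value per step, but the reward it
    sees at step t may belong to an earlier step, breaking the synchrony
    that temporal-difference updates implicitly rely on.
    """

    def __init__(self, env, max_delay=4):
        self.env = env
        self.max_delay = max_delay   # attacker's buffer capacity
        self.held = []               # rewards withheld so far

    def reset(self, **kwargs):
        self.held = []
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.held.append(reward)
        if done:
            # Flush the buffer at episode end so no reward is lost.
            released = sum(self.held)
            self.held = []
        elif len(self.held) > self.max_delay:
            # Naive untargeted baseline: release the *smallest* buffered
            # reward first, so recent (possibly good) actions get credited
            # with old, low rewards.
            released = self.held.pop(self.held.index(min(self.held)))
        else:
            released = 0.0           # nothing released yet this step
        return obs, released, done, info
```

The same wrapper pattern covers both attack goals in the abstract: an untargeted attacker only needs a release rule that degrades credit assignment, while a targeted attacker would choose releases to steer the learned policy.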



Acknowledgments

This research was supported in part by the National Science Foundation (grants IIS-1905558, IIS-2214141, ECCS-2020289, CNS-2038995), Army Research Office (grant W911NF1910241), and NVIDIA.

Author information

Correspondence to Anindya Sarkar.


7 Appendix

In this section, we present the end-to-end algorithmic approach of the reward-delay attack strategy and the reward-shifting attack strategy in Algorithms 1 and 2, respectively; an illustrative sketch of the shifting variant follows the algorithm placeholders below.

[Algorithm 1: Reward-delay attack strategy]
[Algorithm 2: Reward-shifting attack strategy]
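The algorithm figures do not survive this rendering. As a minimal sketch only, assuming the same Gym-style four-tuple interface as the earlier example, the following shows the order-preserving variant: every reward reaches the learner in its original sequence, satisfying the mitigation discussed in the abstract, yet each reward is credited `shift` steps after the action that earned it. The fixed `shift` parameter and class name are illustrative assumptions, not the paper's Algorithm 2.

```python
from collections import deque

class RewardShiftAttacker:
    """Minimal sketch of an order-preserving reward-delay (shifting) attack.

    Rewards arrive in their original order, so a defense that forbids
    out-of-sequence rewards cannot detect it, but every reward is
    delivered `shift` steps late, misattributing credit.
    """

    def __init__(self, env, shift=3):
        self.env = env
        self.shift = shift
        self.fifo = deque()          # strictly first-in, first-out

    def reset(self, **kwargs):
        self.fifo.clear()
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.fifo.append(reward)
        if done:
            # Flush remaining rewards, still in order, at episode end.
            released = sum(self.fifo)
            self.fifo.clear()
        elif len(self.fifo) > self.shift:
            released = self.fifo.popleft()
        else:
            released = 0.0           # reward not yet due for release
        return obs, released, done, info
```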


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sarkar, A., Feng, J., Vorobeychik, Y., Gill, C., Zhang, N. (2023). Reward Delay Attacks on Deep Reinforcement Learning. In: Fang, F., Xu, H., Hayel, Y. (eds) Decision and Game Theory for Security. GameSec 2022. Lecture Notes in Computer Science, vol 13727. Springer, Cham. https://doi.org/10.1007/978-3-031-26369-9_11


  • DOI: https://doi.org/10.1007/978-3-031-26369-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26368-2

  • Online ISBN: 978-3-031-26369-9

  • eBook Packages: Computer Science, Computer Science (R0)
