Cross-Architecture Lifter Synthesis

  • Conference paper
  • First Online:
Software Engineering and Formal Methods (SEFM 2018)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10886))

Included in the following conference series:

  • 805 Accesses

Abstract

Code translation is a staple component of program analysis. A lifter is a code translation unit that translates low-level code to a higher-level intermediate representation (IR). Lifters thus enable a host of static and dynamic analyses for such low-level code. However, writing a lifter is a tedious manual process which must be repeated for every architecture an analysis aims to support. We introduce cross-architecture lifter synthesis, a novel approach that automatically synthesizes lifters for previously unsupported architectures. Our insight is that lifter synthesis can be bootstrapped with existing IR sketches that exploit the shared semantic properties of heterogeneous architecture instruction sets. We show that our approach automates a significant amount of translation effort for a previously unsupported instruction set, and that it enables discovery of new bugs on new architecture targets through reuse of an existing IR-based analysis.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
EUR 29.95
Price includes VAT (Thailand)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
EUR 42.79
Price includes VAT (Thailand)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
EUR 49.99
Price excludes VAT (Thailand)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free ship** worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    e.g., https://opam.ocaml.org/packages/bap-warn-unused/bap-warn-unused.1.3.0.

  2. 2.

    Available at https://github.com/BinaryAnalysisPlatform/bap.

  3. 3.

    Note that this is not a limiting assumption on generalizing the technique: an existing, functional IR implies at least one existing translation layer implementation, as is the case with, e.g., REIL [8] LLVM [19], VEX IR [21].

  4. 4.

    https://www.mips.com/products/architectures/mips32/.

  5. 5.

    We use LLVM for convenience–dynamic binary instrumentation techniques can similarly provide instruction operands and opcodes.

  6. 6.

    https://www.qemu.org/.

  7. 7.

    https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool.

  8. 8.

    Note that IRs may lack a specified operational semantics. Our work emphasizes the importance of using a formally specified IR to enable translation synthesis.

  9. 9.

    Note that PC-relative instructions, such as jumps, still need access to a program counter variable to enable synthesis. For this, an internal PC is kept in the execution environment.

  10. 10.

    https://www.gnu.org/software/coreutils/coreutils.html.

  11. 11.

    The missing rotation operator is however found in subexpressions of IR statements, but we fail to generate the desired statement \(\__{var} \texttt {:=} \__{var}<< \__{imm}\).

  12. 12.

    Using Fig. 5, (excluding instructions TEQ, SYSCALL, SYNC, and SDC1 which are modeled differently in the trace than actual MIPS semantics), and compared to a simplified MIPS ISA (goo.gl/YUEdiy).

References

  1. BAP IR Operational Semantics (2018). https://github.com/BinaryAnalysisPlatform/bil/releases/download/v0.1/bil.pdf. Accessed 23 Apr 2018

  2. BAP Warn Unused Analysis (2018). https://opam.ocaml.org/packages/bap-warn-unused/bap-warn-unused.1.3.0/. Accessed 23 Apr 2018

  3. Binary Analysis Platform (2018). https://github.com/BinaryAnalysisPlatform/bap. Accessed 23 Apr 2018

  4. Alur, R., Bodík, R., Juniwal, G., Martin, M.M.K., Raghothaman, M., Seshia, S.A., Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-guided synthesis. In: Formal Methods in Computer-Aided Design, pp. 1–8 (2013)

    Google Scholar 

  5. Balakrishnan, G., Reps, T.: Analyzing memory accesses in x86 executables. In: Compiler Construction, pp. 2732–2733 (2004)

    Google Scholar 

  6. Bornholt, J., Torlak, E.: Scaling program synthesis by exploiting existing code. Machine Learning for Programming Languages (2015)

    Google Scholar 

  7. Bornholt, J., Torlak, E., Grossman, D., Ceze, L.: Optimizing synthesis with metasketches. In: POPL 2016, pp. 775–788 (2016)

    Google Scholar 

  8. Dullien, T., Porst, S.: REIL: a platform-independent intermediate representation of disassembled code for static code analysis. In: CanSecWest 2009 (2009)

    Google Scholar 

  9. Federico, A.D., Payer, M., Agosta, G.: rev.ng: a unified binary analysis framework to recover CFGs and function boundaries. In: CC 2017, pp. 131–141 (2017)

    Google Scholar 

  10. Godefroid, P., Taly, A.: Automated synthesis of symbolic instruction encodings from I/O samples. In: PLDI 2012, pp. 441–452 (2012)

    Google Scholar 

  11. Gotovchits, I., van Tonder, R., Brumley, D.: Saluki: finding taint-style vulnerabilities with static property checking. In: BAR 2018 (2018)

    Google Scholar 

  12. Hasabnis, N., Qiao, R., Sekar, R.: Checking correctness of code generator architecture specifications. In: CGO 2015, pp. 167–178 (2015)

    Google Scholar 

  13. Hasabnis, N., Sekar, R.: Extracting instruction semantics via symbolic execution of code generators. In: FSE 2016, pp. 301–313 (2016)

    Google Scholar 

  14. Hasabnis, N., Sekar, R.: Lifting assembly to intermediate representation: a novel approach leveraging compilers. In: ASPLOS 2016, pp. 311–324 (2016)

    Google Scholar 

  15. Heule, S., Schkufza, E., Sharma, R., Aiken, A.: Stratified synthesis: automatically learning the x86–64 instruction set. In: PLDI 2016, pp. 237–250 (2016)

    Google Scholar 

  16. Hindle, A., Barr, E.T., Gabel, M., Su, Z., Devanbu, P.T.: On the naturalness of software. Commun. ACM 59(5), 122–131 (2016)

    Article  Google Scholar 

  17. Kim, S., Faerevaag, M., Junk, M., Jung, S., Oh, D., Lee, J., Cha, S.K.: Testing intermediate representations for binary analysis. In: ASE 2017 (2017)

    Google Scholar 

  18. Kinder, J., Veith, H.: Precise static analysis of untrusted driver binaries. In: FMCAD 2010, pp. 43–50 (2010)

    Google Scholar 

  19. Lattner, C., Adve, V.S.: LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO 2004, pp. 75–88 (2004)

    Google Scholar 

  20. Le, V., Sun, C., Su, Z.: Randomized stress-testing of link-time optimizers. In: ISSTA 2015, pp. 327–337 (2015)

    Google Scholar 

  21. Molnar, D., Li, X.C., Wagner, D.A.: Dynamic test generation to find integer bugs in x86 binary linux programs. In: USENIX Security Symposium 2009 (2009)

    Google Scholar 

  22. Schwartz, E.J., Avgerinos, T., Brumley, D.: All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In: IEEE Security and Privacy, pp. 317–331 (2010)

    Google Scholar 

  23. Sun, C., Le, V., Zhang, Q., Su, Z.: Toward understanding compiler bugs in GCC and LLVM. In: ISSTA 2016, pp. 294–305 (2016)

    Google Scholar 

  24. Warren, H.S.: Hacker’s Delight. Pearson Education, London (2013)

    Google Scholar 

Download references

Acknowledgments

This work is partially supported under NSF grant number CCF-1563797. All statements are those of the authors, and do not necessarily reflect the views of the funding agency. The authors would like to thank the BAP Team for their continued open source development and support for the BAP project.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Rijnard van Tonder or Claire Le Goues .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

van Tonder, R., Le Goues, C. (2018). Cross-Architecture Lifter Synthesis. In: Johnsen, E., Schaefer, I. (eds) Software Engineering and Formal Methods. SEFM 2018. Lecture Notes in Computer Science(), vol 10886. Springer, Cham. https://doi.org/10.1007/978-3-319-92970-5_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-92970-5_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92969-9

  • Online ISBN: 978-3-319-92970-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Navigation