Cross-Architecture Lifter Synthesis

van Tonder, Rijnard; Le Goues, Claire

doi:10.1007/978-3-319-92970-5_10

Rijnard van Tonder¹⁵ &
Claire Le Goues¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10886))

Included in the following conference series:

International Conference on Software Engineering and Formal Methods

805 Accesses

Abstract

Code translation is a staple component of program analysis. A lifter is a code translation unit that translates low-level code to a higher-level intermediate representation (IR). Lifters thus enable a host of static and dynamic analyses for such low-level code. However, writing a lifter is a tedious manual process which must be repeated for every architecture an analysis aims to support. We introduce cross-architecture lifter synthesis, a novel approach that automatically synthesizes lifters for previously unsupported architectures. Our insight is that lifter synthesis can be bootstrapped with existing IR sketches that exploit the shared semantic properties of heterogeneous architecture instruction sets. We show that our approach automates a significant amount of translation effort for a previously unsupported instruction set, and that it enables discovery of new bugs on new architecture targets through reuse of an existing IR-based analysis.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Chapter: EUR 29.95; Price includes VAT (Thailand)

eBook: EUR 42.79; Price includes VAT (Thailand)

Softcover Book: EUR 49.99; Price excludes VAT (Thailand)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Black-Box Equivalence Checking Across Compiler Optimizations

Hardware/Software Compilation

A Study on the Impact of Compiler Optimizations on High-Level Synthesis

Notes

1.
e.g., https://opam.ocaml.org/packages/bap-warn-unused/bap-warn-unused.1.3.0.
2.
Available at https://github.com/BinaryAnalysisPlatform/bap.
3.
Note that this is not a limiting assumption on generalizing the technique: an existing, functional IR implies at least one existing translation layer implementation, as is the case with, e.g., REIL [8] LLVM [19], VEX IR [21].
4.
https://www.mips.com/products/architectures/mips32/.
5.
We use LLVM for convenience–dynamic binary instrumentation techniques can similarly provide instruction operands and opcodes.
6.
https://www.qemu.org/.
7.
https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool.
8.
Note that IRs may lack a specified operational semantics. Our work emphasizes the importance of using a formally specified IR to enable translation synthesis.
9.
Note that PC-relative instructions, such as jumps, still need access to a program counter variable to enable synthesis. For this, an internal PC is kept in the execution environment.
10.
https://www.gnu.org/software/coreutils/coreutils.html.
11.
The missing rotation operator is however found in subexpressions of IR statements, but we fail to generate the desired statement \(\__{var} \texttt {:=} \__{var}<< \__{imm}\).
12.
Using Fig. 5, (excluding instructions TEQ, SYSCALL, SYNC, and SDC1 which are modeled differently in the trace than actual MIPS semantics), and compared to a simplified MIPS ISA (goo.gl/YUEdiy).

References

BAP IR Operational Semantics (2018). https://github.com/BinaryAnalysisPlatform/bil/releases/download/v0.1/bil.pdf. Accessed 23 Apr 2018
BAP Warn Unused Analysis (2018). https://opam.ocaml.org/packages/bap-warn-unused/bap-warn-unused.1.3.0/. Accessed 23 Apr 2018
Binary Analysis Platform (2018). https://github.com/BinaryAnalysisPlatform/bap. Accessed 23 Apr 2018
Alur, R., Bodík, R., Juniwal, G., Martin, M.M.K., Raghothaman, M., Seshia, S.A., Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-guided synthesis. In: Formal Methods in Computer-Aided Design, pp. 1–8 (2013)
Google Scholar
Balakrishnan, G., Reps, T.: Analyzing memory accesses in x86 executables. In: Compiler Construction, pp. 2732–2733 (2004)
Google Scholar
Bornholt, J., Torlak, E.: Scaling program synthesis by exploiting existing code. Machine Learning for Programming Languages (2015)
Google Scholar
Bornholt, J., Torlak, E., Grossman, D., Ceze, L.: Optimizing synthesis with metasketches. In: POPL 2016, pp. 775–788 (2016)
Google Scholar
Dullien, T., Porst, S.: REIL: a platform-independent intermediate representation of disassembled code for static code analysis. In: CanSecWest 2009 (2009)
Google Scholar
Federico, A.D., Payer, M., Agosta, G.: rev.ng: a unified binary analysis framework to recover CFGs and function boundaries. In: CC 2017, pp. 131–141 (2017)
Google Scholar
Godefroid, P., Taly, A.: Automated synthesis of symbolic instruction encodings from I/O samples. In: PLDI 2012, pp. 441–452 (2012)
Google Scholar
Gotovchits, I., van Tonder, R., Brumley, D.: Saluki: finding taint-style vulnerabilities with static property checking. In: BAR 2018 (2018)
Google Scholar
Hasabnis, N., Qiao, R., Sekar, R.: Checking correctness of code generator architecture specifications. In: CGO 2015, pp. 167–178 (2015)
Google Scholar
Hasabnis, N., Sekar, R.: Extracting instruction semantics via symbolic execution of code generators. In: FSE 2016, pp. 301–313 (2016)
Google Scholar
Hasabnis, N., Sekar, R.: Lifting assembly to intermediate representation: a novel approach leveraging compilers. In: ASPLOS 2016, pp. 311–324 (2016)
Google Scholar
Heule, S., Schkufza, E., Sharma, R., Aiken, A.: Stratified synthesis: automatically learning the x86–64 instruction set. In: PLDI 2016, pp. 237–250 (2016)
Google Scholar
Hindle, A., Barr, E.T., Gabel, M., Su, Z., Devanbu, P.T.: On the naturalness of software. Commun. ACM 59(5), 122–131 (2016)
Article Google Scholar
Kim, S., Faerevaag, M., Junk, M., Jung, S., Oh, D., Lee, J., Cha, S.K.: Testing intermediate representations for binary analysis. In: ASE 2017 (2017)
Google Scholar
Kinder, J., Veith, H.: Precise static analysis of untrusted driver binaries. In: FMCAD 2010, pp. 43–50 (2010)
Google Scholar
Lattner, C., Adve, V.S.: LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO 2004, pp. 75–88 (2004)
Google Scholar
Le, V., Sun, C., Su, Z.: Randomized stress-testing of link-time optimizers. In: ISSTA 2015, pp. 327–337 (2015)
Google Scholar
Molnar, D., Li, X.C., Wagner, D.A.: Dynamic test generation to find integer bugs in x86 binary linux programs. In: USENIX Security Symposium 2009 (2009)
Google Scholar
Schwartz, E.J., Avgerinos, T., Brumley, D.: All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In: IEEE Security and Privacy, pp. 317–331 (2010)
Google Scholar
Sun, C., Le, V., Zhang, Q., Su, Z.: Toward understanding compiler bugs in GCC and LLVM. In: ISSTA 2016, pp. 294–305 (2016)
Google Scholar
Warren, H.S.: Hacker’s Delight. Pearson Education, London (2013)
Google Scholar

Download references

Acknowledgments

This work is partially supported under NSF grant number CCF-1563797. All statements are those of the authors, and do not necessarily reflect the views of the funding agency. The authors would like to thank the BAP Team for their continued open source development and support for the BAP project.

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, USA
Rijnard van Tonder & Claire Le Goues

Authors

Rijnard van Tonder
View author publications
You can also search for this author in PubMed Google Scholar
Claire Le Goues
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Rijnard van Tonder or Claire Le Goues .

Editor information

Editors and Affiliations

University of Oslo, Oslo, Norway
Einar Broch Johnsen
Braunschweig University of Technology, Braunschweig, Germany
Ina Schaefer

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

van Tonder, R., Le Goues, C. (2018). Cross-Architecture Lifter Synthesis. In: Johnsen, E., Schaefer, I. (eds) Software Engineering and Formal Methods. SEFM 2018. Lecture Notes in Computer Science(), vol 10886. Springer, Cham. https://doi.org/10.1007/978-3-319-92970-5_10

Download citation

DOI: https://doi.org/10.1007/978-3-319-92970-5_10
Published: 30 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92969-9
Online ISBN: 978-3-319-92970-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Cross-Architecture Lifter Synthesis

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Black-Box Equivalence Checking Across Compiler Optimizations

Hardware/Software Compilation

A Study on the Impact of Compiler Optimizations on High-Level Synthesis

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding authors

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Cross-Architecture Lifter Synthesis

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Black-Box Equivalence Checking Across Compiler Optimizations

Hardware/Software Compilation

A Study on the Impact of Compiler Optimizations on High-Level Synthesis

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding authors

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation