Abstract
Code translation is a staple component of program analysis. A lifter is a code translation unit that translates low-level code to a higher-level intermediate representation (IR). Lifters thus enable a host of static and dynamic analyses for such low-level code. However, writing a lifter is a tedious manual process which must be repeated for every architecture an analysis aims to support. We introduce cross-architecture lifter synthesis, a novel approach that automatically synthesizes lifters for previously unsupported architectures. Our insight is that lifter synthesis can be bootstrapped with existing IR sketches that exploit the shared semantic properties of heterogeneous architecture instruction sets. We show that our approach automates a significant amount of translation effort for a previously unsupported instruction set, and that it enables discovery of new bugs on new architecture targets through reuse of an existing IR-based analysis.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
Available at https://github.com/BinaryAnalysisPlatform/bap.
- 3.
- 4.
- 5.
We use LLVM for convenience–dynamic binary instrumentation techniques can similarly provide instruction operands and opcodes.
- 6.
- 7.
- 8.
Note that IRs may lack a specified operational semantics. Our work emphasizes the importance of using a formally specified IR to enable translation synthesis.
- 9.
Note that PC-relative instructions, such as jumps, still need access to a program counter variable to enable synthesis. For this, an internal PC is kept in the execution environment.
- 10.
- 11.
The missing rotation operator is however found in subexpressions of IR statements, but we fail to generate the desired statement \(\__{var} \texttt {:=} \__{var}<< \__{imm}\).
- 12.
Using Fig. 5, (excluding instructions TEQ, SYSCALL, SYNC, and SDC1 which are modeled differently in the trace than actual MIPS semantics), and compared to a simplified MIPS ISA (goo.gl/YUEdiy).
References
BAP IR Operational Semantics (2018). https://github.com/BinaryAnalysisPlatform/bil/releases/download/v0.1/bil.pdf. Accessed 23 Apr 2018
BAP Warn Unused Analysis (2018). https://opam.ocaml.org/packages/bap-warn-unused/bap-warn-unused.1.3.0/. Accessed 23 Apr 2018
Binary Analysis Platform (2018). https://github.com/BinaryAnalysisPlatform/bap. Accessed 23 Apr 2018
Alur, R., BodÃk, R., Juniwal, G., Martin, M.M.K., Raghothaman, M., Seshia, S.A., Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-guided synthesis. In: Formal Methods in Computer-Aided Design, pp. 1–8 (2013)
Balakrishnan, G., Reps, T.: Analyzing memory accesses in x86 executables. In: Compiler Construction, pp. 2732–2733 (2004)
Bornholt, J., Torlak, E.: Scaling program synthesis by exploiting existing code. Machine Learning for Programming Languages (2015)
Bornholt, J., Torlak, E., Grossman, D., Ceze, L.: Optimizing synthesis with metasketches. In: POPL 2016, pp. 775–788 (2016)
Dullien, T., Porst, S.: REIL: a platform-independent intermediate representation of disassembled code for static code analysis. In: CanSecWest 2009 (2009)
Federico, A.D., Payer, M., Agosta, G.: rev.ng: a unified binary analysis framework to recover CFGs and function boundaries. In: CC 2017, pp. 131–141 (2017)
Godefroid, P., Taly, A.: Automated synthesis of symbolic instruction encodings from I/O samples. In: PLDI 2012, pp. 441–452 (2012)
Gotovchits, I., van Tonder, R., Brumley, D.: Saluki: finding taint-style vulnerabilities with static property checking. In: BAR 2018 (2018)
Hasabnis, N., Qiao, R., Sekar, R.: Checking correctness of code generator architecture specifications. In: CGO 2015, pp. 167–178 (2015)
Hasabnis, N., Sekar, R.: Extracting instruction semantics via symbolic execution of code generators. In: FSE 2016, pp. 301–313 (2016)
Hasabnis, N., Sekar, R.: Lifting assembly to intermediate representation: a novel approach leveraging compilers. In: ASPLOS 2016, pp. 311–324 (2016)
Heule, S., Schkufza, E., Sharma, R., Aiken, A.: Stratified synthesis: automatically learning the x86–64 instruction set. In: PLDI 2016, pp. 237–250 (2016)
Hindle, A., Barr, E.T., Gabel, M., Su, Z., Devanbu, P.T.: On the naturalness of software. Commun. ACM 59(5), 122–131 (2016)
Kim, S., Faerevaag, M., Junk, M., Jung, S., Oh, D., Lee, J., Cha, S.K.: Testing intermediate representations for binary analysis. In: ASE 2017 (2017)
Kinder, J., Veith, H.: Precise static analysis of untrusted driver binaries. In: FMCAD 2010, pp. 43–50 (2010)
Lattner, C., Adve, V.S.: LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO 2004, pp. 75–88 (2004)
Le, V., Sun, C., Su, Z.: Randomized stress-testing of link-time optimizers. In: ISSTA 2015, pp. 327–337 (2015)
Molnar, D., Li, X.C., Wagner, D.A.: Dynamic test generation to find integer bugs in x86 binary linux programs. In: USENIX Security Symposium 2009 (2009)
Schwartz, E.J., Avgerinos, T., Brumley, D.: All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In: IEEE Security and Privacy, pp. 317–331 (2010)
Sun, C., Le, V., Zhang, Q., Su, Z.: Toward understanding compiler bugs in GCC and LLVM. In: ISSTA 2016, pp. 294–305 (2016)
Warren, H.S.: Hacker’s Delight. Pearson Education, London (2013)
Acknowledgments
This work is partially supported under NSF grant number CCF-1563797. All statements are those of the authors, and do not necessarily reflect the views of the funding agency. The authors would like to thank the BAP Team for their continued open source development and support for the BAP project.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
van Tonder, R., Le Goues, C. (2018). Cross-Architecture Lifter Synthesis. In: Johnsen, E., Schaefer, I. (eds) Software Engineering and Formal Methods. SEFM 2018. Lecture Notes in Computer Science(), vol 10886. Springer, Cham. https://doi.org/10.1007/978-3-319-92970-5_10
Download citation
DOI: https://doi.org/10.1007/978-3-319-92970-5_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92969-9
Online ISBN: 978-3-319-92970-5
eBook Packages: Computer ScienceComputer Science (R0)