Abstract
Language designers usually need to implement parsers and printers. Although the two programs are closely related, in practice they are often designed separately and must then be revised and kept consistent as the language evolves. It would be more convenient if the parser and printer could be unified and developed in a single program, with their consistency guaranteed automatically. Furthermore, in certain scenarios (such as showing compiler optimisation results to the programmer), it is desirable to have a more powerful reflective printer that, when an abstract syntax tree corresponding to a piece of program text is modified, can propagate the modification to the program text while preserving layouts, comments, and syntactic sugar. To address these needs, we propose a domain-specific language BiYacc, whose programs denote both a parser and a reflective printer for a fully disambiguated context-free grammar. BiYacc is based on the theory of bidirectional transformations, which helps to guarantee by construction that the generated pairs of parsers and reflective printers are consistent. Handling grammatical ambiguity is particularly challenging: we propose an approach based on generalised parsing and disambiguation filters, which produce all the parse results and (try to) select the only correct one in the parsing direction; the filters are carefully bidirectionalised so that they also work in the printing direction and do not break the consistency between the parsers and reflective printers. We show that BiYacc is capable of facilitating many tasks such as Pombrio and Krishnamurthi’s ‘resugaring’, simple refactoring, and language evolution.
Notes
We assume basic knowledge about functional programming languages and their notations, in particular Haskell [5, 32]. In Haskell, an argument of function application does not need to be enclosed in (round) parentheses, i.e. we write \(f\,x\) instead of f(x); type variables are implicitly universally quantified, i.e. \(f \;{:}{:}\; a \rightarrow b \rightarrow a\) is the same as \(f \;{:}{:}\; \forall a\ b.\ a \rightarrow b \rightarrow a\), where : : means has type. Additionally, we omit universal quantification for free variables in an equation; for instance, \(\textit{parse}\;(\textit{print}\;s\;t) = t\) is in fact \(\forall s\ t.\ \textit{parse}\;(\textit{print}\;s\;t) = t\).
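As a small illustration of these notational conventions, the following sketch defines the same function twice, once with the quantifier left implicit and once with it written out. The names constImplicit, constExplicit, and example are hypothetical and chosen only for this illustration; the ExplicitForAll extension is assumed so that the quantifier can be written in source code.

```haskell
{-# LANGUAGE ExplicitForAll #-}

-- Type variables a and b are implicitly universally quantified here.
constImplicit :: a -> b -> a
constImplicit x _ = x

-- The same type with the quantification spelled out.
constExplicit :: forall a b. a -> b -> a
constExplicit x _ = x

-- Function application is written by juxtaposition: f x y, not f(x, y).
example :: (Int, Int)
example = (constImplicit 1 "ignored", constExplicit 1 True)
```

Both definitions have exactly the same type; the second only makes visible what the first leaves implicit.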
While single quotation marks are for characters, double quotation marks are for strings. For simplicity, the user can always use double quotation marks.
Primitives are stored in the \(\mathsf {String}\) type because \(\mathsf {String}\) is the most precise representation and loses no information. For instance, this is useful for retaining the leading zeros of an integer such as 073; storing 073 as \(\mathsf {Integer}\) would lose the leading zero.
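The loss of information can be observed directly in Haskell. The sketch below is only an illustration under the assumption that a literal is read into an Integer and later shown again; viaInteger and viaString are hypothetical helper names, not part of BiYacc.

```haskell
-- Round-trip a numeric literal through Integer: the leading zero is lost,
-- because the number 73 carries no record of how it was originally written.
viaInteger :: String -> String
viaInteger s = show (read s :: Integer)

-- Keep the literal as the text the programmer wrote: nothing is lost.
viaString :: String -> String
viaString = id
```

Applied to "073", viaInteger yields "73", whereas viaString returns "073" unchanged.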
For simplicity, we use \(^\sharp \) to annotate type-incorrect CSTs in which fields for layouts (and comments) and unimportant constructors such as Lit are omitted.
The general type for disambiguation filters is \([t] \rightarrow [t]\), which allows comparison among a list of CSTs. However, since in this paper we only consider property filters defined in terms of predicates (on a single tree), it is sufficient to use the simplified type \(t \rightarrow \mathsf {Bool}\). See Generalised Parsing, Disambiguation, and Filters.
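The relation between the two types can be sketched as follows: a predicate on a single tree induces a list filter that keeps exactly the trees satisfying it. The toy Expr type (without layout fields or a constructor for parentheses), the predicate noPlusUnderTimes, and the other names are assumptions made for this illustration only; they are not the CST types or filters that BiYacc generates.

```haskell
-- A toy tree type for arithmetic expressions.
data Expr = Num Int | Plus Expr Expr | Times Expr Expr
  deriving (Eq, Show)

-- General filters compare a whole list of candidate trees.
type Filter t = [t] -> [t]

-- Property filters are predicates on a single tree.
type Predicate t = t -> Bool

-- Every predicate gives rise to a general filter.
fromPredicate :: Predicate t -> Filter t
fromPredicate = filter

-- Reject any tree in which an addition is a direct child of a multiplication.
noPlusUnderTimes :: Predicate Expr
noPlusUnderTimes (Num _)     = True
noPlusUnderTimes (Plus l r)  = noPlusUnderTimes l && noPlusUnderTimes r
noPlusUnderTimes (Times l r) = all ok [l, r]
  where
    ok (Plus _ _) = False
    ok t          = noPlusUnderTimes t

-- The two parse results of 1 + 2 * 3 under an ambiguous grammar.
candidates :: [Expr]
candidates =
  [ Plus (Num 1) (Times (Num 2) (Num 3))
  , Times (Plus (Num 1) (Num 2)) (Num 3)
  ]

-- Only the first candidate survives the filter.
selected :: [Expr]
selected = fromPredicate noPlusUnderTimes candidates
```

The converse does not hold in general: a filter that compares candidates against one another cannot always be expressed as a predicate on one tree, which is why \([t] \rightarrow [t]\) is the more general type.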
This is not a very realistic filter, although it sufficiently demonstrates the use of filters and removes ambiguity in the simplest cases, such as 1 + 2 * 3. In general, the filter should be complete (Definition 9) so that ambiguity is fully removed from the grammar.
Although terminals such as ‘*’ and ‘+’ are uniquely determined by constructors and not explicitly included in the CSTs, there are fields in CSTs for holding the whitespace after them. Thus, Times still has three subtrees. Also, for simplicity, the bi-filter fTimesPlusPrio attempts to repair the whitespace subtree t2 even though the repair can never happen since t2 cannot match p.
Although they use different implementation techniques, we will not dive into them in our related work. See Matsuda and Wang’s related work for a comparison [34].
An injective production, or a chain production, is one whose right-hand side is a single nonterminal; for instance, \(\texttt {E -> N}\).
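This definition can be phrased as a simple check on a production's right-hand side. The representation below (Symbol, Production, isChainProduction) is a hypothetical encoding of grammars chosen for this sketch, not the one used in the BiYacc implementation.

```haskell
-- Grammar symbols are either terminals or nonterminals.
data Symbol = Terminal String | Nonterminal String
  deriving (Eq, Show)

-- A production has a left-hand-side nonterminal and a right-hand side.
data Production = Production
  { lhs :: String
  , rhs :: [Symbol]
  } deriving (Eq, Show)

-- A chain production has exactly one nonterminal on its right-hand side.
isChainProduction :: Production -> Bool
isChainProduction (Production _ [Nonterminal _]) = True
isChainProduction _                              = False
```

For example, the production E -> N is encoded as Production "E" [Nonterminal "N"] and is recognised as a chain production, whereas E -> E + E is not.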
References
Aasa, A.: Precedences in specifications and implementations of programming languages. In: Selected Papers of the Symposium on Programming Language Implementation and Logic Programming, Elsevier Science Publishers B. V., Amsterdam, PLILP ’91, pp. 3–26. http://dl.acm.org/citation.cfm?id=203429.203431 (1995)
Afroozeh, A., Izmaylova, A.: Faster, practical GLL parsing. In: Franke, B. (ed.) Compiler Construction, pp. 89–108. Springer, Berlin (2015). https://doi.org/10.1007/978-3-662-46663-6_5
Aho, A.V., Johnson, S.C., Ullman, J.D.: Deterministic parsing of ambiguous grammars. Commun. ACM 18(8), 441–452 (1975)
Appel, A.W.: Modern Compiler Implementation in ML, 1st edn. Cambridge University Press, New York (1998)
Bird, R.: Thinking Functionally with Haskell. Cambridge University Press, Cambridge (2014). https://doi.org/10.1017/CBO9781316092415
Boulton, R.: Syn: a single language for specifying abstract syntax trees, lexical analysis, parsing and pretty-printing. Tech. Rep. Number 390, Computer Laboratory, University of Cambridge (1996)
Brabrand, C., Møller, A., Schwartzbach, M.I.: Dual syntax for XML languages. Inf. Syst. 33(4–5), 385–406 (2008). https://doi.org/10.1016/j.is.2008.01.006
van den Brand, M., Visser, E.: Generation of formatters for context-free languages. ACM Trans. Softw. Eng. Methodol. 5(1), 1–41 (1996). https://doi.org/10.1145/226155.226156
van den Brand, M.G.J., Scheerder, J., Vinju, J.J., Visser, E.: Disambiguation filters for scannerless generalized LR parsers. In: Proceedings of the 11th International Conference on Compiler Construction, Springer, London, UK, CC ’02, pp. 143–158. https://doi.org/10.1007/3-540-45937-5_12 (2002)
Cantor, D.G.: On the ambiguity problem of Backus systems. J. ACM 9(4), 477–479 (1962)
Czarnecki, K., Foster, J.N., Hu, Z., Lämmel, R., Schürr, A., Terwilliger, J.F.: Bidirectional transformations: a cross-discipline perspective. In: Proceedings of the 2nd International Conference on Theory and Practice of Model Transformations, Springer, Berlin, ICMT ’09, pp. 260–283. https://doi.org/10.1007/978-3-642-02408-5_19 (2009)
Dijkstra, E.W.: Guarded commands, nondeterminacy and formal derivation of programs. Commun. ACM 18(8), 453–457 (1975). https://doi.org/10.1145/360933.360975
Duregård, J., Jansson, P.: Embedded parser generators. In: Proceedings of the 4th ACM Symposium on Haskell, ACM, New York, NY, USA, Haskell ’11, pp. 107–117. https://doi.org/10.1145/2034675.2034689 (2011)
Earley, J.: An efficient context-free parsing algorithm. Commun. ACM 13(2), 94–102 (1970). https://doi.org/10.1145/362007.362035
Fischer, S., Hu, Z., Pacheco, H.: The essence of bidirectional programming. Sci. China Inf. Sci. 58(5), 1–21 (2015)
Foster, J.N.: Bidirectional programming languages. PhD thesis, University of Pennsylvania (2009)
Foster, J.N., Greenwald, M.B., Moore, J.T., Pierce, B.C., Schmitt, A.: Combinators for bidirectional tree transformations: a linguistic approach to the view-update problem. ACM Trans. Program. Lang. Syst. 29, 3 (2007). https://doi.org/10.1145/1232420.1232424
Fowler, M., Beck, K.: Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional, Boston (1999)
Gibbons, J., Stevens, P.: International Summer School on Bidirectional Transformations (Oxford, UK, 25–29 July 2016). Lecture Notes in Computer Science, vol. 9715. Springer, Berlin (2018)
Gosling, J., Joy, B., Steele, G.: The Java Language Specification, 3rd ed (2006). https://docs.oracle.com/javase/specs/
Hirzel, M., Rose, K.H.: Tiger language specification (2013). https://cs.nyu.edu/courses/fall13/CSCI-GA.2130-001/tiger-spec.pdf
Hu, Z., Ko, H.S.: Principles and practice of bidirectional programming in BiGUL. In: Gibbons, J., Stevens, P. (eds.) Bidirectional Transformations: International Summer School, Oxford, UK, July 25–29, 2016, Tutorial Lectures, pp. 100–150. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-79108-1_4
Johnson, S.C.: Yacc: Yet another compiler-compiler. AT&T Bell Laboratories Technical Reports (AT&T Bell Laboratories Murray Hill, New Jersey 07974). p. 32 (1975)
Kernighan, B.W., Ritchie, D.M.: The C Programming Language. Prentice Hall Press, Upper Saddle River (1988)
Kinoshita, D., Nakano, K.: Bidirectional certified programming. In: Eramo, R., Johnson, M. (eds) Proceedings of the 6th International Workshop on Bidirectional Transformations Co-Located with The European Joint Conferences on Theory and Practice of Software (ETAPS 2017), CEUR Workshop Proceedings, Uppsala, Sweden, vol. 1827, pp. 31–38 (2017)
Klint, P., Visser, E.: Using filters for the disambiguation of context-free grammars. In: Pighizzini, G., Pietro, P.S. (eds) Proceedings of the ASMICS Workshop on Parsing Theory, University of Milan, Italy, Milano, Italy, pp. 1–20 (1994)
Ko, H.S., Hu, Z.: An axiomatic basis for bidirectional programming. Proc. ACM Program. Lang. 2(POPL), 41:1–41:29 (2018). https://doi.org/10.1145/3158129
Ko, H.S., Zan, T., Hu, Z.: BiGUL: a formally verified core language for putback-based bidirectional programming. In: Proceedings of the 2016 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, ACM, New York, NY, USA, PEPM ’16, pp. 61–72 (2016). https://doi.org/10.1145/2847538.2847544
LaLonde, W.R., des Rivieres, J.: Handling operator precedence in arithmetic expressions with tree transformations. ACM Trans. Program. Lang. Syst. 3(1), 83–103 (1981). https://doi.org/10.1145/357121.357127
Lämmel, R., Jones, S.P.: Scrap your boilerplate: a practical design pattern for generic programming. In: Proceedings of the 2003 ACM SIGPLAN International Workshop on Types in Languages Design and Implementation, ACM, New York, NY, USA, TLDI ’03, pp. 26–37 (2003). https://doi.org/10.1145/604174.604179
Lutterkort, D.: Augeas—a configuration API. In: Proceedings of the Ottawa Linux Symposium, Ottawa, Canada, pp. 47–56 (2008)
Macedo, N., Pacheco, H., Cunha, A., Oliveira, J.N.: Composing least-change lenses. Proc. Sec. Int. Workshop Bidirect. Transform. 57, 1–19 (2013). https://doi.org/10.14279/tuj.eceasst.57.868
Marlow, S., Gill, A.: The parser generator for Haskell. https://www.haskell.org/happy/ (2001)
Marlow, S., et al.: Haskell 2010 language report. https://www.haskell.org/onlinereport/haskell2010/ (2010)
Martins, P., Saraiva, J., Fernandes, J.P., Van Wyk, E.: Generating attribute grammar-based bidirectional transformations from rewrite rules. In: Proceedings of the ACM SIGPLAN 2014 Workshop on Partial Evaluation and Program Manipulation, ACM, New York, NY, USA, PEPM ’14, pp. 63–70 (2014). https://doi.org/10.1145/2543728.2543745
Matsuda, K., Wang, M.: Embedding invertible languages with binders: a case of the FliPpr language. In: Proceedings of the 11th ACM SIGPLAN International Symposium on Haskell, ACM, New York, NY, USA, Haskell 2018, pp. 158–171 (2018a). https://doi.org/10.1145/3242744.3242758
Matsuda, K., Wang, M.: FliPpr: a system for deriving parsers from pretty-printers. New Gener. Comput. 36(3), 173–202 (2018b). https://doi.org/10.1007/s00354-018-0033-7
Matsuda, K., Mu, S.C., Hu, Z., Takeichi, M.: A grammar-based approach to invertible programs. In: Gordon, A.D. (ed) Proceedings of the 19th European Conference on Programming Languages and Systems, Springer, Berlin, no. 20 in ESOP’10, pp. 448–467 (2010). https://doi.org/10.1007/978-3-642-11957-6_24
Norell, U.: Towards a practical programming language based on dependent type theory. PhD thesis, Chalmers University of Technology (2007)
Pacheco, H., Hu, Z., Fischer, S.: Monadic combinators for “putback” style bidirectional programming. In: Proceedings of the ACM SIGPLAN 2014 Workshop on Partial Evaluation and Program Manipulation, ACM, New York, NY, USA, PEPM ’14, pp. 39–50 (2014a). https://doi.org/10.1145/2543728.2543737
Pacheco, H., Zan, T., Hu, Z.: BiFluX: A bidirectional functional update language for XML. In: Proceedings of the 16th International Symposium on Principles and Practice of Declarative Programming, ACM, New York, NY, USA, PPDP ’14, pp. 147–158 (2014b). https://doi.org/10.1145/2643135.2643141
Pombrio, J., Krishnamurthi, S.: Resugaring: lifting evaluation sequences through syntactic sugar. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM, New York, NY, USA, no. 6 in PLDI ’14, pp. 361–371 (2014). https://doi.org/10.1145/2594291.2594319
Pombrio, J., Krishnamurthi, S.: Hygienic resugaring of compositional desugaring. In: Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ACM, New York, NY, USA, no. 13 in ICFP 2015, pp. 75–87 (2015). https://doi.org/10.1145/2784731.2784755
Rendel, T., Ostermann, K.: Invertible syntax descriptions: unifying parsing and pretty printing. In: Proceedings of the Third ACM Haskell Symposium on Haskell, ACM, New York, NY, USA, Haskell ’10, pp. 1–12 (2010). https://doi.org/10.1145/1863523.1863525
Reps, T., Teitelbaum, T.: The synthesizer generator. In: Proceedings of the First ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, ACM, New York, NY, USA, SDE 1, pp. 42–48 (1984). https://doi.org/10.1145/800020.808247
Reps, T., Teitelbaum, T., Demers, A.: Incremental context-dependent analysis for language-based editors. ACM Trans. Program. Lang. Syst. 5(3), 449–477 (1983). https://doi.org/10.1145/2166.357218
Scott, E., Johnstone, A.: GLL parsing. Electron. Notes Theor. Comput. Sci. 253(7), 177–189 (2010). https://doi.org/10.1016/j.entcs.2010.08.041
Scott, E., Johnstone, A., Economopoulos, R.: BRNGLR: a cubic Tomita-style GLR parsing algorithm. Acta Inform. 44(6), 427–461 (2007). https://doi.org/10.1007/s00236-007-0054-z
Sheard, T., Jones, S.P.: Template meta-programming for Haskell. In: Proceedings of the 2002 ACM SIGPLAN Workshop on Haskell, ACM, New York, NY, USA, Haskell ’02, pp. 1–16 (2002). https://doi.org/10.1145/581690.581691
Tomita, M.: An efficient context-free parsing algorithm for natural languages. In: Proceedings of the 9th International Joint Conference on Artificial Intelligence-Volume 2, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, IJCAI’85, pp. 756–764 (1985). http://dl.acm.org/citation.cfm?id=1623611.1623625
Traver, V.J.: On compiler error messages: what they say and what they mean. Adv. Hum.-Comput. Interact. 2010, 3:1–3:26 (2010). https://doi.org/10.1155/2010/602570
Visser, E.: A case study in optimizing parsing schemata by disambiguation filters. International Workshop on Parsing Technology (IWPT 1997), pp. 210–224. Massachusetts Institute of Technology, Boston, USA (1997a)
Visser, E.: Syntax definition for language prototyping. PhD thesis, University of Amsterdam (1997b)
Younger, D.H.: Recognition and parsing of context-free languages in time \(n^3\). Inf. Control 10(2), 189–208 (1967)
Zhu, Z., Zhang, Y., Ko, H.S., Martins, P., Saraiva, J., Hu, Z.: Parsing and reflective printing, bidirectionally. In: Proceedings of the 2016 ACM SIGPLAN International Conference on Software Language Engineering, ACM, New York, NY, USA, SLE 2016, pp. 2–14. https://doi.org/10.1145/2997364.2997369 (2016)
Acknowledgements
We thank the reviewers and the editor for their selflessness and the effort they spent on reviewing our rather long paper. With their help, the readability of the paper is much improved, especially regarding how several case studies are structured, how theorems for the basic BiYacc and theorems for the extended version handling ambiguous grammars are related, and how look-alike notions are ‘disambiguated’. This work is partially supported by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research (S) No. 17H06099; in particular, most of the second author’s contributions were made when he worked at the National Institute of Informatics and was funded by the Grant.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhu, Z., Ko, HS., Zhang, Y. et al. Unifying Parsing and Reflective Printing for Fully Disambiguated Grammars. New Gener. Comput. 38, 423–476 (2020). https://doi.org/10.1007/s00354-019-00082-y
DOI: https://doi.org/10.1007/s00354-019-00082-y