1 Introduction

Seeman’s ground-breaking work in DNA self-assembly that initiated the field of DNA nanotechnology has had surprisingly broad impacts on many other fields. In particular it has impacted the field of nanotechnology more widely [1,17,18,19] including computer science and mathematics [20,21,22,23,24,25,26,27,28]. Advances in the sciences often drive the creation of completely new mathematical fields. The field of DNA nanotechnology is a prime example of this phenomenon. It has been steadily spawning a slew of new mathematical problems. These problems are now giving birth to a new field of mathematics that might be called ‘DNA-mathematics’.

We live in a world dominated by computers, mobile devices, advances in technology, and the internet. We have foodwebs, social networks, and contact tracing in epidemiological models. The mathematics of interconnections, in particular network and graph theory, has become essential for a better understanding of many modern life situations. For example, graph drawing tools led to effective computer chip layouts while random graphs are often used in modeling the worldwide web. We now see this same phenomenon of new mathematics emerging from questions about shapes and interconnections driven by nucleic acid structures in biology and DNA nanotechnology.

In his pioneering article on DNA self-assembly Ned Seeman proposed using DNA molecules, called branched junction molecules, to build complex nanostructures [29]. These molecules are shaped like starfish with anywhere from three to twelve arms and can attach to one another via sequences of complementary DNA bases at the ends of their arms (sticky ends), as though the starfish are holding hands. Thus, for example, four three-armed branched junction molecules can join together to form the outline of a tetrahedron.

In 2006, a major advance in DNA self-assembly was obtained by Rothemund’s introduction of DNA origami [30], a method where a single-stranded DNA plasmid outlines a pre-designed shape and about 200–250 short strands of DNA complementary to different locations of the plasmid are used to assemble and secure the structure. Because of the potential applications of self-assembling DNA nanotechnology, especially in medicine, and also in nanoscale robotics, circuitry, and biosensors, hundreds of laboratories around the world today focus on it.

Many challenges arise in designing DNA molecules that are to self-assemble into a desired shape. While some of the challenges involve chemical processes, many others involve structural questions such as which arms of the branched junction molecules should attach to which other arms, or how to route the scaffolding strand and staples through the desired structure, so that the smaller molecules then self-assemble into exactly the desired larger shape. As the experimental advances evolve rapidly, mathematical foundations to address new and upcoming questions that arise from laboratory experiments are becoming more essential.

DNA self-assembly now involves novel mathematical approaches and tools that inform the design of the structure, both to assemble the nanocomplex and to ease the analysis of the experimental results. It becomes particularly exciting, from a mathematical perspective, when these approaches diverge from the original stimulus to problems of intrinsic mathematical interest independent of the initial application, thus significantly expanding the scope of the mathematical investigations.

Fortunately, problems involving shapes and interconnections are at the heart of mathematics and mathematicians have become natural collaborators to DNA self-assembly researchers. New mathematical formalism is being developed to solve mathematical problems arising from DNA self-assembly. Many of the target molecular shapes have wireframe structures, such as the outlines of a cube [31] or octahedron [32], or 2D and 3D lattices [11, 14], or even triangular mesh bunny rabbits and lacy snowflakes [33]. Since the outlines of these wireframe structures correspond to the edges of a graph and the corners of the shape to the vertices of a graph, graph theory (in particular topological graph theory) and knot theory have emerged as an excellent platform to study assembly problems. However, existing graph and knot theory lack sufficient descriptive properties to capture the essence of DNA self-assembly and cannot always address the new problems arising from the self-assembly application. Thus, a new subfield in mathematics is born, DNA mathematics, which develops new mathematical tools for, and arises from, bottom-up assembly.

As shared in the sections below, the mathematical problems driven by self-assembly processes are rich and prolific. They lead discrete mathematics and topology beyond current mainstream trends in these fields and hence break open new directions such as edge-outer embeddability, origami knotting, and new algebraic languages to describe structures. These theoretical directions are open-ended, generally scale independent, and will lay a broad foundation for future growth applicable to self-assembly in many settings, from nano to macro.

2 Flexible Tiles and New Graph Invariants

The first three-dimensional DNA structures, such as the cube [31] and the truncated octahedron [32], appeared at about the same time as the idea of using nucleic acids and biomolecules for computation, when molecular-based information processing was initiated with Adleman’s seminal paper [34]. If computation is to be performed with molecular structures, assemblies of arbitrary 3D wireframe or graph-like molecules without inherent symmetry may be necessary. The first such construct of a graph with six vertices, using branched junction molecules to represent vertices and regular duplex molecules to represent edges, was reported in [35, 36]. These molecules contained non-paired nucleotides throughout the duplexes, making the arms of the molecules flexible so that, in the process of assembly, their sticky ends could easily join their respective complements. After ligating the nicks (breaks in the strands), the resulting (graph) structure consisted of a single cyclic DNA strand, which could conform in at least two knotted topologies [36]. These experimental results initiated a mathematical model that captures some of the design challenges of the flexible armed tiles used in construction of spatially embedded graph structures.

Fig. 1. Three-armed branched DNA molecule seen as a three-valent vertex in a graph with three half-edges. The single-stranded extensions of the arms encode the bond types

A combinatorial abstraction that consists of a vertex with half-edges labeled by the sticky-end types on the arms of the branched junction molecule is called a tile (Fig. 1). It can be denoted as a multi-set of bond types indicating the sticky-end types flanking the ends of the arms. The complementary sticky-end types are denoted by marking bond types with two complementary versions, indicated with a symbol from an alphabet (e.g., \(a,b,c,\ldots \)) and a hatted version of the symbol (e.g., \(\hat{a}, \hat{b}, \hat{c}, \ldots \)). Multiple entries of the same bond type are indicated by an exponent on the corresponding symbol. For example, Fig. 2a shows tiles \(t_1=\{\hat{a},b,\hat{c}\}\), \(t_2=\{a^3\}\), \(t_3=\{\hat{a},\hat{b},\hat{c}\}\), and \(t_4=\{\hat{a},c^2\}\).

Fig. 2. a Four tiles with their corresponding bonding types in colored arrowheads appearing at the end of the half-edges. The complementary bond types are indicated with reversed arrowheads. b A tetrahedron realized by a pot containing the tiles in (a)

A collection of tile types forms a pot, and a target graph G (or other 3D wireframe structure) is realized by a pot if it can be obtained by matching the vertices of G with tile types and identifying two half-edges of tiles having complementary versions of the same bond type with an edge of the graph. In the most basic setting G is an abstract graph, but other models also consider the geometry of the target graph. Each pair of half-edges forming an edge is subject to the restriction that a symbol is always paired with its hatted version and vice versa. Abstractly, a graph is considered assembled from tiles if every vertex of the graph corresponds to a vertex of a tile and each edge corresponds to a pair of half-edges with complementary sticky ends joined together to form a bond edge. This is equivalent to finding an edge-labeled orientation of the graph, with the arrows pointing from the unhatted to the hatted half-edges making up the labeled edge. In theory, any covering of a graph with a cycle realized by a pot can also be realized by the pot, but entropy disfavors these larger constructs. Further description of the model can be found in [28, 37,38,39].
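The definition of realization can be checked mechanically on small examples. The following is a minimal brute-force sketch, assuming an abstract (non-geometric) target graph; the function names `complement` and `realizes` and the string encoding `'a^'` for the hatted symbol \(\hat{a}\) are illustrative choices, not notation from the cited papers. It tries every way of laying tiles from the pot on the vertices and accepts if every edge receives a symbol at one end and its hatted version at the other.

```python
from itertools import permutations, product

def complement(label):
    """Swap a bond type with its hatted version: 'a' <-> 'a^'."""
    return label[:-1] if label.endswith("^") else label + "^"

def realizes(pot, edges, n_vertices):
    """Brute-force test: can the graph be assembled from tiles in the pot?"""
    # incident[v] lists the indices of the edges meeting vertex v.
    incident = [[] for _ in range(n_vertices)]
    for idx, (u, v) in enumerate(edges):
        incident[u].append(idx)
        incident[v].append(idx)
    # For each vertex: every way of laying a tile of matching degree on its edges.
    options = []
    for v in range(n_vertices):
        placements = set()
        for tile in pot:
            if len(tile) == len(incident[v]):
                placements.update(permutations(tile))
        options.append(sorted(placements))
    for choice in product(*options):
        label_at = [dict(zip(incident[v], choice[v])) for v in range(n_vertices)]
        # Each edge must carry a symbol at one end and its complement at the other.
        if all(label_at[u][i] == complement(label_at[v][i])
               for i, (u, v) in enumerate(edges)):
            return True
    return False

# Tiles of Fig. 2a, with 'a^' standing in for the hatted symbol.
T1 = ("a^", "b", "c^")
T2 = ("a", "a", "a")
T3 = ("a^", "b^", "c^")
T4 = ("a^", "c", "c")
TETRAHEDRON = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

print(realizes([T1, T2, T3, T4], TETRAHEDRON, 4))  # the full pot
print(realizes([T1, T3, T4], TETRAHEDRON, 4))      # the pot without t_2
```

The full pot realizes the tetrahedron, as in Fig. 2b. Without \(t_2\) it cannot, since every remaining tile carries \(\hat{a}\) and none carries \(a\). The search is exponential in the number of vertices and is meant only for toy cases.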

There are two aspects to consider in such a ‘pot’ and ‘tile’ set-up. One aspect considers properties of the pot: what types of graphs can be realized by a pot, how many isomorphism classes are there, are all structures that are realized complete (i.e., there are no unbonded half-edges), or can they always be completed, etc. The other aspect asks questions about the graphs and the structures: what is the most ‘efficient’ pot (in terms of the minimal number of tile types and bond types needed) that can realize a given graph, how are the properties of the pot related to other invariants of the graph in question, etc.

The properties of pots are most suitably studied with methods from linear algebra by associating a matrix to each pot whose ij-entries are ratios of bond type i for a tile type j present in a pot. Using this matrix, one can determine the stoichiometry of the self-assembly to ensure certain types of pot properties [39] (see also [37]). Although useful, this linear algebra approach does not capture the difficult task of better understanding the process of assembly. As larger complexes form in a test tube, the thermodynamic properties can change and further assembly of larger structures may depend on the entropic conditions in the test tube; hence an expansion of the model to include thermodynamic properties may be necessary. One can also associate computational complexity classes with types of pots, where the number of tiles used in a structure that computes a problem is a base for the associated classes [23]. It was shown that 3-colorability of a graph, or other NP-hard problems, could be solved in \(\mathcal O(1)\) bio-steps [23, 36] by physically assembling the structure that emulates the solution of the problem. This was also shown experimentally by Seeman’s lab [7]. However, it seems we are yet to develop a good model that can encapsulate information processing by 3D structures, or a model that can tell us about computations by shapes.
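As an illustrative sketch of the balance conditions behind the matrix approach, consider the pot of Fig. 2a. The convention used here is an assumption made for this example and may differ from the normalization in [39]: let \(z_{i,j}\) be the number of unhatted half-edges of bond type i on tile \(t_j\) minus the number of hatted ones, and let \(r_j\) be the proportion of tile \(t_j\) used in a complex. In a complete complex every unhatted half-edge is paired with a hatted one of the same type, so \(\sum _j z_{i,j}r_j=0\) for each bond type i, together with \(\sum _j r_j=1\). For \(t_1=\{\hat{a},b,\hat{c}\}\), \(t_2=\{a^3\}\), \(t_3=\{\hat{a},\hat{b},\hat{c}\}\), \(t_4=\{\hat{a},c^2\}\) this reads

\[
\begin{aligned}
a:&\quad -r_1 + 3r_2 - r_3 - r_4 = 0,\\
b:&\quad r_1 - r_3 = 0,\\
c:&\quad -r_1 - r_3 + 2r_4 = 0,\\
&\quad r_1 + r_2 + r_3 + r_4 = 1.
\end{aligned}
\]

The second equation gives \(r_1=r_3\), the third then gives \(r_4=r_1\), and the first gives \(r_2=r_1\), so the unique solution is \(r_1=r_2=r_3=r_4=\tfrac{1}{4}\). This agrees with the tetrahedron of Fig. 2b, which uses each tile type exactly once.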

Determining an ‘efficient’ pot for a target graph G means specifying a pot of tiles that realizes G combinatorially with a minimum number of tile types and a minimum number of bond-edge types. Recall that in the assembly process there are typically \(\sim {10}^{15}\) tiles of each type present in a tube, so each tile type can be used within a structure multiple times (and is often assumed to be available an arbitrarily large number of times). For the construction of G one can require an edge-labeled orientation of the graph using a minimal alphabet for the labels. This becomes a new graph invariant B(G), the minimal bond alphabet. In order to prevent the self-aggregation of tiles during initial synthesis, a natural experimental requirement can be added so that no two half-edges of the same tile have complementary labels. This corresponds to prohibiting loops in the target graph. One can further consider the minimal set of vertex types (the resulting molecular components) needed for construction of the graph G, an invariant T(G). The invariants T(G) and B(G) are new graph invariants of intrinsic interest that have yet to be determined for most graphs. Except for a few special graph classes in some specific pot settings, these combinatorial questions are wide open. Familiar graph theoretical tools such as coloring, chromatic numbers, classical graph automorphism, etc., do not appear to determine T(G) or B(G) in any of the settings. The chromatic number provides a lower bound in some settings, although a poor one as it and T(G) can be arbitrarily far apart [28].
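The link between edge-labeled orientations and pots can be made concrete. The sketch below, with hypothetical helper names `pot_from_orientation` and `self_complementary`, reads off the tiles induced by a labeled orientation (the tail of each arc keeps the plain symbol, the head gets the hatted one, written `'x^'`) and counts tile and bond types. The orientation used is one that reproduces the tiles of Fig. 2a on the tetrahedron. The counts it reports are only upper bounds for this example in the most basic setting, not claimed values of T(G) or B(G).

```python
from collections import Counter

def pot_from_orientation(labeled_arcs):
    """Read off tiles from arcs (tail, head, bond_type).

    The tail keeps the plain symbol and the head gets the hatted one ('x^').
    """
    half_edges = {}
    for tail, head, bond in labeled_arcs:
        half_edges.setdefault(tail, []).append(bond)
        half_edges.setdefault(head, []).append(bond + "^")
    # A tile type is a multiset of labels; sorting gives a canonical form.
    tiles = {v: tuple(sorted(labels)) for v, labels in half_edges.items()}
    tile_types = set(tiles.values())
    bond_types = {bond for _, _, bond in labeled_arcs}
    return tiles, tile_types, bond_types

def self_complementary(tile):
    """True if some symbol appears on the tile both plain and hatted."""
    counts = Counter(tile)
    return any(label + "^" in counts
               for label in counts if not label.endswith("^"))

# A labeled orientation of the tetrahedron matching Fig. 2a.
ARCS = [(0, 1, "a"), (0, 2, "a"), (0, 3, "a"),
        (1, 2, "b"), (3, 1, "c"), (3, 2, "c")]

tiles, tile_types, bond_types = pot_from_orientation(ARCS)
print(tiles)
print(len(tile_types), "tile types,", len(bond_types), "bond types")
print(any(self_complementary(t) for t in tile_types))
```

The `self_complementary` check corresponds to the experimental requirement that no tile can bond to itself.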

The problem can further be confounded by the constraints of several different experimental settings, including the degree of flexibility or rigidity of the arms and the strength of the cohesive sites, as well as yield considerations such as whether or not the incidental creation of complete complexes smaller than the target graph is acceptable. In [40] it was confirmed that it is NP-hard to determine the output of a pot and to determine whether a pot that assembles a target structure will also assemble unwanted smaller structures.

Mathematical formalism, models, and design tools for types of self-assembly can be found in [28, 37, 38, 40, 41], with [42] further specifying the inter-arm angles and cohesive end rotation orientations for rigid tiles. These resources provide provably optimal design strategies, i.e., combinatorial specification for the minimum number of cohesive regions and the minimum number of branched junction molecule types, to assemble a given wireframe target structure, for several common graph classes. Further computations of these invariants for specific graphs such as Platonic and Archimedean solids and prisms can be found in [43]. A further extensive body of work can be found on DNA Wang tiles, which are the restrictive case of completely rigid four-armed tiles. See, among others, Refs. [5, 44,45,46,47,48,49].

3 DNA Strand Routing and Topological Graph Theory

Determining routes for a scaffolding strand throughout assembly targets is integral to both DNA origami [27, 50, 51] and experimental verification of graph constructs [7]. A vertex of a graph structure can be traced by DNA in multiple ways such that the resulting DNA structure is assembled by different sets of cross-hybridizing cyclic molecules.

Fig. 3. Changing strand connections at a three-valent vertex in a graph. As DNA strands have polarity with a phosphate end (\(5'\)) and a hydroxyl group on the other end (\(3'\)), the arrowheads indicate the \(3'\) ends. The strands bind with opposite polarities. A three-armed junction molecule can exhibit two types of connections as depicted, e.g., the green strand has a direction left to right in the left figure, while it has direction left to top in the right figure

For a three-valent vertex, the DNA structure representing such a vertex can have two local strand configurations (vertex connections) as shown in Fig. 3. One configuration can be represented by ‘non-crossing’ strands at a vertex, and the other is obtained by ‘crossing strands’. By changing vertex connections, the number of cyclic molecules assembling the structure can vary. Figure 4a and b show two strand outlines of the same triangular prism graph. In Fig. 4a, the strand connections at each vertex in this planar representation do not ‘cross’ each other and the graph is outlined by five cyclic strands. By changing the strand connection at vertices \(v_1\) and \(v_4\) to the other configuration, the graph can be routed by a single circuit, hence outlined by a single cyclic molecule (as shown in Fig. 4b).

Fig. 4. Possible routes for DNA strands in a triangular prism graph with vertices \(v_0,\ldots ,v_5\): a following the faces of a plane embedding produces 5 faces, corresponding to a maximum strand number of 5. b by changing strand connections at two vertices, the graph is traced by a single strand, reflecting that it is upper-embeddable on a double torus (see Fig. 5a). c A minimal length reporter strand for the graph, giving an edge-outer embedding on the torus (see Fig. 5b)

Early work focused on the number of cyclic molecules in such double-strand covers, where the target graph is covered by a set of circuits so that every edge is covered exactly twice, as in Fig. 4a [52]. These circuits form the facial walks of an oriented embedding of the graph, and thus correspond to a strong version of the cycle double cover conjecture, as described below. Double-strand covers have re-emerged recently in novel origami methods such as [53].

In DNA settings, strand routings of structures require turns at vertices that are constrained, i.e., routes may not double back on an edge. Because DNA strands are oriented, any repeated edges in a strand route must be traversed in the opposite direction when revisited. Thus, when folding a DNA origami into a graph structure, the strand routing objective is to find a route in the graph that meets these constraints and traverses every edge, ideally with a minimum number of repeated edges. A similar problem arises in determining a route for the reporter strand, where after an experiment has been conducted, a single strand traversing the structure is extracted from the assembled construct and analyzed to yield the experimental data (see [7]). Here again, the objective is a route covering the graph with a minimum number of repeated edges. Such a minimal strand routing for the triangular prism graph is shown in Fig. 4c. In this case, the connections at three vertices, \(v_0\), \(v_3\), and \(v_4\), are changed relative to the connections in Fig. 4a.
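The routing constraints above can be stated as a small validity check. The sketch below is an illustration under stated assumptions: the graph is simple, a route is given as a closed walk (a list of vertices ending where it starts), and the name `is_valid_route` is hypothetical. It checks that every edge is covered, that an edge used twice is traversed once in each direction, and that the walk never doubles back. The example route on the complete graph \(K_4\) was constructed by hand for this illustration; since all four vertices of \(K_4\) have odd degree, any closed covering walk must repeat at least two edges, and this route repeats exactly two.

```python
from collections import defaultdict

def is_valid_route(walk, edges):
    """Check the strand-routing constraints for a closed walk in a simple graph."""
    if len(walk) < 2 or walk[0] != walk[-1]:
        return False
    steps = list(zip(walk, walk[1:]))
    edge_set = {frozenset(e) for e in edges}
    directions = defaultdict(list)
    for u, v in steps:
        if frozenset((u, v)) not in edge_set:
            return False          # the walk leaves the graph
        directions[frozenset((u, v))].append((u, v))
    if set(directions) != edge_set:
        return False              # some edge is never covered
    for used in directions.values():
        # An edge may be used twice only if the two traversals are opposite.
        if len(used) > 2 or (len(used) == 2 and used[0] == used[1]):
            return False
    # No doubling back, including at the point where the walk closes up.
    for (a, _), (_, d) in zip(steps, steps[1:] + steps[:1]):
        if a == d:
            return False
    return True

K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
ROUTE = [0, 1, 2, 3, 1, 0, 3, 2, 0]   # repeats edges 0-1 and 2-3, in opposite directions

print(is_valid_route(ROUTE, K4))
print(is_valid_route([0, 1, 2, 0], K4))   # closed, but misses three edges
```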

The strand routing problem is in general intractable, even in the special case of an Eulerian surface mesh when an optimal route follows face boundaries. In this case the problem corresponds to finding A-trails in the surface, which are Eulerian circuits that turn either left or right at each vertex. This problem is known to be NP-hard (see [54]) even on the plane. In [51] it is proven that the strand routing problem remains NP-hard even for graphs of maximum degree 8. A survey of approaches to the strand routing problem is given in [27]. The problem can be translated to the traveling salesman problem (TSP), which makes the wealth of software available for solving the TSP available to DNA origami applications.

A notable alternative approach is given in [33]. Provided that the graph is an augmented triangulation of a surface topologically equivalent to a sphere, there is a fast algorithm for routing the scaffolding strand, albeit at the cost of duplicating a number of edges in the final nanostructure.

Mathematically, different routings correspond to different cellular embeddings of a graph in a surface (drawings of a graph in a surface such that its edges do not cross each other and such that the complement of the graph in the surface is a set of topological disks). In general, the number of strands outlining a graph decreases as the genus of the surface in which the graph is embedded increases: by Euler’s formula, an embedding with V vertices, E edges, and F faces in an orientable surface of genus g satisfies \(V-E+F=2-2g\). As has repeatedly been the case with self-assembly applications, a wealth of new mathematical directions has emerged from the design objective, with edge-outer embeddability as described below being a particularly rich example expanding on topological graph theory.
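The correspondence between strand connections and embeddings can be explored with a rotation system, which records the cyclic order of the edges around each vertex; tracing faces then counts the strands. The sketch below is illustrative: the vertex labels and the particular planar rotation `PRISM` were chosen for this example and need not match the labels \(v_0,\ldots ,v_5\) of Fig. 4, and the function names are hypothetical. Reversing the cyclic order at a three-valent vertex plays the role of switching between the two connections of Fig. 3.

```python
from itertools import product

def count_faces(rotation):
    """Count facial walks of the embedding given by a rotation system.

    rotation[v] is the cyclic order of the neighbours of v (simple graph).
    """
    darts = {(u, v) for u in rotation for v in rotation[u]}
    faces = 0
    while darts:
        start = dart = next(iter(darts))
        while True:
            darts.discard(dart)
            u, v = dart
            order = rotation[v]
            # Arrive at v from u, then leave along the next edge in the rotation at v.
            w = order[(order.index(u) + 1) % len(order)]
            dart = (v, w)
            if dart == start:
                break
        faces += 1
    return faces

def genus(rotation):
    """Orientable genus from Euler's formula V - E + F = 2 - 2g."""
    n_vertices = len(rotation)
    n_edges = sum(len(nbrs) for nbrs in rotation.values()) // 2
    return (2 - n_vertices + n_edges - count_faces(rotation)) // 2

def flipped(rotation, vertices):
    """Reverse the cyclic order at the given vertices (the change shown in Fig. 3)."""
    return {v: (nbrs[::-1] if v in vertices else list(nbrs))
            for v, nbrs in rotation.items()}

# Triangular prism: triangles 0-1-2 and 3-4-5 joined by edges 0-3, 1-4, 2-5,
# with a rotation system read off from a plane drawing.
PRISM = {0: [1, 3, 2], 1: [2, 4, 0], 2: [0, 5, 1],
         3: [0, 4, 5], 4: [5, 3, 1], 5: [3, 4, 2]}

face_counts = set()
for mask in product([False, True], repeat=6):
    chosen = {v for v in range(6) if mask[v]}
    face_counts.add(count_faces(flipped(PRISM, chosen)))

print(count_faces(PRISM), genus(PRISM))   # strands and genus of the plane embedding
print(sorted(face_counts))                # strand counts over all 64 connection choices
```

For the plane embedding the trace gives the five strands of Fig. 4a. Over all choices of vertex connections the strand count ranges from five down to one, the single-strand case corresponding to the double-torus embedding of Fig. 4b.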

An edge-outer embedding is a cellular embedding of a graph in an orientable surface where every edge lies on a special ‘outer’ face (there may be other faces too). Figure 5 shows an edge-outer embedding of the triangular prism graph in the double torus and the torus corresponding to the reporter strand routings in Fig. 4b, c. In [55] and [56] it was shown that the facial walk of an edge-outer face in an edge-outer embedding of the target graph exactly captures reporter strand routing constraints. While outer-planar and outer-projective-planar graphs, i.e., graphs in the plane or projective plane with all vertices on a single distinguished face, have been heavily studied [57], edge-outer embeddable graphs are an entirely new, yet very natural, construct.

Fig. 5. Embeddings of the triangular prism graph in Fig. 4. a Embedding of the graph in a double torus. The graph can be realized as a tertiary structure of a single DNA strand, shown here tracing the single face. b An embedding of a graph in a torus with a minimum length reporter strand tracing face boundary that covers all edges at least once

Conventional graph theoretical tools do not apply directly to edge-outer embeddability. For example, the Chinese postman problem lacks the bidirectional constraint while upper embeddability results require every edge to be covered exactly twice [58]. That every graph has such a reporter strand route, and hence an edge-outer embedding, was shown in [55], with a short algorithmic proof given in [56]. At the heart of the proof is the reconfiguration shown in Fig. 3, which changes the cyclic order of the edges about a vertex. Finding a minimum such route, that is, an embedding with a smallest possible edge-outer face, is in general NP-hard [56].

Such a computational complexity observation introduces a wealth of further open questions. If the graph is Eulerian, an optimal reporter strand route is simply an Euler circuit. Thus, in this case, Fleury’s algorithm provides a polynomial time solution to the problem. For what classes, other than Eulerian graphs, are there polynomial-time algorithms to find optimal reporter strand routes? Does there exist a polynomial-time algorithm guaranteed to return a route that is within \(x\%\) of the optimal length for some reasonable x?
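For the Eulerian case, an Euler circuit can be computed directly. The text cites Fleury’s algorithm; the sketch below swaps in Hierholzer’s algorithm, a different standard method that also returns an Euler circuit and is simple to implement with a stack. The function name `euler_circuit` and the octahedron example (a 4-regular graph, chosen here only for illustration) are assumptions of this sketch.

```python
def euler_circuit(edges):
    """Return an Euler circuit (list of vertices) using Hierholzer's algorithm."""
    adj = {}
    for idx, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        raise ValueError("graph is not Eulerian: some vertex has odd degree")
    used = [False] * len(edges)
    stack = [edges[0][0]]
    circuit = []
    while stack:
        v = stack[-1]
        # Discard edges at v that were already traversed from the other end.
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()
        if adj[v]:
            w, idx = adj[v].pop()
            used[idx] = True
            stack.append(w)          # extend the current trail
        else:
            circuit.append(stack.pop())   # dead end: emit vertex and backtrack
    if len(circuit) != len(edges) + 1:
        raise ValueError("graph is not connected")
    return circuit[::-1]

# Octahedron: six vertices, each adjacent to all others except its antipode.
OCTAHEDRON = [(u, v) for u in range(6) for v in range(u + 1, 6) if u + v != 5]

route = euler_circuit(OCTAHEDRON)
print(len(OCTAHEDRON), "edges; route:", route)
```

Since an Euler circuit in a simple graph uses each edge exactly once, such a route has no repeated edges and cannot double back.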

It is also natural, both mathematically and for the application, to consider maximum length routes. If the graph is upper embeddable, i.e., has an orientable one-face embedding, then this embedding is a maximum edge-outer embedding with every edge covered twice in the facial walk (see Fig. 4b, c); however, not every graph is upper embeddable. As with the dichotomy in the complexity of maximum and minimum genus of the graph, it is possible that the maximum length problem is tractable.

4 DNA Origami and New Algebraic Structures

Since their introduction to the scientific community, DNA origami structures [59] have become one of the most prevalent experimental substrates for a variety of 3D structures [12]. The method uses a single-stranded DNA plasmid vector as a scaffold that outlines the shape of the desired structure with help of so-called staple strands. The staple strands are short segments of single-stranded DNA (about 24 to 32 nucleotides long) that span across two or three segments of the scaffold strand to bind it in place. In order to achieve stability of the construct, it is necessary that the staple strands cross each other within the structure in an antiparallel way. An example of a portion of an origami structure is depicted in Fig. 6. (Figure adjusted from [30] with added boxes). Systematic mathematical methods to describe DNA origami structures have not been established. An algebraic language that describes DNA origami motivated by the Temperley-Lieb algebra that has been extensively used in physics and knot theory, particularly with the Jones polynomial and the Kauffman bracket, was introduced in [60, 61]. Such languages could provide a method for modifying (through strand displacements) a given design to achieve either a more effective and stable structure or a completely new geometric shape.

Fig. 6. A segment of a DNA origami structure with black scaffold and colored staples. The generators of the proposed monoid are boxed

The well-studied Jones monoid \(\mathcal J_n\) has n generators \(u_1,\ldots , u_{n}\). Each generator can be represented with \(n+1\) lines such that \(u_i\) has a cap/cup connection of lines i and \(i+1\), as shown in Fig. 7a, while all other lines are vertical. The product of generators is diagrammatically represented by concatenating the corresponding diagrams vertically. Figure 7b shows the product \(u_iu_{i+1}u_i\) to the left of the equality, where the vertical lines have not been depicted.

Fig. 7. Generator \(u_i\) in (a) and three types of relations in (b, c, d) of the Jones monoid. b Conjugacy relation \(u_iu_{i+1}u_i=u_i\), c idempotent relation \(u_iu_i=u_i\), d commuting relation \(u_iu_j=u_ju_i\) for \(|i-j|\ge 2\)

The set of relations of \(\mathcal J_n\) is depicted in Fig. 7b–d. For example, (b) represents the relation \(u_i u_{i+1} u_i= u_i\). Although the generators and relations are symbolically written, they correspond directly to the depicted diagrams and planar isotopy. Taking this diagrammatic usage as an advantage, we use a monoid version that is a generalization of \(\mathcal J_n\) for describing DNA origami.
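The diagrammatic reading of the relations can be verified by composing diagrams directly. The sketch below is a minimal illustration, not code from [60, 61]: a diagram on m lines is stored as a pairing of top points `('t', k)` and bottom points `('b', k)`, the helper names `generator` and `compose` are hypothetical, and closed loops created in the middle of a product are simply discarded.

```python
def generator(i, m):
    """Diagram of u_i on m lines: a cap and a cup joining lines i and i+1."""
    d = {}
    for k in range(1, m + 1):
        if k not in (i, i + 1):
            d[("t", k)] = ("b", k)    # vertical line
            d[("b", k)] = ("t", k)
    for side in ("t", "b"):
        d[(side, i)] = (side, i + 1)
        d[(side, i + 1)] = (side, i)
    return d

def compose(a, b):
    """Stack diagram a on top of b and read off the resulting pairing."""
    m = len(a) // 2
    result = {}
    for side in ("t", "b"):
        for k in range(1, m + 1):
            in_top = side == "t"      # which diagram the path is currently in
            point = (a if in_top else b)[(side, k)]
            # Keep following the path while it ends on the glued middle line.
            while point[0] == ("b" if in_top else "t"):
                j = point[1]
                in_top = not in_top
                point = a[("b", j)] if in_top else b[("t", j)]
            result[(side, k)] = point
    return result

M = 4                                  # n + 1 = 4 lines, so n = 3 generators
u1, u2, u3 = (generator(i, M) for i in (1, 2, 3))

print(compose(compose(u1, u2), u1) == u1)      # u_1 u_2 u_1 = u_1
print(compose(u1, u1) == u1)                   # u_1 u_1 = u_1
print(compose(u1, u3) == compose(u3, u1))      # far-apart generators commute
print(compose(u1, u2) == compose(u2, u1))      # adjacent generators do not
```

The first three comparisons hold and the last fails, matching the relations of Fig. 7b–d.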

The basis of the origami algebraic language is a monoid \(\mathcal O_n\) inspired by \(\mathcal J_n\). It is defined by generators and relations whose diagrammatic representations and closures are parts of DNA origami with \(n+1\) scaffold strands tracing the structure (e.g., in Fig. 6 the structure is traced with 6 passings of the scaffold across top to bottom, and one can consider the structure as a representative of \(\mathcal O_5\)). The monoid has two types of generators, those that correspond to the scaffold strands connecting straight vertically in the structure (\(\alpha \)’s) and those that correspond to staple strands connecting across (\(\beta \)’s). The generators \(\alpha _i\) and \(\beta _i\) of the origami monoid \(\mathcal O_n\) on \(n+1\) strands are depicted in Fig. 8a, where the cap and cup are placed at the i-th position from the left, and the vertical lines surrounding these positions are not depicted.

We define a DNA origami monoid \(\mathcal O_n\) on \(n+1\) strands by generators and relations as follows. Generators of \(\mathcal O_n\) are \(\alpha _i\), for \(i=1, \ldots , n\). The \(\alpha _i\)’s represent local DNA foldings of a pair of staples of the form of cap and cup as indicated on the left in Fig. 8a. An additional set of generators \(\beta _i\) corresponds to pairs of caps and cups of the scaffold strand as indicated in the figure to the right. The index i indicates that the caps and cups are between the ith and \(i+1\)st strands of the structure. The set of relations is defined analogously to those of \(\mathcal J_n\), while the closure of the diagram is defined in a manner similar to the so-called plat closure in knot theory [62].

Fig. 8. a Two types of generators, cross-over staples connecting straight-line portions of the scaffold (denoted with \(\alpha \)) and straight staples connecting two cross-over turning tips of the scaffold (denoted with \(\beta \)). b Multiplying generators is represented by structures connecting scaffold segments and respective staples, when they don’t ‘cross-over’ the scaffold strand. c An idempotent rule; the product \(\alpha \alpha \) has the top and bottom structure the same as \(\alpha \). The cyclic portion in the middle does not affect further products

To justify modeling DNA origami structures by words over the generators we make a correspondence between concatenations of generators \(\alpha _i\), \(\beta _i\) and connections of DNA segments. For a natural number \(n\ge 2\), the set of generators of the monoid \(\mathcal O_n\) is the set \(\Sigma _{n}=\{\alpha _{1},\alpha _{2},\dots ,\alpha _{n},\beta _{1},\beta _{2},\dots ,\beta _{n}\}\). For a product of two generators \(x_i\) and \(y_j\) in \(\Sigma _n\), we place the diagram of the first generator above the second, lining up the scaffold strings of the two generators, and then we connect the respective scaffold strands. If the two generators are ‘far’ apart, that is, if for indices i and j we have \(|i-j|\ge 2\), then no staple connection is performed. If the two generators are adjacent, that is, if for indices i and j it holds that \(|i-j|\le 1\), then we connect part of the staples. Since the staples are not too long within an origami structure, spanning only two or three segments of the scaffold, the convention for connecting staples in a product of generators is motivated by the manner in which staples connect within the DNA origami structure. The staples of \(\alpha \)-type generators protrude “outside” of the scaffold in Fig. 8a. The staples are connected everywhere except when two non-extending staple-ends would have to cross a scaffold to connect. Under this convention, \(\alpha _i \beta _i\) and \(\beta _i \alpha _i\) are two distinct words, i.e., \(\alpha \)s and \(\beta \)s do not freely commute. The rules for connecting scaffold strands and staples ensure that concatenation of three or more generators is associative. The graphical representation of a product \(\alpha _i\beta _i\) is shown in Fig. 8b, where the vertical strands surrounding the generators are not depicted.

The generators of the origami monoid satisfy a set of relations, according to the types of graphical structures they represent. One of the relations is a generator idempotent relation as shown in Fig. 8c. The relations are extensions of the relations of the Jones monoids. In [61] two scenarios of relations and graphical representations of the elements of the corresponding monoids are given. For each scenario, the number of all possible structures is provided through the number of equivalence classes of words (or elements in the corresponding origami monoid). Also a polynomial time algorithm exists that computes the shortest word for each equivalence class. A connection between the Green’s relations of an origami monoid and those of a direct product of Jones monoids [62] is given in [60]. In particular, it was shown that an epimorphism \(p: \mathcal{O}_n \rightarrow \mathcal{J}_n \times \mathcal{J}_n\) induces the bijective correspondence on Green’s classes.

The definition of origami monoids is motivated by the Jones monoids, and the origami structure, concatenating two types of strands (scaffolds and staples), implies two types of generators with similar relations. This construction of doubling generators and imposing substitution relations can be generalized to other algebraic structures, or it can be generalized with more than two types of generators. These are algebraic problems directly arising from the experimental design of DNA origami.

5 DNA Origami and Origami Knots

DNA origami assembly can be confounded by knotting in the scaffolding strand. For example, in a preliminary experiment, an essentially planar target did not form well when a simply knotted (trefoil) scaffold was used [63]. Thus, tools are needed to avoid inadvertently knotted routes when designing the increasingly sophisticated targets of DNA origami. On the other hand, Seeman has shown it is possible to engineer single-stranded DNA with specified knotted topologies [64], and this capacity for controlled knotting may be used intentionally to design better (higher yield, larger, smaller, more symmetric, more robust, more topologically complex, etc.) nanostructures by deliberately exploiting the topology of the knotted scaffold.

When routing a single scaffold through a graph-like target structure, a typical design constraint is avoiding self-crossings [24, 33]. If the target structure is modeled by an Eulerian graph cellularly embedded on an oriented surface in 3-space, then some of these routes correspond to A-trails, which are Eulerian circuits that turn either ‘left’ or ‘right’ at each vertex. If the surface is a sphere, then all A-trails in the target structure are necessarily unknotted, but for higher-genus surfaces there are settings in which every A-trail is knotted [26, 27]. The complete graph on seven vertices, \(K_7\), is shown in Fig. 9a, and an embedding of \(K_7\) on a torus is shown in Fig. 9b. Every vertex of \(K_7\) has even valency (i.e., valency 6) and hence, the graph is Eulerian. Three distinct Hamiltonian cycles are depicted, purple, green and yellow. The embeddings of the yellow and the green cycles are knotted.
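The claim that \(K_7\) is Eulerian can be checked directly. The sketch below builds \(K_7\) as an abstract graph, tests the even-degree condition, and extracts one Eulerian circuit with Hierholzer's algorithm. It ignores the embedding entirely, so the circuit it returns is not claimed to be an A-trail; the function names are illustrative.

```python
def complete_graph(n):
    """Adjacency sets of the complete graph on vertices 0..n-1."""
    return {v: {w for w in range(n) if w != v} for v in range(n)}


def is_eulerian(adj):
    """A connected graph is Eulerian exactly when every vertex has even degree."""
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:                      # depth-first search for connectivity
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj) and all(len(ns) % 2 == 0 for ns in adj.values())


def eulerian_circuit(adj):
    """Hierholzer's algorithm; returns a closed vertex sequence using every edge once."""
    remaining = {v: set(ns) for v, ns in adj.items()}  # unused edges at each vertex
    stack, circuit = [next(iter(adj))], []
    while stack:
        v = stack[-1]
        if remaining[v]:
            w = remaining[v].pop()    # walk along an unused edge v-w
            remaining[w].remove(v)
            stack.append(w)
        else:
            circuit.append(stack.pop())  # dead end: emit vertex and backtrack
    return circuit


k7 = complete_graph(7)
circuit = eulerian_circuit(k7)
```

For \(K_7\) the returned sequence has 22 entries, one more than the 21 edges, and starts and ends at the same vertex. By contrast, `is_eulerian(complete_graph(4))` is false, since every vertex of \(K_4\) has odd valency 3.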

Fig. 9
2 illustrations. A represents a heptagon with 7 vertices and 3 distinct Hamiltonian cycles inside in 3 shades. B represents an illustration with 7 vertices embedded in a single torus.

a A complete graph with seven vertices, \(K_7\), with three Hamiltonian cycles colored distinctly. b An embedding of \(K_7\) on a torus. The edges of the yellow and green cycles go around the torus handle. Both those cycles are knotted

Since standard DNA origami scaffold strands are unknotted, two problems arise: determining whether an unknotted route for a scaffolding strand exists in a given geometrically (or surface) embedded target graph, and characterizing the graphs that have unknotted routes. Once again, a new area of mathematics emerges, as determining knotted and unknotted routing trails is fundamentally different from previously studied knots and links in graphs. Prior work focused on intrinsically knotted and linked graphs (see, for example, [65,66,67]), where the knots and links appear as cycles in the graph. However, for this application, the knots are Eulerian circuits rather than cycles. For example, Conway and Gordon [67] showed that every embedding of \(K_7\) in \(\mathbb {R}^3\) has at least one knotted cycle. Figure 9 shows an embedding of \(K_7\) on the torus. By Conway and Gordon [67], it contains at least one knotted Hamiltonian cycle (the green cycle, for example).

Fig. 10
2 illustrations. A represents an illustration that is embedded in a single torus with shaded regions. B represents the vertex and the shaded regions of the embedded joints.

a A checker-board coloring of the embedded \(K_7\), b A-trail transitions at a vertex of the embedding joining the shaded regions of the embedded \(K_7\). These A-trails bound a disk consisting of the shaded regions on the torus and therefore they outline \(K_7 \) with an unknotted routing

In contrast, it is shown in [24, 26] that every A-trail in the embedding of \(K_7\) in the torus is unknotted. This follows since the embedding of \(K_7\) in the torus is checkerboard colorable, and hence every A-trail bounds the union of the black regions, and hence a disk. It is consequently unknotted (Fig. 10). Moreover, in [24] it is shown that every Eulerian circuit is knotted on a torus if and only if there is a non-checkerboard colorable embedding of the graph.
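Checkerboard colorability is a finite combinatorial condition: the faces can be 2-colored so that faces sharing an edge receive different colors, which is the same as saying that the dual graph is bipartite. The sketch below tests this for a standard triangulation of the torus by \(K_7\), with vertices in \(\mathbb{Z}_7\) and triangles \(\{i, i+1, i+3\}\) and \(\{i, i+2, i+3\}\). That this is the same embedding as the one drawn in Figs. 9b and 10 is an assumption, and the function names are illustrative.

```python
from collections import defaultdict, deque


def k7_torus_faces():
    """14 triangular faces of a standard K_7 triangulation of the torus (assumed)."""
    up = [(i, (i + 1) % 7, (i + 3) % 7) for i in range(7)]
    down = [(i, (i + 2) % 7, (i + 3) % 7) for i in range(7)]
    return up + down


def checkerboard_coloring(faces):
    """Return {face index: 0 or 1} if the faces can be 2-colored, else None.

    Each face is a cyclic sequence of vertices; every edge must lie on
    exactly two face boundaries (a closed surface).
    """
    edge_faces = defaultdict(list)
    for f, face in enumerate(faces):
        for j in range(len(face)):
            edge = frozenset((face[j], face[(j + 1) % len(face)]))
            edge_faces[edge].append(f)
    dual = defaultdict(set)
    for fs in edge_faces.values():
        if len(fs) != 2:
            raise ValueError("every edge must lie on exactly two faces")
        dual[fs[0]].add(fs[1])
        dual[fs[1]].add(fs[0])
    color = {}
    for root in range(len(faces)):    # breadth-first 2-coloring of the dual graph
        if root in color:
            continue
        color[root] = 0
        queue = deque([root])
        while queue:
            f = queue.popleft()
            for g in dual[f]:
                if g not in color:
                    color[g] = 1 - color[f]
                    queue.append(g)
                elif color[g] == color[f]:
                    return None       # two adjacent faces forced to share a color
    return color


coloring = checkerboard_coloring(k7_torus_faces())
tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
```

For this triangulation a coloring exists, with seven faces of each color. For the tetrahedron, where every pair of faces shares an edge, `checkerboard_coloring(tetrahedron)` returns `None`.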

For geometrically embedded graphs that are not necessarily embedded on some surface, and hence for which A-trails are not necessarily defined, the initial challenge is formalizing the constraints on how the scaffolding strand may pass through vertices.

A-trails can be generalized to O-trails, which permits analyzing the knotting of strand routes in more general geometric embeddings of graphs in space, or in embeddings on a surface where ‘non-crossing’ smoothings at the vertices are defined [24]. For example, Fig. 11 shows \(K_5\) embedded as a tetrahedron with a body-center vertex, together with an O-trail through it.

Fig. 11
An illustration represents a tetrahedron with an O-trail. There are five vertices marked in different shades. There are four vertices in each corner base and one in the center top.

An O-trail in a geometric embedding of \(K_5\)

O-trails are Eulerian circuits in geometrically embedded Eulerian graphs so that at each vertex the edge pairings determined by the circuit are all non-crossing (i.e., there is a topological disk containing the half edges about the vertex in which both the turnings determined by the circuit and the orthogonal turnings are non-crossing). For A-trails in a surface mesh, the disk is just a small neighborhood of the vertex, so A-trails are a special subclass of O-trails.
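The local condition at a single vertex can be illustrated with a small sketch. It assumes that the half edges around the vertex have already been placed in cyclic order on the boundary of a disk and labeled \(0, 1, \ldots, m-1\); choosing that disk is the geometric part of the definition and is not modeled here, nor is the condition on the orthogonal turnings. The function names are illustrative.

```python
from itertools import combinations


def is_noncrossing(pairs):
    """True if no two pairs of cyclic positions interleave around the vertex."""
    for (a, b), (c, d) in combinations(pairs, 2):
        a, b = sorted((a, b))
        c, d = sorted((c, d))
        if a < c < b < d or c < a < d < b:
            return False  # the chords a-b and c-d cross inside the disk
    return True


def is_a_trail_transition(pairs, m):
    """True if every pair joins two cyclically adjacent half edges.

    This is the 'turn left or right' condition at a vertex of valency m.
    """
    return all((b - a) % m in (1, m - 1) for a, b in pairs)


a_type = [(0, 1), (2, 3), (4, 5)]      # consecutive half edges paired
nested = [(0, 1), (2, 5), (3, 4)]      # non-crossing, but 2 and 5 are not adjacent
crossing = [(0, 3), (1, 4), (2, 5)]    # every pair crosses the other two
```

At a vertex of valency 6, `a_type` passes both tests, `nested` is non-crossing without being an A-trail transition, and `crossing` fails `is_noncrossing`. This reflects, in this simplified planar model, why pairings in which only cyclically adjacent half edges are joined form a special case of non-crossing pairings.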

The DNA origami application requires understanding and controlling the behavior of O-trails, both knotted and unknotted, in fixed geometric graphs, as well as over all possible embeddings of an abstract graph. Finding O-trails is NP-complete in general, since finding A-trails is, even for plane graphs. Determining whether a given closed curve is unknotted adds a further computational burden: unknot recognition is known to lie in NP, but no polynomial-time algorithm for it is known.

Both unknotted O-trails and constrained knot embeddings are entirely new directions in knot theory. There has been extensive prior work on knots in graphs, but this focused on intrinsically linked cycles or knotted Hamiltonian cycles [66,67,68]. Here, however, the DNA origami constraints lead to questions of knots in Eulerian circuits, which opens an analogous and equally rich line of inquiry, but now in a completely new setting. Although it is known how to embed a given knot on a standard surface, considering geometric knot invariants under requirements imposed by DNA conformation is novel. Furthermore, characterizing graphs with unknotted O-trails will help identify target structures amenable to origami self-assembly from unknotted scaffolding strands, and it complements current experimentation [15]. In most applications a specific geometric embedding of the target is sought, so a pragmatic goal is to characterize classes of embedded graphs for which determining unknotted O-trails is computationally tractable.

6 Where Next?

The mathematical models discussed here generally follow what is becoming a common pattern. The first step is always developing a theoretical formalism that simultaneously captures the essence of a design problem, while providing a foundation for the following theoretical work. The second step is developing preliminary results for specific experiments.

When moving from specific experiments to seeking general strategies, e.g., via fast computer algorithms, particular design problems can often be prohibitively difficult. Thus the third step frequently is either providing fast algorithms or proving that the problem is NP-hard, so that fast algorithms for general solutions are unlikely to exist.

Discovering that a DNA self-assembly problem is NP-hard is exciting theoretically, because it immediately opens a plethora of related computational problems. These include seeking approximation algorithms, finding optimal solutions for particular families of problems, and devising pragmatic approaches for urgently needed special cases.

Although an NP-hardness result does not help a lab trying to conduct its next experiment, it does prevent wasted effort seeking general strategies. Furthermore, a specific NP-hard problem may be reduced to another known NP-hard problem, for example the Traveling Salesman Problem (TSP), for which there already exist robust tools such as fast approximation algorithms and algorithms optimized for special cases. These then may be adapted to the self-assembly problem, as with the TSP reduction for strand routing in [27].

DNA self-assembly is now a rich source of theoretical problems. These problems and their solutions advance both the mathematics and the self-assembly processes. The lab constraints lead to new mathematics and often breakthrough new directions, as we point out here. The mathematical foundations supporting innovations in science and industry are often seen years after they are developed, and are sometimes unacknowledged, because by the time the theory has been applied, it has entered the mainstream of engineering. This emergent mathematical theory, this new ‘DNA mathematics’, lays the foundations that can support the further growth of DNA self-assembly technologies, and whose potential applications, although inspired by DNA self-assembly, may still be yet to come.