
1 Introduction

The rapid proliferation of resource-constrained devices demands efficient encryption and authentication of short messages. Forkciphers are a recent proposal by Andreeva et al. [1] for this purpose. Like classical (tweakable) block ciphers, they encrypt a plaintext block under a secret key; in contrast, however, forkciphers compute two ciphertext blocks from the same input. To boost performance, the state in the middle of the computation is forked, and both ciphertext blocks are computed separately only from this middle state. Therefore, the construction shares the computation of the top rounds and has to evaluate only the bottom rounds twice. Thus, AE schemes built on forkciphers can efficiently obtain both a ciphertext and a tag for messages of at most one block. Owing to this construction, forkciphers provide a new interface called reconstruction that takes one of the ciphertext blocks as input and returns the other one.

As an instance of particular interest, Andreeva et al. [1] proposed ForkAES, which employs the original key schedule and round function of the AES-128. Moreover, ForkAES is a tweakable block cipher that adopts the concept from KIASU-BC [15]: in every round where the round key is XORed to the state, an additional 64-bit public tweak T is XORed to the topmost two state rows. ForkAES encrypts the plaintext P over the first five rounds exactly as KIASU-BC; then, it forks the middle state X and produces from it a ciphertext \(C_0\) exactly as KIASU-BC with the round keys \(K^5\) through \(K^{10}\), plus a second ciphertext \(C_1\) under six further round keys \(K^{11}\) through \(K^{16}\).

Existing Security Arguments. The adoption of the AES round function and the tweak processing from KIASU-BC allowed the designers to profit from existing results, e.g., on the resistance against differential and linear cryptanalysis. Andreeva et al. also briefly considered meet-in-the-middle attacks; concerning further attacks, they stated that “the security of our forkcipher design can be reduced to the security of the AES and KIASU ciphers for further type of attacks” [1, Sect. 3.2]. However, the structure of ForkAES may open new attack angles, which makes an in-depth study of ForkAES a highly interesting task for the community.

Contribution. This work analyzes attack vectors on forkciphers and ForkAES in depth. We generalize ForkAES to \({\textsf {ForkAES}}\text {-}{r_t}\text {-}{r_{b_0}}\text {-}{r_{b_1}} \), where \(r_t\), \(r_{b_0}\) and \(r_{b_1}\) denote the number of rounds from P to X, from X to \(C_0\), and from X to \(C_1\), respectively; e.g., ForkAES-\({5}\text {-}{5}\text {-}{5}\) denotes the original ForkAES. While we consider only the case \(r_{b_0} = r_{b_1}\), we write \({\textsf {ForkAES}}\text {-}{*}\text {-}{r_{b_0}}\text {-}{r_{b_1}} \) when \(r_t\) can be any non-negative integer.

First, we observe that the security of the reconstruction of forkciphers differs considerably from that of the encryption and decryption of the conventional AES, since the first half of the computation uses the inverse of the round function whereas the second half employs the ordinary round function. We exploit this property by introducing reflection differential trails, which allow us to attack nine rounds (ForkAES-\({5}\text {-}{4}\text {-}{4}\)) with low complexity. We also present impossible-differential [4, 7,8,9, 12, 17] and yoyo [3, 20] attacks as well as forgery attacks on the AE mode that exploit the reflection feature.

Second, we consider the restricted case where the reconstruction interface is unavailable. This restriction is natural in some use cases. For example, Andreeva et al. [1] suggested replacing the standard CTR mode with forkciphers; the two ciphertext blocks of a forkcipher halve the number of primitive calls needed to generate a key stream of the same length. In such settings, adversaries cannot exploit reconstruction (or even decryption) queries. We show that even in such environments, a rectangle [5, 6, 10, 18, 22] and an impossible-differential attack reach nine rounds. Those attacks also exploit the forking step, which produces rectangle quartets from pairs of plaintexts.

Our attacks do not endanger the security of the full ForkAES; however, they contradict some of the designers’ claims, as they cover one round more than the attacks on KIASU-BC [13, 21]. More importantly, the forking principle exposes reflection properties in reconstruction queries (Table 1).

Table 1. Comparison of Attacks. CP and CR denote chosen plaintexts and chosen reconstruction queries, respectively. Due to the limited space, two attacks are omitted and are detailed in the full version of this work [2].

Outline. Next, we briefly revisit the necessary details of the AES, KIASU-BC, and ForkAES. Sections 3 to 5 detail our attacks based on reconstruction queries, and Sects. 6 and 7 describe our attacks based on encryption queries. Due to space limitations, each of those sections describes only one representative attack; detailed results can be found in the full version of this work [2].

2 Preliminaries

General Notation. We assume that the reader is familiar with the concepts of block ciphers and their analysis. Most of the time, we consider bit strings of fixed length. We mostly use uppercase letters (e.g., X) for bit strings, lowercase letters for indices (x), and calligraphic letters for sets (\(\mathcal {X} \)). For some positive integer n, we interpret bit strings \(X \in \{0,1\}^{n} \) as elements of the vector space \(\mathbb {F} _2^n\), where addition is the bit-wise XOR, denoted by \(\oplus \). Moreover, the AES operates on byte vectors or byte matrices, i.e., 16-element vectors over \(\mathbb {F} _{2^8}\). So, we interpret byte matrices of r rows and c columns as elements of \(\mathbb {F} _{2^8}^{r \times c}\).

Forkciphers. Let \(\mathcal {B} \), \(\mathcal {K} \), and \(\mathcal {T} \) be non-empty sets or spaces. A tweakable forkcipher \(\widetilde{E} \) is a tuple of three deterministic algorithms: an encryption algorithm \(\widetilde{E}: \mathcal {K} \times \mathcal {T} \times \mathcal {B} \rightarrow \mathcal {B}^2\); a decryption algorithm \(\widetilde{D}: \mathcal {K} \times \mathcal {T} \times \mathcal {B} \times \{0,1\} \rightarrow \mathcal {B} \); and a tag-reconstruction algorithm \(\widetilde{R}: \mathcal {K} \times \mathcal {T} \times \mathcal {B} \times \{0,1\} \rightarrow \mathcal {B} \). The encryption produces \(\widetilde{E} _K^T(P) = (C_0 \,\Vert \, C_1)\). We define \(\widetilde{E} _K^T(P)[0] = C_0\) and \(\widetilde{E} _K^T(P)[1] = C_1\). Decryption and tag reconstruction take an additional bit b such that \(\widetilde{D} _K^{T, b}(\widetilde{E} _K^T(P)[b]) = P\) holds for all \((K, T, P, b) \in \mathcal {K} \times \mathcal {T} \times \mathcal {B} \times \{0,1\} \). The tag reconstruction takes K, T, \(C_b\), and b as input, and produces \(C_{b \oplus 1}\). The ideal tweakable forked permutation \(\widetilde{\varPi } \) encrypts messages P under two independent tweakable permutations \(\widetilde{\pi } _0, \widetilde{\pi } _1: \mathcal {T} \times \mathcal {B} \rightarrow \mathcal {B} \), and outputs \((C_0 \,\Vert \, C_1)\) with \(C_b \leftarrow \widetilde{\pi } _b(T, P)\), for \(b \in \{0,1\} \).
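To make the three interfaces concrete, the following minimal Python sketch models the ideal tweakable forked permutation on a toy block space: every pair (b, T) gets its own uniformly random permutation, and decryption and reconstruction are derived from it. The class and method names, the small block size, and sampling each permutation as a whole on first use are illustrative assumptions, not part of any ForkAES specification.

```python
import random


class IdealForkedPermutation:
    """Toy model of the ideal tweakable forked permutation (hypothetical helper).

    For every tweak T and branch b in {0, 1}, an independent uniformly random
    permutation pi_b(T, .) of the n-bit block space is sampled on first use.
    """

    def __init__(self, n_bits=8, seed=0):
        self.size = 1 << n_bits
        self.rng = random.Random(seed)
        self.tables = {}  # (b, tweak) -> (forward table, inverse table)

    def _perm(self, b, tweak):
        if (b, tweak) not in self.tables:
            forward = list(range(self.size))
            self.rng.shuffle(forward)
            inverse = [0] * self.size
            for x, y in enumerate(forward):
                inverse[y] = x
            self.tables[(b, tweak)] = (forward, inverse)
        return self.tables[(b, tweak)]

    def encrypt(self, tweak, p):
        """Return (C_0, C_1) with C_b = pi_b(T, P)."""
        return self._perm(0, tweak)[0][p], self._perm(1, tweak)[0][p]

    def decrypt(self, tweak, b, c):
        """Invert branch b: recover P from C_b."""
        return self._perm(b, tweak)[1][c]

    def reconstruct(self, tweak, b, c):
        """Map C_b to C_{b xor 1}: back to P, then forward through the other branch."""
        p = self.decrypt(tweak, b, c)
        return self._perm(b ^ 1, tweak)[0][p]


if __name__ == "__main__":
    fork = IdealForkedPermutation()
    c0, c1 = fork.encrypt(tweak=3, p=0x42)
    assert fork.decrypt(3, 0, c0) == 0x42 and fork.reconstruct(3, 0, c0) == c1
```

In this ideal object, reconstruction is decryption along branch b followed by encryption along the other branch; a real forkcipher such as ForkAES only needs to invert back to the forking state.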

The AES-128 is a substitution-permutation network over 128-bit inputs, which transforms the input through ten rounds consisting of SubBytes (SB), ShiftRows (SR), MixColumns (MC), and a round-key addition with a round key \(K^i\). At the start, a whitening key \(K^0\) is XORed to the state; the final round omits the MixColumns operation. We write \(S^i\) for the state after Round i, and \(S^i[j]\) for the j-th byte, for \(0 \le i \le 10\) and \(0 \le j \le 15\). Further, we use \(S^{r,\textsf {SB}}\), \(S^{r,\textsf {SR}}\), and \(S^{r,\textsf {MC}}\) for the states in the r-th round directly after the SubBytes, ShiftRows, and MixColumns operations, respectively. The byte ordering is given by:

$$\begin{aligned} \begin{bmatrix} 0&4&8&12 \\ 1&5&9&13 \\ 2&6&10&14 \\ 3&7&11&15 \end{bmatrix}. \end{aligned}$$

We adopt a similar convention for the round keys \(K^i\) and their bytes \(K^i[j]\), for \(0 \le i \le 16\); for both, we often also use a matrix-wise indexing of the bytes from (0, 0) to (3, 3). More details can be found in [11, 19].

KIASU-BC [15] is a tweakable block cipher that differs from the AES-128 only in that it XORs a public 64-bit tweak T to the topmost two rows of the state whenever a round key is XORed. We denote the tweak by T and its bytes by T[j], for \(0 \le j \le 7\). The bytes are ordered as

$$\begin{aligned} \begin{bmatrix} 0&2&4&6 \\ 1&3&5&7 \\ \end{bmatrix}. \end{aligned}$$
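Combining this tweak ordering with the column-major byte ordering of the state, tweak byte T[j] lands in state byte 4·⌊j/2⌋ + (j mod 2). The following Python sketch, with hypothetical helper names, computes these positions and performs the KIASU-style tweak injection on a 16-byte state.

```python
def tweak_positions():
    """State byte index hit by each tweak byte T[0..7].

    T[j] sits in row j % 2 and column j // 2 of the 2x4 tweak matrix; with the
    column-major AES state ordering, that is state byte 4 * (j // 2) + (j % 2).
    """
    return [4 * (j // 2) + (j % 2) for j in range(8)]


def add_tweak(state, tweak):
    """XOR the 8-byte tweak into the top two rows of a 16-byte state."""
    assert len(state) == 16 and len(tweak) == 8
    out = bytearray(state)
    for j, pos in enumerate(tweak_positions()):
        out[pos] ^= tweak[j]
    return bytes(out)


print(tweak_positions())  # [0, 1, 4, 5, 8, 9, 12, 13]
```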

ForkAES is a forkcipher based on KIASU-BC. It forks the state after five rounds and transforms it twice into two ciphertexts \(C_0\) and \(C_1\). We denote the states of the first branch by \(X^i =^{\text {def}} S^i\), for \(5 \le i \le 10\), where \(X^5 = S^5\) and \(X^{10} = C_0\). Moreover, we denote the states of the second branch by \(Y^i\), for \(5 \le i \le 10\), where \(Y^5 = S^5\) and \(Y^{10} = C_1\). We will also write \(\mathsf {R}\) for the sequence \(\textsf {MC} \circ \textsf {SR} \circ \textsf {SB} \) and \(\mathsf {KS}\) for an iteration of the AES-128 key schedule. A schematic illustration is given in Fig. 1, and more details can be found in [1]. We will sometimes reorder the linear operations, e.g., swap MixColumns, ShiftRows, and the key addition. We will write \(\widetilde{K}^r = \textsf {MC} ^{-1}(K^r)\) and \(\widehat{K}^r = \textsf {SR} ^{-1}(\textsf {MC} ^{-1}(K^r))\) for the transformed round keys.

Fig. 1. ForkAES. R is the AES-128 round function; KS a round of its key schedule.

Subspaces of the AES. We adopt the notion of AES subspaces from Grassi et al. [14]. Given a vector space \(\mathcal {W} \), a subspace \(\mathcal {V} \subseteq \mathcal {W} \), and an element a of \(\mathcal {W} \), the set \(\mathcal {V} \oplus a =^{\text {def}} \{v \oplus a \mid v \in \mathcal {V} \}\) is a coset of \(\mathcal {V} \) in \(\mathcal {W} \). We consider vectors and vector spaces over \(\mathbb {F} _{2^8}^{4 \times 4}\), and denote by \(\{e_{0,0}, \ldots , e_{3,3}\}\) the unit vectors of \(\mathbb {F} _{2^8}^{4 \times 4}\), i.e., \(e_{i,j}\) has a single 1 in the i-th row and j-th column. For a vector space \(\mathcal {V} \) and a function \(F: \mathbb {F} _{2^8}^{4 \times 4} \rightarrow \mathbb {F} _{2^8}^{4 \times 4}\), we let \(F(\mathcal {V}) =^{\text {def}} \{F(v) \mid v \in \mathcal {V} \}\). For a subset \(\mathcal {I} \subseteq \{1, 2, \ldots , n\}\) and a set of vector spaces \(\{\mathcal {V} _1, \mathcal {V} _2, \ldots , \mathcal {V} _n\}\), we define \(\mathcal {V} _{\mathcal {I}} =^{\text {def}} \bigoplus _{i \in \mathcal {I}} \mathcal {V} _i\). We adopt the definitions by Grassi et al. of four families of subspaces for the AES, for \(i \in \{0, 1, 2, 3\}\):

  • the column spaces \(\mathcal {C} _i\) as \(\mathcal {C} _i = \langle e_{0,i}, e_{1,i}, e_{2,i}, e_{3,i}\rangle \),

  • the diagonal spaces \(\mathcal {D} _i\) as \(\mathcal {D} _i = \textsf {SR} ^{-1}(\mathcal {C} _i)\),

  • the inverse-diagonal spaces \(\mathcal {ID} _i\) as \(\mathcal {ID} _i = \textsf {SR} (\mathcal {C} _i)\), and

  • the mixed spaces \(\mathcal {M} _i\) as \(\mathcal {M} _i = \textsf {MC} (\mathcal {ID} _i)\).
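Since ShiftRows only permutes byte positions, the supports of the column, diagonal, and inverse-diagonal spaces follow directly from the byte ordering above. The following minimal Python sketch (helper names are illustrative) assumes the standard AES ShiftRows, which rotates row r to the left by r positions; \(\mathcal {M} _i\) is omitted because MixColumns is not a byte permutation.

```python
def shift_rows_pos(idx):
    """New index of the byte at idx after ShiftRows (row r rotates left by r)."""
    row, col = idx % 4, idx // 4
    return 4 * ((col - row) % 4) + row


def inv_shift_rows_pos(idx):
    """New index of the byte at idx after the inverse of ShiftRows."""
    row, col = idx % 4, idx // 4
    return 4 * ((col + row) % 4) + row


def column_space(i):
    """Byte positions spanning C_i."""
    return {4 * i + row for row in range(4)}


def diagonal_space(i):
    """Byte positions spanning D_i = SR^{-1}(C_i)."""
    return {inv_shift_rows_pos(p) for p in column_space(i)}


def inv_diagonal_space(i):
    """Byte positions spanning ID_i = SR(C_i)."""
    return {shift_rows_pos(p) for p in column_space(i)}


# D_0 is the main diagonal {0, 5, 10, 15}, the positions of K^5[0,5,10,15] used later.
for i in range(4):
    print(i, sorted(diagonal_space(i)), sorted(inv_diagonal_space(i)))
```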

The S-box \(\mathsf {S}: \mathbb {F} _{2^8} \rightarrow \mathbb {F} _{2^8}\) of the AES has a few well-analyzed properties; here, we briefly recall one that will be relevant in our later attacks.

Property 1

Let \(\alpha \in \mathbb {F} _{2^8} \setminus \{0^8\}\) and \(F \in \{\mathsf {S}, \mathsf {S}^{-1}\}\). Over all \(\beta \in \mathbb {F} _{2^8}\), the number \(|\{ x : F(x) \oplus F(x \oplus \alpha ) = \beta \}|\) equals four for one value of \(\beta \), two for 126 values, and zero for the remaining 129 values (including \(\beta = 0\)). So, for any differential \(\alpha \rightarrow \beta \) with \(\beta \ne 0\), approximately one input x satisfies it on average.
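Property 1 can be checked exhaustively. The Python sketch below builds the AES S-box from its textbook definition (inversion in \(\mathbb {F} _{2^8}\) modulo \(x^8+x^4+x^3+x+1\), followed by the affine map with constant 0x63) and counts, for a fixed \(\alpha \), how many output differences admit zero, two, or four solutions. The same profile underlies the estimates in Sect. 3: a random non-zero pair \((\alpha , \beta )\) is possible with probability \(127/255 \approx 2^{-1}\), with \(256/127 \approx 2\) solutions on average. The helper names are illustrative.

```python
from collections import Counter


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return result


def aes_sbox():
    """Build the AES S-box: field inversion (0 -> 0) followed by the affine map."""
    inverse = [0] * 256
    for a in range(1, 256):
        inverse[a] = next(b for b in range(1, 256) if gf_mul(a, b) == 1)
    sbox = []
    for a in range(256):
        b = inverse[a]
        s = b
        for k in range(1, 5):
            s ^= ((b << k) | (b >> (8 - k))) & 0xFF  # XOR of left rotations by 1..4
        sbox.append(s ^ 0x63)
    return sbox


def ddt_row_profile(box, alpha):
    """Map 'number of solutions x' -> 'number of output differences beta'."""
    counts = Counter(box[x] ^ box[x ^ alpha] for x in range(256))
    return Counter(counts.get(beta, 0) for beta in range(256))


SBOX = aes_sbox()
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x

# Per Property 1: 129 output differences with 0 solutions, 126 with 2, one with 4.
print(ddt_row_profile(SBOX, 0x01))
```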

3 Attack on ForkAES-\({*}\text {-}{4}\text {-}{4}\) with Reflection Trails

Our attacks work for arbitrary values of \(r_t\); only the round-key indices of the two forking branches depend on \(r_t\). To avoid making the analysis unnecessarily complex, we explain our attacks for the case \(r_t=5\).

Observations for Reconstruction Queries. Recall that the first and the second half of the reconstruction use the inverse and the ordinary round function, respectively. This motivates us to consider the reflection property, introduced by Kara [16] against the block cipher GOST. The final 16 rounds of GOST consist of an eight-round Feistel network with the round keys in the order \(K^0\), \(K^1\), ..., \(K^7\), followed by eight rounds with \(K^7\), \(K^6\), ..., \(K^0\) in this order. Since Feistel networks are involutions, this enables the following so-called reflection property.

Proposition 1

(Reflection Property). When an input value V reaches a symmetric state after eight rounds, i.e., the left-branch value is identical to the right-branch value, the output of the final eight rounds will be V.

The reflection property is strong, but has limitations: there must be no round constants, the round keys of the second chunk must be those of the first chunk in reverse order, and the target function must be an involution.

This paper considers a differential version of the reflection property. More generally, the same concept applies to differential trails, since differences are invariant under XOR with round keys and constants. Suppose a round function F consists of an arbitrary bijective function, an XOR with a round constant \(c_i\), and an XOR with a round key \(K^i\). Consider 2r rounds, where the first r rounds apply F and the final r rounds apply \(F^{-1}\). The round keys \(K^i\), \(i=1,2,\ldots ,2r\), as well as the round constants \(c_i\), \(i = 1, 2, \ldots , 2r\), can differ individually. Then, we have the following property.

Proposition 2

(Reflection Differential Trails). If there exists a differential for the r-round transformation \(F^r\) that propagates a difference \(\Delta I\) to \(\Delta O\) with probability p, there exists a differential for the 2r-round transformation \((F^{-1})^r~\circ ~F^r\) that propagates a difference \(\Delta I\) to \(\Delta I\) with probability at least \(p^2\). This property holds for any choice of round keys and constants in the 2r rounds.

Reflection trails can be applied to reconstruction queries of forkciphers where \(C_1\) (resp. \(C_0\)) is computed from \(C_0\) (resp. \(C_1\)). The first and last halves of a reconstruction query are back- and forward computations of the same round function, and different round keys and round constants do not impact the property.
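A one-round instance (r = 1) of Proposition 2 can be checked exhaustively on a single byte. The Python sketch below assumes \(F(x) = \mathsf {S}(x) \oplus k_1\) with the AES S-box and models the reflected round as \(F^{-1}(y) = \mathsf {S}^{-1}(y \oplus k_2)\); only \(k = k_1 \oplus k_2\) matters, so the two rounds act as \(x \mapsto \mathsf {S}^{-1}(\mathsf {S}(x) \oplus k)\). Averaged over k, the probability that the input difference \(\alpha \) returns as \(\alpha \) equals \(\sum _\beta \mathrm {DDT}[\alpha ][\beta ]^2 / 2^{16}\), which is at least \(p^2\) for each single path \(\alpha \rightarrow \beta \). Averaging over the key is a simplification of this sketch; Proposition 2 itself is stated for arbitrary keys. Helper names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return result


def aes_sbox():
    """Build the AES S-box: field inversion (0 -> 0) followed by the affine map."""
    inverse = [0] * 256
    for a in range(1, 256):
        inverse[a] = next(b for b in range(1, 256) if gf_mul(a, b) == 1)
    sbox = []
    for a in range(256):
        b = inverse[a]
        s = b
        for k in range(1, 5):
            s ^= ((b << k) | (b >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = aes_sbox()
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x


def ddt_entry(alpha, beta):
    """Number of x with S(x) ^ S(x ^ alpha) == beta."""
    return sum(1 for x in range(256) if SBOX[x] ^ SBOX[x ^ alpha] == beta)


def reflected_count(alpha):
    """Count (k, x) such that the pair (x, x ^ alpha) keeps difference alpha
    through x -> S^{-1}(S(x) ^ k), i.e. one round F followed by one round F^{-1}."""
    count = 0
    for k in range(256):
        for x in range(256):
            y0 = INV_SBOX[SBOX[x] ^ k]
            y1 = INV_SBOX[SBOX[x ^ alpha] ^ k]
            if y0 ^ y1 == alpha:
                count += 1
    return count


alpha = 0x01
beta = next(b for b in range(1, 256) if ddt_entry(alpha, b) == 4)  # best single path
p = ddt_entry(alpha, beta) / 256
q = reflected_count(alpha) / 256 ** 2
print(f"p = {p}, p^2 = {p * p:.3e}, reflected probability = {q:.3e}")
```

With the row profile of Property 1, the count is \(4^2 + 126 \cdot 2^2 = 520\) for every non-zero \(\alpha \), i.e., roughly \(2^{-7}\), well above \(p^2 = 2^{-12}\).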

Reflection trails are particularly useful for the AES, which achieves full diffusion in only two rounds. There, a single active byte propagates to \(1 {\mathop {\longrightarrow }\limits ^{F^{-1}}}4 {\mathop {\longrightarrow }\limits ^{F^{-1}}}16\) active bytes. In contrast, it propagates as \(1 {\mathop {\longrightarrow }\limits ^{F^{-1}}}4 {\mathop {\longrightarrow }\limits ^{\mathrm {reflect}}}4 {\mathop {\longrightarrow }\limits ^{F}}1\) in the reflection trail, where \({\mathop {\longrightarrow }\limits ^{F}}\) and \({\mathop {\longrightarrow }\limits ^{F^{-1}}}\) denote the propagation of the number of active S-boxes with F and \(F^{-1}\), respectively, and \({\mathop {\longrightarrow }\limits ^{\mathrm {reflect}}}\) denotes the duplication of the state by forkciphers. This idea allows us to build long differential trails.

Notably, the designers of ForkAES did not anticipate the existence of reflection trails. In fact, based on the property that the maximum probability of differential characteristics for four-round AES is \(2^{-150}\), the designers claim: “Since our ForkAES design uses the AES round function, we can easily deduce that our design will provide enough security in this setting after four rounds against differential attacks in the single-key model.” [1, Sect. 3.2]

The combination of the reflection trail and a KIASU-like tweak injection yields further efficient differential trails. A tweak difference allows an attacker to create one blank round, and the reflection trail increases the number of blank rounds to two. Indeed, the reflection trail with \(4 {\mathop {\longrightarrow }\limits ^{F^{-1}}}1 {\mathop {\longrightarrow }\limits ^{F^{-1}}}0 {\mathop {\longrightarrow }\limits ^{F^{-1}}}4 {\mathop {\longrightarrow }\limits ^{\mathrm {reflect}}}4 {\mathop {\longrightarrow }\limits ^{F}}0 {\mathop {\longrightarrow }\limits ^{F}}1 {\mathop {\longrightarrow }\limits ^{F}}4\) active bytes enables the attacker to build a very efficient trail.

The Differential Trail and Probability. The linear operations in the last round do not affect the attack. Hence, we introduce the equivalent ciphertext \(\widehat{C}_0 := \textsf {SR} ^{-1} \circ \textsf {MC} ^{-1} (C_0 \oplus T)\) and the equivalent key \(\widehat{K}^{9} := \textsf {SR} ^{-1} \circ \textsf {MC} ^{-1} (K^{9})\). \(\widehat{C}_1\) and \(\widehat{K}^{14}\) are defined similarly. Refer to Fig. 2 for the differential trail, where we append one round to the above-mentioned trail in reconstruction queries. The attacker queries \(C_1\) and obtains \(C_0\).

Fig. 2. Truncated differentials for ForkAES-\({*}\text {-}{4}\text {-}{4}\).

The four active bytes injected by \(\widehat{C}_1\) must shrink to one during the inverse of MixColumns, and this byte must be canceled by the tweak difference; this occurs with probability \(2^{-32}\), since three bytes must become zero (\(2^{-24}\)) and the remaining one must equal the tweak difference (\(2^{-8}\)). In Round 6, the four-byte difference in a diagonal position must likewise shrink to a one-byte difference that is canceled by the tweak difference, which also occurs with probability \(2^{-32}\). So, the total probability of this trail is \(2^{-64}\).

Attack Procedure. During the attack, the tweak difference is fixed.

  1. Choose tweaks T and \(T'\) with the fixed difference. For each of T and \(T'\), choose \(2^{32}\) distinct values for the first column of \(\widehat{C}_1\). Fix the other 12 bytes to arbitrary values, compute the corresponding \(C_1\) offline, and query them to obtain the corresponding \(C_0\). Compute the corresponding \(\widehat{C}_0\) offline. Hence, we obtain \(2^{32}\) choices of \(\widehat{C}_0\) with T and \(2^{32}\) choices of \(\widehat{C}_0\) with \(T'\).

  2. From the \(2^{64}\) pairs of \(\widehat{C}_0\) with different tweaks, pick those with 12 inactive bytes in Columns 2, 3, and 4 of \(\widehat{C}_0\). We expect one right pair.

  3. For the right pair, obtain the \(2^{7}\) candidates for the first column of \(\widehat{K}^9\) that lead to a single active byte in the top byte of the column after the inverse of MixColumns, where this difference must be one of the \(2^7\) values that the S-box can output for the fixed tweak difference. This step is colored red in Fig. 2.

  4. Iterate the steps above, shifting the active-byte positions, to obtain \(2^{7}\) candidates for each column of \(\widehat{K}^9\). The resulting \(2^{28}\) candidates are then tested exhaustively.
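Step 2 need not enumerate all \(2^{64}\) cross-tweak pairs explicitly: a pair can only qualify if both values of \(\widehat{C}_0\) agree on the 12 bytes of Columns 2, 3, and 4, so one can bucket the values obtained under T by these bytes and look up each value obtained under \(T'\). The Python sketch below shows this standard hash-join idea on 16-byte strings in the column-major order of Sect. 2; the function name and data layout are illustrative assumptions.

```python
from collections import defaultdict


def find_candidate_pairs(states_t, states_t_prime, passive=range(4, 16)):
    """Return all (s, s') with s from states_t and s' from states_t_prime whose
    bytes at the passive positions coincide (zero difference there).

    Bytes 4..15 form Columns 2, 3 and 4 in the column-major AES ordering.
    Runs in time linear in the inputs plus the number of returned pairs.
    """
    buckets = defaultdict(list)
    for s in states_t:
        buckets[bytes(s[i] for i in passive)].append(s)
    pairs = []
    for s2 in states_t_prime:
        for s in buckets.get(bytes(s2[i] for i in passive), []):
            pairs.append((s, s2))
    return pairs
```

In the attack, the two input lists would hold the \(2^{32}\) values of \(\widehat{C}_0\) obtained under T and under \(T'\) in Step 1.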

Complexity Evaluation. The data complexity is \(4 \cdot (2^{32} + 2^{32}) = 2^{35}\) reconstruction queries. The memory complexity is \(2^{33}\) AES states to store \(2^{33}\) values of \(\widehat{C}_0\). The time complexity is \(2^{19}\) memory accesses to the queried data and \(2^{28}\) encryptions for the final exhaustive search. Note that in Step 3, there are \(2^{7}\) choices for the input difference to the last SubBytes, and the output difference of this SubBytes is fixed by the ciphertext difference of the right pair. For the AES S-box, a randomly chosen pair of input and output differences can be propagated with probability about \(2^{-1}\), and once it can be propagated, the number of solutions is about 2. Therefore, \(2^7 \times (2^{-1})^4 = 2^3\) input-difference choices propagate through all four bytes, and the total number of solutions is \(2^7 \times (2^{-1})^4 \cdot 2^4 = 2^{7}\). So, \(2^{7}\) candidates for one column of \(\widehat{K}^9\) can be obtained with \(2^{7}\) computations.

Experimental Verification. We implemented the attack on ForkAES-\({*}\text {-}{3}\text {-}{3}\), which removes one round from each branch of the above attack. ForkAES-\({*}\text {-}{3}\text {-}{3}\) can be attacked with \((\text {Data}, \text {Time}, \text {Memory}) = (2^{19},2^{28},2^{17})\). Our Java implementation confirmed the validity of the attack.

4 Impossible-Differential Attack with Reflection Trails

This section describes an impossible-differential distinguisher on ForkAES-\({*}\text {-}{4}\text {-}{4}\) with reconstruction queries; we will extend it for key recovery.

Distinguisher. The impossible differential distinguisher is as follows.

$$\begin{aligned} 1 {\mathop {\longrightarrow }\limits ^{F^{-1}}}0 {\mathop {\longrightarrow }\limits ^{F^{-1}}}4 {\mathop {\longrightarrow }\limits ^{\mathrm {reflect}}}4 {\mathop {\longrightarrow }\limits ^{F}}\ ? {\mathop {\longrightarrow }\limits ^{F}}\ ? {\mathop {\longleftrightarrow }\limits ^{\text {impossible}}} \ ? {\mathop {\longrightarrow }\limits ^{F}}\ ? {\mathop {\longrightarrow }\limits ^{F}}12. \end{aligned} \tag{1}$$

The positions of active bytes are illustrated in Fig. 3. That this trail holds with probability zero can be seen as follows.

  • Trail from \(Y^7\): After the tweak injection along with \(K^6\), any number of bytes can be active in the leftmost column. They are moved to different columns by the following ShiftRows operation. After MixColumns, each column is either fully active or fully inactive.

  • Trail from \(\widehat{C}_0\): After the inverse of MixColumns and ShiftRows, at least one inverse diagonal is inactive. Moreover, at least three bytes are active in the state. The subsequent tweak injection (along with \(K^7\)) never affects the inactive inverse diagonal. It may cancel one active byte in the state, but this does not impact the analysis. In summary, we have the following two properties.

    1. There is at least one inactive byte in each column.

    2. The number of active bytes is at least two.

The case that the trail from \(Y^7\) has no active byte is impossible, because the trail from \(\widehat{C}_0\) ensures at least two active bytes. The case that the trail from \(Y^7\) has at least one fully active column is impossible, because the trail from \(\widehat{C}_0\) ensures at least one inactive byte in each column. Hence, no difference reachable from \(Y^7\) can propagate to the difference of \(\widehat{C}_0\).

The inactive column position at \(\widehat{C}_0\) is the rightmost (4th) column in Fig. 3, but it can also be located in the second or third column position. It cannot be located in the leftmost (first) column because of the tweak difference.

Key Recovery. We append key-recovery rounds to the trail in Fig. 3, as depicted in Fig. 4. Suppose we have a pair of outputs with only a single active column at \(\widehat{C}_1\). Then, only five (equivalent-)key bytes must be guessed.

Fig. 3. Impossible-differential distinguisher.

Attack Procedure. During the attack, the tweak difference is fixed.

  1. Choose two tweaks \(T,T'\) with the fixed difference. For each of \(T,T'\), choose \(2^{32}\) distinct values for the four active bytes of \(\widehat{C}_1\) and fix the other 12 bytes to an arbitrary value, say 0. After making \(2^{33}\) reconstruction queries, we obtain \(2^{32}\) choices of \(\widehat{C}_0\) associated with T and \(2^{32}\) associated with \(T'\).

  2. From the \(2^{64}\) pairs of \(\widehat{C}_0\) with different tweaks, pick those with at least one inactive column among Columns 2, 3, and 4 of \(\widehat{C}_0\). We expect \(3 \cdot 2^{64-32}=2^{33.58}\) pairs.

  3. For each picked pair, derive \(2^7\) wrong candidates for the top-left byte of \(\widehat{K}^{13}\) and the leftmost column of \(\widehat{K}^{14}\) by trying the \(2^7\) possible differences in the middle rounds. After evaluating \(2^{33.58}\) pairs, we obtain \(2^{40.58}\) wrong-key candidates.

  4. Iterate the steps above \(2^{4.42}\) times, changing the fixed 12 bytes of \(\widehat{C}_1\). We obtain \(2^{40.58+4.42} = 2^{45}\) wrong candidates for the five key bytes. After obtaining \(2^N\) wrong keys, the remaining key space for those five bytes is estimated as \(2^{40} \cdot ( 1 - 2^{-40} )^{2^N}\). \(N=45\) suffices to reduce the remaining key space to 1, since \(2^{40} \cdot ( 1 - 2^{-40} )^{2^{45}} = 2^{40} \cdot ( 1 - 2^{-40} )^{2^{40}\cdot 2^{5}} \approx 2^{40} \cdot e^{-2^5} = 2^{-6.17} < 1\).

  5. Iterate the above steps three times, shifting the active-byte positions, to recover all bytes of \(\widehat{K}^{14}\).
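The estimate in Step 4 is easy to check numerically. The following Python sketch evaluates \(\log _2\big (2^{k} (1-2^{-k})^{2^N}\big )\) for k = 40 and N = 45, using math.log1p to avoid the precision loss of forming \(1 - 2^{-40}\) directly; the function name is illustrative.

```python
import math


def log2_surviving_keys(key_bits, log2_wrong_keys):
    """log2 of 2^k * (1 - 2^-k)^(2^N): expected surviving candidates for a
    k-bit key after 2^N wrong-key eliminations, each treated as uniform."""
    per_elimination = math.log1p(-(2.0 ** -key_bits))  # ln(1 - 2^-k), accurate for tiny 2^-k
    return key_bits + (2.0 ** log2_wrong_keys) * per_elimination / math.log(2)


print(round(log2_surviving_keys(40, 45), 2))  # -6.17
```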

Fig. 4. The appended rounds for key recovery.

Complexity Evaluation. To recover one column of \(\widehat{K}^{14}\), we make \(2^{4.42} \cdot (2^{32}+ 2^{32}) = 2^{37.42}\) reconstruction queries. The data complexity to recover all bytes of \(\widehat{K}^{14}\) is \(4 \cdot 2^{37.42} = 2^{39.42}\).

To recover one column of \(\widehat{K}^{14}\), we spend \(2^{45}\) encryptions to discard \(2^{45}\) wrong-key candidates. The time complexity to recover all bytes of \(\widehat{K}^{14}\) is \(4 \cdot 2^{45} = 2^{47}\).

For the memory complexity, we use a table of \(2^{40}\) bits, one per candidate of the 40 key bits, to record wrong-key candidates, which is equivalent to \(2^{33}\) AES states. To recover one column of \(\widehat{K}^{14}\), we also need to store \(2^{33}\) values \(\widehat{C}_0\) and \(2^{34}\) pairs satisfying the differences. Hence, the memory complexity is \(2^{33}+2^{33}+2^{34}=2^{35}\) AES states.

5 Yoyo Key-Recovery Attack on ForkAES-\({*}\text {-}{3}\text {-}{3}\)

The yoyo game was introduced by Biham et al. against Skipjack [3]. Rønjom et al. [20] reported deterministic distinguishers for two generic substitution-permutation (SP) rounds. We review existing work in Appendix A. Here, we observe that, during reconstruction queries, the two-round decryption and the two-round encryption can be computed independently for each column; we call the resulting mapping a MegaSBox.

MegaSBox in ForkAES. Refer to Fig. 5 for the MegaSBox construction of ForkAES. Consider any inverse diagonal in \(Y^{6,\textsf {SR}}\). After \(\textsf {SR} ^{-1}\) and \(\textsf {SB} ^{-1}\), its four bytes align to a column. After \(\textsf {MC} ^{-1}\), the column remains independent of the other columns. The inverses of SR and SB align the bytes back into a diagonal. After the reflection, the corresponding forward operations are applied to these four bytes; after SR, those bytes align to an inverse diagonal in \(X^{6,\textsf {SR}}\). Clearly, the value in this inverse diagonal depends only on the same inverse diagonal in \(Y^{6,\textsf {SR}}\). This mapping can be regarded as a MegaSBox with a 32-bit input (one inverse diagonal), and the transition from \(Y^{6,\textsf {SR}}\) to \(X^{6,\textsf {SR}}\) can be described by four parallel MegaSBoxes. To be explicit, for \(x \in \{0,1\}^{32}\), the MegaSBox is defined as \(\textsf {MegaSBox}(x) := \textsf {SR} \circ \textsf {SB} \circ \textsf {ATK} \circ \textsf {MC} \circ \textsf {SR} \circ \textsf {SB} \circ \textsf {ATK} \circ \textsf {ATK}^{-1} \circ \textsf {SB} ^{-1} \circ \textsf {SR} ^{-1} \circ \textsf {MC} ^{-1} \circ \textsf {ATK}^{-1} \circ \textsf {SB} ^{-1} \circ \textsf {SR} ^{-1}(x)\), where \(\textsf {ATK}\) denotes the addition of a round key and a tweak.
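The claim that each inverse diagonal of \(Y^{6,\textsf {SR}}\) influences only the same inverse diagonal of \(X^{6,\textsf {SR}}\) can be checked by tracking byte positions through the composition above. The minimal Python sketch below works at the level of supports only: it assumes the standard AES ShiftRows, treats SB, ATK, and their inverses as byte-local, and treats MC and \(\textsf {MC} ^{-1}\) as spreading within a column. The helper names are illustrative.

```python
def shift_rows_pos(idx):
    """New index of the byte at idx after ShiftRows (row r rotates left by r)."""
    row, col = idx % 4, idx // 4
    return 4 * ((col - row) % 4) + row


def inv_shift_rows_pos(idx):
    """New index of the byte at idx after the inverse of ShiftRows."""
    row, col = idx % 4, idx // 4
    return 4 * ((col + row) % 4) + row


def mix_support(positions):
    """MC and MC^{-1} can spread a difference over every byte of a touched column."""
    cols = {p // 4 for p in positions}
    return {4 * c + r for c in cols for r in range(4)}


def megasbox_support(positions):
    """Byte positions of X^{6,SR} that can depend on the given positions of Y^{6,SR}.

    Only the byte-moving steps of the MegaSBox composition are tracked, right to left.
    """
    p = {inv_shift_rows_pos(i) for i in positions}  # SR^{-1}
    p = mix_support(p)                              # MC^{-1}
    p = {inv_shift_rows_pos(i) for i in p}          # SR^{-1}
    # reflection (ATK^{-1} then ATK) moves no bytes
    p = {shift_rows_pos(i) for i in p}              # SR
    p = mix_support(p)                              # MC
    return {shift_rows_pos(i) for i in p}           # SR


for j in range(4):
    inv_diag = {shift_rows_pos(4 * j + r) for r in range(4)}  # ID_j = SR(C_j)
    assert megasbox_support(inv_diag) == inv_diag
```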

Fig. 5. MegaSBox of ForkAES.

Fig. 6. Yoyo key recovery for ForkAES-\({*}\text {-}{3}\text {-}{3}\).

Key-recovery Attack. To apply the yoyo game to ForkAES-\({*}\text {-}{3}\text {-}{3}\), an \(S_1 \cdot L \cdot S_2\) construction needs to be identified. Referring to Fig. 5, the four MegaSBoxes act as the \(S_1\) layer, and the MC and SB operations that follow \(X^{6,\textsf {SR}}\) can be regarded as the L and \(S_2\) layers, respectively. Thus, the operations from \(Y^{6,\textsf {SR}}\) to \(X^{7,\textsf {SB}}\) constitute the \(S_1 \cdot L \cdot S_2\) construction. We choose a pair of texts (\(x_1,x_2\)) in \(Y^{6,\textsf {SR}}\) and compute \(X^{7,\textsf {SB}}\); bytes are swapped between the texts in \(X^{7,\textsf {SB}}\), and their corresponding values in \(Y^{6,\textsf {SR}}\) are calculated as (\(x'_1,x'_2\)). Theorem 1 in Appendix A ensures that \(\nu (x_1 \oplus x_2)=\nu (x'_1 \oplus x'_2)\). Refer to Fig. 6 for the attack. It starts by activating one column at \(\widehat{C}_{1}\) for a pair of texts and querying the reconstruction algorithm for the corresponding pair of \(\widehat{C}_{0}\). We use the propagation \(4 {\mathop {\longrightarrow }\limits ^{\textsf {MC} ^{-1}}} 1\) in \(Y^{6,\textsf {SR}}\), which activates a single MegaSBox with probability \(2^{-22}\). Due to the MegaSBox structure, only one inverse diagonal is active in \(X^{6,\textsf {SR}}\). With probability \(2^{-6}\), one of the four bytes of this inverse diagonal is inactive. Thus, \(\widehat{C}_{0}\) has one inactive column with probability \(2^{-28}\).
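The swap step of the yoyo game itself is simple to express. The Python sketch below exchanges a chosen set of byte positions between two 16-byte texts; which bytes are swapped, the Boolean-mask representation, and the function name are assumptions for illustration. The non-trivial part, that the new pair mapped back to \(Y^{6,\textsf {SR}}\) keeps the zero-difference pattern \(\nu \), is the content of Theorem 1 and is not reproduced here.

```python
def yoyo_swap(x, y, mask):
    """Exchange the bytes of x and y at the positions where mask is True.

    The XOR difference x ^ y is unchanged at the layer where the swap happens;
    the yoyo property concerns the pair after mapping back through S1 . L . S2.
    """
    x_new = bytes(b if m else a for a, b, m in zip(x, y, mask))
    y_new = bytes(a if m else b for a, b, m in zip(x, y, mask))
    return x_new, y_new


# Example: swap the first column (bytes 0..3) of two texts.
x1 = bytes(range(16))
x2 = bytes(range(100, 116))
x1_new, x2_new = yoyo_swap(x1, x2, [i < 4 for i in range(16)])
```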

Attack Procedure.

  1. Choose a tweak; choose \(2^{14.5}\) distinct random values for the first column of \(\widehat{C}_1\). Fix the other 12 bytes to an arbitrary value. Obtain the corresponding \(\widehat{C}_0\) via reconstruction queries. After this step, we have about \(2^{28}\) pairs of \(\widehat{C}_0\).

  2. For each of the \(2^{28}\) pairs of \(\widehat{C}_0\), check whether one column is inactive; we expect one right pair. Once a right pair is obtained, swap the bytes at \(\widehat{C}_0\) as in the yoyo game and query the reconstruction algorithm to obtain a new pair, which is fully active in \(\widehat{C}_1\). We thus retrieve two such pairs (right pairs).

  3. For both right pairs, determine the candidates for the first column of \(\widehat{K}^{12}\) that yield only one active byte in the first column, e.g., by exhaustively guessing a single-byte difference before MixColumns and propagating it through MixColumns. Each right pair suggests about \(2^{10}\) key candidates. By analyzing two right pairs, the key is almost uniquely determined.

  4. Iterate Step 3 for the remaining columns.

Complexity Evaluation and Experimental Verification. The attack needs \(2^{14.5}\) reconstruction queries; its time complexity is \(2^{14.5}\) memory accesses, and the memory complexity is \(2^{14.5}\) AES states for \(2^{14.5}\) values \(\widehat{C}_0\).

We verified the attack on ForkAES-\({*}\text {-}{3}\text {-}{3}\) by implementing it in Java. The attack started by initializing an oracle that randomly chose a key; then, the steps of the attack procedure above were followed. In the key-recovery phase, two right pairs were used to retrieve candidates for each column of \(\widehat{K}^{12}\). The first right pair yielded 976, 1296, 1008, and 976 candidates for Columns 0, 1, 2, and 3, respectively. The second right pair reduced the candidates to 1, 1, 2, and 1, respectively. Hence, we obtained two key candidates.

Fig. 7. Overview (left) and bottom trail (right) of our rectangle attack. The key recovery covers the parts below the dashed horizontal line and guesses the bytes marked with G.

6 Rectangle Attack with Encryption Queries

This section describes a rectangle attack on ForkAES-\({*}\text {-}{4}\text {-}{4}\); for concreteness, we exemplify it for five top rounds. Briefly, boomerangs and rectangles are types of differential cryptanalysis where a given cipher E is split into sub-ciphers \(E = E_2 \circ E_m \circ E_1\) such that there exist a differential \(\alpha \rightarrow \beta \) with probability p over \(E_1\), a middle trail \(\beta \rightarrow \gamma \) with probability r, and a differential \(\gamma \rightarrow \delta \) with probability q over \(E_2\). Note that we consider the middle part \(E_m\) to be empty in our attack. The differentials are often referred to as upper and lower differentials or trails. The probability of a correct quartet is often approximated by \(r(pq)^2\), since the trails must hold for both pairs.

We consider two tuples \((P, T)\) and \((P', T')\) that are encrypted to \((C_0, C_1)\) and \((C'_0, C'_1)\), respectively. We denote by \(\Delta X^r = X^r \oplus {X'}^r\) the difference between the states after Round r on the branch that leads to \(C_0\), and by \(\Delta Y^r = Y^r \oplus {Y'}^r\) the corresponding difference on the branch that leads to \(C_1\). For clarity, we define that the fork from X to \(C_0\) employs the round keys \(K^5\) through \(K^9\), and the fork from Y to \(C_1\) uses \(K^{10}\) through \(K^{14}\). An overview is depicted on the left side of Fig. 7, where \(\mathsf {R}^{T}_{K^{i..j}}\) denotes the round sequence \(\mathsf {R}^T_{K^j} \circ \cdots \circ \mathsf {R}^T_{K^i}\). We construct \(2^8\) sets of \(2^s\) plaintext-tweak tuples each. The sets differ in T[0]; all plaintexts in a set share the same tweak. So, we can combine each of the \(2^s\) texts (tuples of \(C_0\), \(C_1\)) of Set i with each of the \(2^s\) texts of Set j, for \(i \ne j\), which yields \(2^{2s} \cdot \binom{2^8}{2} \simeq 2^{2s+15}\) pairs (quartets of \(C_0\), \(C_1\), \(C_0^{\prime }\), \(C_1^{\prime }\)).
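The pair count can be checked numerically. This Python sketch (our own illustration; the function name `log2_pairs` is hypothetical) counts pairs formed across distinct sets: the factor \(\binom{2^8}{2} = 32640\) contributes about \(2^{14.994}\), which the estimate rounds to \(2^{15}\).

```python
from math import comb, log2


def log2_pairs(s: float, num_sets: int = 2**8) -> float:
    """log2 of the number of pairs across distinct sets of 2^s texts each."""
    # Choose two distinct sets, then one text from each: (2^s)^2 * C(num_sets, 2).
    return 2 * s + log2(comb(num_sets, 2))


print(comb(2**8, 2))             # 32640 pairs of sets
print(round(log2_pairs(77), 3))  # slightly below 2*77 + 15
```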

The Top Differential. In contrast to the pure AES or to KIASU-BC, the forking step guarantees that the difference between the inputs to Rounds 6 and 10 is the same for every plaintext. So, the top differential reduces to the key addition, that is, the XOR with \(K^5\) for the branch that encrypts from X to \(C_0\), and the XOR with \(K^{10}\) for the branch that encrypts from Y to \(C_1\). Hence, \(\alpha = \beta = K^5 \oplus K^{10}\) holds with probability one for each pair. The adversary collects pairs and waits until the difference at the beginning of the bottom trail occurs, whose probability can be approximated by \(2^{-128}\). From approximately \(2^{2s+15}\) pairs, we expect \(2^{2s-113}\) to have a specific difference \(\gamma \) at the forking step.

For the Middle Phase and the Bottom Differential, we use two simplifying assumptions: (1) all differences after five rounds are equally likely; (2) all four-byte values of the keys \(K^5[0,5,10,15]\) and \(K^{10}[0,5,10,15]\) are equally likely. The bottom trail is shown on the right side of Fig. 7. There are four active S-boxes at the start of Round 6. We consider only text pairs with a non-zero tweak difference \(\Delta T[0]\). To estimate the probability, we iterate over all possible values of \(X^{6,\textsf {SB}}[0,5,10,15] = (\bar{x}_0, \bar{x}_1, \bar{x}_2, \bar{x}_3)\), all differences \(K^5[0,5,10,15] \oplus K^{10}[0,5,10,15] = (\beta _0, \beta _1, \beta _2, \beta _3)\), and all 255 non-zero tweak differences \(\Delta T[0] \ne 0\). \(\Delta T[0]\) maps uniquely through \(\textsf {MC} ^{-1}\) to the difference \(X^{6,\textsf {SB}}[0,5,10,15] \oplus {X'}^{6,\textsf {SB}}[0,5,10,15]\); the same difference must hold between \(Y^{6,\textsf {SB}}[0,5,10,15]\) and \({Y'}^{6,\textsf {SB}}[0,5,10,15]\). We define \(\textsf {MC} ^{-1}((\Delta T[0], 0, 0, 0)) = (\zeta _0, \zeta _1, \zeta _2, \zeta _3)\). Note that \(\zeta _0\) uniquely determines \(\zeta _1\), \(\zeta _2\), and \(\zeta _3\). Moreover, \((\bar{x}_0, \bar{x}_1, \bar{x}_2, \bar{x}_3, \zeta _0, \beta _0, \beta _1, \beta _2, \beta _3)\) are mutually independent. This is exactly the setting of the Boomerang Connectivity Table (BCT) [10], whose entry for a pair \((\beta _i, \zeta _i)\) contains the number of values \(x_i\) that satisfy the boomerang switch for one byte. So, the BCT values already sum over all values \(x_i\). Over all choices of the values \(\bar{x}_i\), all non-zero differences \(\zeta _i\), and all differences \(\beta _i\), we obtain a probability of

$$\begin{aligned}& \frac{1}{255 \cdot (256)^8} \sum _{\zeta _0 \ne 0} \sum _{\beta _0} \left( \Pr [\zeta _0] \!\cdot \! \Pr [\beta _0] \!\cdot \! \textsf {BCT} (\beta _0, \zeta _0) \right) \cdot \sum _{\beta _1} \left( \Pr [\beta _1] \!\cdot \! \textsf {BCT} (\beta _1, \zeta _1) \right) \cdot \\& \sum _{\beta _2} \left( \Pr [\beta _2] \!\cdot \! \textsf {BCT} (\beta _2, \zeta _2) \right) \cdot \sum _{\beta _3} \left( \Pr [\beta _3] \!\cdot \! \textsf {BCT} (\beta _3, \zeta _3) \right) = \frac{(520)^4}{255 \cdot 256^8} \simeq 2^{-35.905}. \end{aligned}$$

Here, we use the fact that, for the AES S-box, each row and each column of the BCT with a non-zero index sums to 520. Together with the probability of about \(2^{-128}\) for hitting our difference between two queries, the probability for the switch can thus be approximated by \(2^{-36} \cdot 2^{-128} = 2^{-164}\). The remainder of the bottom trail holds with probability 1. Thus, from about \(2^{2s+15}\) pairs, we can expect about \(2^{2s+15-164} = 2^{2s-149}\) correct pairs.
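The BCT facts used above can be checked directly. The following Python sketch (our own illustration; all helper names are hypothetical) builds the AES S-box from field inversion and the affine map, computes BCT entries in the convention of Cid et al., with the first argument the input difference and the second the output difference, and evaluates \(520^4/(255 \cdot 256^8)\). The value 520 also follows from a short argument: for a non-zero input difference a, the row sum equals \(\sum_d \textsf{DDT}(a,d)^2\), and each non-zero row of the AES DDT contains one entry 4 and 126 entries 2, giving \(16 + 126 \cdot 4 = 520\).

```python
from math import log2


def xtime(a: int) -> int:
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Inverse in GF(2^8) computed as a^254; 0 is mapped to 0 as in the AES."""
    if a == 0:
        return 0
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gmul(result, base)
        base = gmul(base, base)
        e >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


# AES S-box: field inversion followed by the affine map with constant 0x63.
S = [gf_inv(a) for a in range(256)]
S = [b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63 for b in S]
S_INV = [0] * 256
for x, y in enumerate(S):
    S_INV[y] = x


def bct(a: int, b: int) -> int:
    """Number of x for which the boomerang returns (input diff a, output diff b)."""
    return sum(1 for x in range(256) if (S_INV[S[x] ^ b] ^ S_INV[S[x ^ a] ^ b]) == a)


def bct_row_sum(a: int) -> int:
    return sum(bct(a, b) for b in range(256))


def bct_col_sum(b: int) -> int:
    return sum(bct(a, b) for a in range(256))


print("row sums:", [bct_row_sum(a) for a in (0x01, 0x53, 0xFF)])
print("column sum:", bct_col_sum(0x01))

# Switch probability from the equation above and the resulting pair estimate.
p_switch = 520**4 / (255 * 256**8)


def log2_correct_pairs(s: float) -> float:
    """2^(2s+15) pairs, times 2^-128 for the difference, times the switch probability."""
    return 2 * s + 15 - 128 + log2(p_switch)


print(f"log2(p_switch) = {log2(p_switch):.3f}")
```

The row and column with index 0 are trivial (every entry is 256), which is why the statement about the sum 520 only concerns non-zero indices.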

Offline Preparations. We define a linear map \(F: \mathbb {F} _{2^8}^{4 \times 4} \rightarrow \mathbb {F} _{2^8}^{12}\) that returns the values of the 12 bytes that must be inactive in \(\Delta X^{9,\textsf {SR}}\). Since F is linear, we can identify pairs \((C_b, C'_b)\) with our desired difference from collisions \(F(\textsf {MC} ^{-1}(T \oplus C_{b})) = F(\textsf {MC} ^{-1}(T' \oplus C'_{b}))\), which requires only two evaluations of F per text (one for each of \(C_0\) and \(C_1\)) instead of comparing all differences.
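The benefit of the linear map F is that pairs are found by hashing rather than by comparing all differences: since F is linear, \(F(u) = F(u')\) holds if and only if \(F(u \oplus u') = 0\). The Python sketch below illustrates this with a toy F that simply projects a 16-byte state onto 12 bytes. The choice of the four free positions (`ACTIVE`) is a hypothetical placeholder, and \(\textsf {MC} ^{-1}\) and the tweak addition are omitted. The sketch checks the hash-based search against the quadratic comparison.

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical byte positions that may be active; F keeps the other 12 bytes.
ACTIVE = (0, 7, 10, 13)
INACTIVE = tuple(i for i in range(16) if i not in ACTIVE)


def F(state: bytes) -> bytes:
    """Toy linear map: project a 16-byte state onto its 12 'must be inactive' bytes."""
    return bytes(state[i] for i in INACTIVE)


def colliding_pairs(states):
    """Index pairs (i, j), i < j, with F(states[i]) == F(states[j]), found by hashing."""
    buckets = defaultdict(list)
    for idx, st in enumerate(states):
        buckets[F(st)].append(idx)
    return {p for bucket in buckets.values() for p in combinations(bucket, 2)}


def brute_force_pairs(states):
    """Quadratic reference: test the difference of every pair."""
    return {
        (i, j) for i, j in combinations(range(len(states)), 2)
        if all((states[i][k] ^ states[j][k]) == 0 for k in INACTIVE)
    }


rng = random.Random(2019)
states = [bytes(rng.randrange(256) for _ in range(16)) for _ in range(300)]
# Plant one pair whose difference is confined to the ACTIVE positions.
partner = bytearray(states[42])
for k in ACTIVE:
    partner[k] ^= rng.randrange(1, 256)
states.append(bytes(partner))
print(sorted(colliding_pairs(states)))
```

The hashing approach touches every text once, whereas the reference comparison grows quadratically with the number of texts.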

We can perform another offline step to save effort later. Let \(x = X^{9,\textsf {SB}}[0,7,10,13]\), \(x' = {X'}^{9,\textsf {SB}}[0,7,10,13]\), \(k^{8} = \widetilde{K}^8[0]\), and \(k^{9} = \widetilde{K}^{9}[0,7,10,13]\) be short forms. We construct a hash map \(\mathcal {H}: \mathbb {F} _{2^8} \times \mathbb {F} _{2^8} \times \mathbb {F} _{2^8}^{4} \times \mathbb {F} _{2^8}^{4} \rightarrow \left( \mathbb {F} _{2^8}^{5}\right) ^*\) such that, for all inputs \((T[0], T'[0], x, x')\), \(\mathcal {H} \) returns exactly those keys \((k^{8}, k^{9})\) that map x and \(x'\) to a zero difference at \(\Delta X^{7,\textsf {MC}}\). The trail imposes 32 bit conditions on these 40 key bits; thus, \(\mathcal {H} \) returns approximately \(2^{40-32} = 2^8\) suggestions of 40 key bits on average. \(\mathcal {H} \) can also be used to obtain suggestions for \(\widetilde{K}^{13}[0]\) and \(\widetilde{K}^{14}[0,7,10,13]\) from the inputs T[0], \(T'[0]\), \(Y^{9,\textsf {SB}}[0,7,10,13]\), and \({Y'}^{9,\textsf {SB}}[0,7,10,13]\).
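A scaled-down analogue shows how a table like \(\mathcal {H} \) can be built without testing every input-key combination. The Python sketch below is our own toy, not the construction used in the attack: it uses the 4-bit PRESENT S-box instead of the AES S-box and a single 4-bit key k, and maps \((x, x')\) to all k with \(S^{-1}(x \oplus k) \oplus S^{-1}(x' \oplus k) = \delta\) for a fixed \(\delta\). Where the text has 32 bit conditions on 40 key bits and hence about \(2^8\) suggestions, the toy has a 4-bit condition on 4 key bits and hence one suggestion per input on average.

```python
from collections import defaultdict

# Toy 4-bit S-box (PRESENT), standing in for the AES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SBOX_INV = [0] * 16
for i, v in enumerate(SBOX):
    SBOX_INV[v] = i


def build_h(delta: int):
    """Map (x, x') to every key k with S^-1(x ^ k) ^ S^-1(x' ^ k) == delta.

    Rather than testing all (x, x', k), enumerate (k, x) and solve for the
    unique partner x', so each solution is generated exactly once.
    """
    h = defaultdict(list)
    for k in range(16):
        for x in range(16):
            y = SBOX_INV[x ^ k]             # value before the S-box
            x_prime = SBOX[y ^ delta] ^ k   # partner that reaches difference delta
            h[(x, x_prime)].append(k)
    return h


H = build_h(delta=0x3)
total = sum(len(keys) for keys in H.values())
print(total / (16 * 16))  # average number of key suggestions per input (x, x')
```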

Attack Steps. The steps are as follows:

  1.

    Initialize an empty list \(\mathcal {Q} \). Initialize two zeroed lists of byte counters for 40 key bits each: \(\mathcal {K} \) for \((\widetilde{K}^8[0], \widetilde{K}^{9}[0,7,10,13])\), and \(\mathcal {L} \) for \((\widetilde{K}^{13}[0], \widetilde{K}^{14}[0,7,10,13])\).

  2.

    Precompute \(\mathcal {H} \).

  3.

    Choose an arbitrary base tweak \(T \in \mathbb {F} _{2^8}^{2 \times 4}\). Construct \(2^8\) sets \(\mathcal {S} ^i\) whose tweaks differ only in T[0]. For each set, choose \(2^s\) plaintexts P such that all texts in a set use the same tweak value. Ask for their \(2^{s+8}\) encryptions \((T, C_0, C_1)\), and invert the final tweak addition and the final MixColumns operation for each output tuple \((C_0, C_1)\).

  4.

    We define \(Q_b = F(\textsf {MC} ^{-1}(T \oplus C_b))\) for \(b \in \{0,1\}\). For all ciphertexts, compute \(Q_0\) and \(Q_1\) from \(C_0\) and \(C_1\), and store \((T, C_0, C_1, Q_0, Q_1)\) in buckets of \(\mathcal {Q} \).

  5.

    Focus only on pairs of tuples \((T, C_0, C_1, Q_0, Q_1)\) and \((T', C'_0, C'_1, Q'_0, Q'_1)\) with \(T[0] \ne T'[0]\), \(Q_0 = Q'_0\), and \(Q_1 = Q'_1\). We call such pairs of tuples with our desired property correct pairs. Discard all tuples that do not form correct pairs.

  6.

    For each correct pair, look up in \(\mathcal {H} \) the suggestions for the 40 key bits \(\widetilde{K}^{8}[0]\) and \(\widetilde{K}^{9}[0,7,10,13]\) from T[0], \(T'[0]\), \(X^{9,\textsf {SB}}[0,7,10,13]\), and \({X'}^{9,\textsf {SB}}[0,7,10,13]\). We expect \(2^8\) suggestions on average. For each suggested key candidate, increment its counter in \(\mathcal {K} \).

  7.

    Similarly, for each correct pair, look up in \(\mathcal {H} \) the suggestions for the 40 key bits \(\widetilde{K}^{13}[0]\) and \(\widetilde{K}^{14}[0,7,10,13]\). We expect \(2^8\) suggestions on average. For each suggestion, increment the corresponding counter in \(\mathcal {L} \).

  8.

    Output the keys in \(\mathcal {K} \) and \(\mathcal {L} \) in descending order of their counters.

  9.

    Although the adversary then has 80 key bits, the key schedule may make it more efficient to start from only the 40 bits of either \(\widetilde{K}^{8}[0]\), \(\widetilde{K}^{9}[0,7,10,13]\) or \(\widetilde{K}^{13}[0]\), \(\widetilde{K}^{14}[0,7,10,13]\) and to search the 88 remaining key bits with the given data.
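Steps 4 to 8 amount to bucketing, pair filtering, and voting. The Python sketch below models this for one counter list. The tuple layout and the callable `suggest`, which stands in for the lookups in \(\mathcal {H} \), are hypothetical. The toy usage plants \(2^5\) correct pairs, matching the expectation for \(s = 77\), each suggesting the correct key together with \(2^8 - 1\) random wrong keys.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def rank_keys(texts, suggest):
    """Bucket texts by (Q_0, Q_1), keep pairs with different T[0], and vote.

    texts:   iterable of tuples (t0, q, payload); t0 = T[0], q = (Q_0, Q_1),
             payload = data the key lookup needs (hypothetical layout).
    suggest: hypothetical callable (text_a, text_b) -> iterable of key
             candidates, standing in for the lookups in H.
    Returns (key, count) tuples in descending order of their counters.
    """
    buckets = defaultdict(list)
    for text in texts:
        buckets[text[1]].append(text)           # Step 4: buckets of Q
    counters = Counter()
    for bucket in buckets.values():
        for a, b in combinations(bucket, 2):
            if a[0] != b[0]:                    # Step 5: T[0] != T'[0]
                counters.update(suggest(a, b))  # Steps 6/7: increment counters
    return counters.most_common()               # Step 8: descending order


# Toy usage: 2^5 planted correct pairs plus unrelated noise texts.
rng = random.Random(1)
TRUE_KEY = 0x123456789A  # hypothetical 40-bit key
texts = []
for n in range(32):
    texts += [(0x01, ("pair", n), None), (0x02, ("pair", n), None)]
texts += [(rng.randrange(256), ("noise", n), None) for n in range(1000)]


def toy_suggest(a, b):
    """Correct key plus 2^8 - 1 random wrong keys for every pair."""
    return [TRUE_KEY] + [rng.randrange(2**40) for _ in range(255)]


ranking = rank_keys(texts, toy_suggest)
print(ranking[0])
```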

Complexity. From \(2^8\) sets of \(2^s\) texts each, we expect \(2^{2s-149}\) correct pairs; \(s = 77\) yields \(2^{5}\) correct pairs on average, and needs \(2^{85}\) plaintext-tweak tuples. The time complexity consists of the following terms:

  • \(\mathcal {H} \) can be precomputed in Step (2) by decrypting one column over 2 rounds \(2^{80}\) times, which yields at most \(2/13 \cdot 1/4 \cdot 2^{80} \simeq 2^{75.3}\) encryption equivalents.

  • Step (3) needs \(2^{s+8}\) encryptions of 13 AES rounds each.

  • Step (4) employs \(2 \cdot 2^{s+8}\) evaluations of F and \(2 \cdot 2^{s+8} \cdot (s+8)\) memory accesses (MAs). This step yields \(2^{2s+15} \cdot 2^{-192} = 2^{2s-177}\) wrong pairs plus \(2^{2s-149}\) correct pairs on average.

  • Step (6) does not necessarily need \(\mathcal {H} \): it can also test the keys on the fly, for each of the \(2 \cdot 2^5\) states and all \(2^{40}\) keys, on one quarter of the state through two out of 13 rounds. Each surviving pair requires \(2 \cdot 2^8\) MAs to \(\mathcal {H} \) plus \(2 \cdot 2^8\) MAs to \(\mathcal {K} \) and \(\mathcal {L} \) on average. We expect the counters in each of the two lists to sum to \(2^8\,\cdot \,2^{2s-149} = 2^{13}\) on average, distributed normally over the keys. For \(s = 77\), we expect \((2^{-23}\,\cdot \,2^8) + 2^5\,\cdot \,2^8 \simeq 2^{13}\) counters over the 40 key bits on average.

We can expect the correct keys to have a significantly higher number of counts. So, we obtain about \(2^{75.3} + 2 \cdot 2^5 \cdot 2^{40}\,\cdot \,\frac{1}{4}\,\cdot \,\frac{2}{13} + 2^{s+8} + 2 \cdot 2^{s+8} + 2^{88} \simeq 2^{88.5}\) encryptions and \(2 \cdot 2^{s+8} \cdot (s+8) + 2 \cdot 2^{2s-177} \cdot 2 \cdot 2^8 + 2 \cdot 2^5 \cdot 2^8 \simeq 2^{92.4}\) MAs. The attack needs \(2^{80}\) byte counters for the keys; \(\mathcal {Q} \) needs \(2^{s+8} \cdot (2 \cdot 16 + 8) < 2^{s+13.33} \simeq 2^{90.4}\) bytes of memory, or \(2^{86.4}\) states, which dominates the memory complexity.
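The complexity terms above can be re-evaluated numerically. This Python sketch is our own arithmetic check and simply plugs \(s = 77\) into the stated terms. The exact memory for \(\mathcal {Q} \) evaluates to about \(2^{90.32}\) bytes, just below the stated bound \(2^{s+13.33}\).

```python
from math import log2

s = 77
texts = 2**(s + 8)                 # plaintext-tweak tuples
correct_pairs = 2**(2 * s - 149)   # expected correct pairs

time_encryptions = (
    2**80 * (2 / 13) * (1 / 4)                 # precompute H (Step 2)
    + 2 * 2**5 * 2**40 * (1 / 4) * (2 / 13)    # on-the-fly key tests (Step 6)
    + texts                                    # encryption queries (Step 3)
    + 2 * texts                                # evaluations of F (Step 4)
    + 2**88                                    # search of the remaining 88 key bits
)
memory_accesses = (
    2 * texts * (s + 8)                        # Step 4, with an efficient data structure
    + 2 * 2**(2 * s - 177) * 2 * 2**8          # wrong pairs: lookups and counters
    + 2 * 2**5 * 2**8                          # correct pairs: lookups and counters
)
q_bytes = texts * (2 * 16 + 8)                 # 40 bytes per entry of Q, as in the text

print(f"data: 2^{log2(texts):.0f} texts, correct pairs: {correct_pairs}")
print(f"time: 2^{log2(time_encryptions):.2f} encryptions")
print(f"MAs : 2^{log2(memory_accesses):.2f}")
print(f"Q   : 2^{log2(q_bytes):.2f} bytes")
```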

Fig. 8.

Left: The trail \(\Delta C_0 \rightarrow \Delta X\). Right: One variant of an impossible trail \(\Delta C_1 \not \leftarrow \Delta Y\). White bytes are inactive, light-blue bytes are possibly active, and dark-blue bytes are active. Parts below the dashed horizontal lines are considered in the online phase. (Color figure online)

7 Impossible-Differential Attack with Encryption Queries

Impossible Differentials. This section outlines an impossible-differential attack on ForkAES-\({*}\text {-}{4}\text {-}{4}\). Again, we describe it for five top rounds. The high-level idea is straightforward: the adversary queries plaintexts under tweaks that differ only in T[0] and waits for tuples \((C_{i,0}, T_i)\) and \((C_{j,0}, T_j)\) with a suitable difference. It inverts the final tweak addition and the final \(\textsf {MC} \) operation, and uses the ciphertexts only if their difference \(\Delta \widetilde{C}_0\) (before MC) activates only the inverse diagonal \(\mathcal {ID} _0\), as shown on the left side of Fig. 8. It deduces the key bytes \(\widetilde{K}^{9}[0,7,10,13]\) and \(\widetilde{K}^8[0]\) that lead to a zero difference in \(\Delta X^{7, \textsf {MC}}\), i.e., for which the difference is cancelled by the tweak XOR at the end of Round 7. Then, the zero difference propagates through the inverse of Round 7, which leads to a single active byte in \(\Delta X^{6,\textsf {MC}}\) and to a single active diagonal at the start of Round 6 (again, see the left side of Fig. 8). The second trail decrypts \(\Delta C_1\) backwards to \(\Delta Y = \Delta X\). So, at least one of the following cases must hold:

  (1)

    \(\Delta Y^{7}\) has at least one fully active column: \(\Delta Y^7 \in \mathcal {C} _i\).

  (2)

    Bytes \(\Delta Y^{7}[1,2,3]\) are active.

  (3)

    \(\Delta C_1 \in \mathcal {M} _0\), i.e., \(\Delta C_1\) lies in the mixed space generated by \(\Delta Y^{9,\textsf {SR}} \in \mathcal {ID} _0\).

In Case (3), the \(\Delta Y\) trail is similar to the \(\Delta X\) trail, so we would obtain a distinguisher similar to the rectangle distinguisher described in Sect. 6. However, this section exploits a different distinguisher with lower data complexity that does not have to wait for such an event. In Cases (1) and (2), Columns 1 to 3 of \(\Delta Y^7\) are each either completely active or completely inactive. Thus, the adversary can guess the eight bytes of \(\widetilde{K}^{14}\) that are mapped to one of those columns and can filter out all key guesses for which one of those columns would become partially active.

Offline Preparations. We define \(\widetilde{X}^{r,\textsf {SR}} \stackrel{\text {def}}{=} \textsf {SR} (\textsf {SB} (X^{r-1})) \oplus \widetilde{K}^{r}\), and \(\widetilde{Y}^{r,\textsf {SR}}\), \(\widetilde{X'}^{r,\textsf {SR}}\), and \(\widetilde{Y'}^{r,\textsf {SR}}\) analogously. Again, we can define a linear map F of rank 96 such that \(F(\textsf {MC} ^{-1}(\Delta C_0 \oplus \Delta T)) = 0\), so that we can identify pairs with our desired difference from collisions in \(\Delta \widetilde{X}^{9,\textsf {SR}}\). We construct a hash map \(\mathcal {H} _0: \mathbb {F} _{2^8} \times \mathbb {F} _{2^8} \times \mathbb {F} _{2^8}^{4} \times \mathbb {F} _{2^8}^{4} \rightarrow (\mathbb {F} _{2^8}^{5})^*\) that maps \(x = (T[0], T'[0], \widetilde{X}^{9,\textsf {SR}}[0, 7, 10, 13], \widetilde{X'}^{9,\textsf {SR}}[0, 7, 10, 13])\) to all five-byte keys that yield \(\Delta X^{7,\textsf {MC}} = 0\). We construct a second hash map \(\mathcal {H} _1: \mathbb {F} _{2^8}^{8} \times \mathbb {F} _{2^8}^{8} \rightarrow (\mathbb {F} _{2^8}^8)^*\). For all inputs \(x = (\widetilde{Y}^{9,\textsf {SR}}[2, 3, 5, 6, 8, 9, 12, 15], \widetilde{Y'}^{9,\textsf {SR}}[2, 3, 5, 6, 8, 9, 12, 15])\), \(\mathcal {H} _1(x)\) returns exactly the keys \(\widetilde{K}^{14}[2, 3, 5, 6, 8, 9, 12, 15]\) that yield one of the impossible differentials in \(\Delta Y^{8, \textsf {SR}}\).

\(\mathcal {H} _1\) does not need the tweak as input since the final tweak addition, MixColumns, and ShiftRows can be inverted before the lookup in \(\mathcal {H} _1\); the tweak addition at the end of Round 8 does not affect the difference in \(\Delta Y^{8,\textsf {SR}}\). Note that \(\mathcal {H} _1\) can be built more efficiently from several smaller lookup tables since the columns can be computed independently from each other.

There exist four combinations of bytes \(\Delta Y^{8,\textsf {SR}}[i,j]\) with \((i,j) \in \{(8,15), (9,12), (10,13), (11,14)\}\), and two options for which of Byte i and Byte j is active. Among the \(2^{32}\) input differences to \(\textsf {MC} ^{-1}\), \(2^{24}\) are mapped to an output difference with a zero byte at a fixed index; the remaining \(2^{32} - 2^{24}\) inputs yield a non-zero difference at that index. Thus, given an input \(Y^{9,\textsf {SB}}\), \(\mathcal {H} _1\) returns \(4 \cdot 2\) combinations of \(2^{24} \cdot (2^{32} - 2^{24}) \simeq 2^{56}\) keys that yield the impossible differential. This can be evaluated with \(4 \cdot 2\) calls to two 32-bit tables each, i.e., 16 tables that map 32 state bits to \(2^{32}\) or \(2^{24}\) keys. So, \(\mathcal {H} _1\) needs \(8 \cdot 2^{32} \cdot 2^{32} \cdot 4 \text { bytes } + 8 \cdot 2^{32} \cdot 2^{24} \cdot 4 \text { bytes } \simeq 2^{69} \le 2^{72}\) bytes of memory. The tables can be computed with at most \(16 \cdot 2^{32} \cdot 2^{32}\) quarter-rounds of the AES, which is at most \(16/13 \cdot 2^{64} \simeq 2^{64.3}\) equivalents of ForkAES-\({5}\text {-}{4}\text {-}{4}\).
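The size estimates for \(\mathcal {H} _1\) can be checked with a few lines of arithmetic. The Python sketch below (our own check) evaluates the stated table sizes; the exact sum is about \(2^{69}\) bytes and thus stays within the \(2^{72}\) bytes budgeted in the complexity analysis.

```python
from math import log2

# H_1: 8 tables of 2^32 entries with 2^32 four-byte keys each, plus 8 tables
# of 2^32 entries with 2^24 four-byte keys each (sizes as stated in the text).
h1_bytes = 8 * 2**32 * 2**32 * 4 + 8 * 2**32 * 2**24 * 4
keys_per_input = 2**24 * (2**32 - 2**24)
precompute = 16 / 13 * 2**64   # upper bound on the precomputation, as in the text

print(f"H_1 memory    : 2^{log2(h1_bytes):.3f} bytes (budget: 2^72)")
print(f"keys per input: 2^{log2(keys_per_input):.3f}")
print(f"precomputation: 2^{log2(precompute):.2f} encryption equivalents")
```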

Attack Procedure. The steps in the attack are as follows:

  1.

    Initialize an empty list \(\mathcal {Q} \) and a list \(\mathcal {K} \) that initially holds all candidates for the 13-byte key \((\widetilde{K}^8[0], \widetilde{K}^9[0,7,10,13], \widetilde{K}^{14}[2,3,5,6,8,9,12,15])\).

  2.

    Choose an arbitrary base tweak \(T \in \mathbb {F} _{2^8}^{2 \times 4}\). Construct \(2^8\) sets \(\mathcal {S} ^i\) by iterating over T[0]. For each set, choose \(2^s\) plaintexts P. All texts in Set \(\mathcal {S} ^i\) use the same tweak \(T^i\) with \(T^i[0] = i\). Ask for their \(2^{s+8}\) encryptions \((T, C_0, C_1)\).

  3.

    For each ciphertext, invert the final tweak addition and the final MC operation, and process the result by F: \(Q_b = F(\textsf {MC} ^{-1}(C_b \oplus T))\) for \(b \in \{0,1\}\). Store \((T, C_0, C_1, Q_0, Q_1)\) in buckets of \(\mathcal {Q} \).

  4.

    Only consider pairs of tuples \((T, C_0, C_1, Q_0, Q_1)\) and \((T', C'_0, C'_1, Q'_0, Q'_1)\) if \(T \ne T'\) and \(Q_0 = Q'_0\). Discard all other tuples. We call pairs of tuples with our desired property correct pairs.

  5.

    For each correct pair, derive from \(\mathcal {H} _0\) the key candidates for \(\widetilde{K}^8[0]\) and \(\widetilde{K}^9[0,7,10,13]\) that yield a zero difference in \(\Delta X^{7,\textsf {MC}}\). Further derive from \(\mathcal {H} _1\) all key candidates for \(\widetilde{K}^{14}[2,3,5,6,8,9,12,15]\) that yield one of the impossible differentials. Remove all combinations of these candidates from \(\mathcal {K} \).

  6.

    Output the 13-byte key candidates remaining in \(\mathcal {K} \).

Conditions and Complexities. The adversary queries \(2^8\) sets of \(2^s\) texts each and guesses 13 key bytes in total: \(\widetilde{K}^{9}[0,7,10,13]\), \(\widetilde{K}^8[0]\), and \(\widetilde{K}^{14}[2, 3, 5, 6, 8, 9, 12, 15]\), i.e., 104 key bits. The attack requires pairs with \(\Delta C_0 \in \mathcal {M} _0\), which occurs with probability of approximately \(p \simeq 2^{-96}\). We can assume that \((\Delta C_0, \Delta C_1) \in \mathcal {M} _0 \times \mathcal {M} _0\) never occurs by accident; while it could theoretically still occur and could be exploited, we consider a different distinguisher here.

The probability that a key \(\widetilde{K}^9[0,7,10,13]\) reduces the four active bytes in \(\Delta X^{9,\textsf {SR}}\) to a single active byte in \(\Delta X^{8,\textsf {MC}}[0]\) is \(2^{-24}\), and the probability that this byte difference equals \(\Delta T[0]\) in \(\Delta X^7\) is \(2^{-8}\). So, a key in the \(\Delta X\) trail yields our desired differential with a probability of about \(2^{-32}\). There are four options for which column of \(\Delta Y^7\) becomes partially active, and two options for which of the two known bytes in this column is active and which is inactive. The probability that one byte is inactive and the other active is \(2^{-8} \cdot (1 - 2^{-8}) \simeq 2^{-8}\); so, a key yields the impossible differential in \(\Delta Y^{7, \textsf {MC}}\) with probability approximately \(2^{-32} \cdot 2^{-8} \cdot 4 \cdot 2 \simeq 2^{-37}\).

In the framework by Boura et al. [9], this can be represented as 37 bit conditions that have to be fulfilled to filter a key from a given correct pair. The probability for a wrong key to survive is \(p_{\mathsf {survive}} = \left( 1 - 2^{-37}\right) ^{N}\), where N is the number of correct pairs. For \(2^{104}\) keys, \(p_{\mathsf {survive}} \le 2^{-104}\) would allow us to filter all keys down to the correct key plus at most a few false positives. For this purpose, we need \(N \ge 2^{43.2}\) pairs with 12 inactive bytes in \(\Delta \widetilde{X}^{9, \textsf {SR}}\), which yields \(2^{43.2} \cdot 2^{12 \cdot 8} = 2^{139.2}\) necessary pairs. From \(2^8\) structures of \(2^s\) texts each, we can construct about \(2^{2s+15}\) pairs, which gives \(s = 62.1\), or \(C_N = 2^{s+8} = 2^{70.1}\) queries. The computational complexity is composed of the following terms:

  • Precompute \(\mathcal {H} _0\) with \(2^{80}\) evaluations of two quarter-rounds of the AES, which can be approximated by \(2^{80} \cdot 2/13 \cdot 1/4 \simeq 2^{75.3}\) encryption equivalents.

  • Precompute \(\mathcal {H} _1\) with at most \(2^{64.3}\) encryption equivalents.

  • Encrypt \(2^{s+8}\) plaintext-tweak tuples.

  • Invert the final tweak addition, MixColumns, and ShiftRows operations \(2 \cdot 2^{s+8}\) times, which can be overestimated by \(2^{70.1} \cdot 2 \cdot 1/13 \approx 2^{67.5}\) encryptions.

  • Apply F to all states \(C_0\), which costs at most \(2^{s+8} = 2^{70.1}\) encryption equivalents. Moreover, we need \(2 \cdot 2^{s+8} \cdot (s+8) = 2 \cdot 70.1 \cdot 2^{70.1} \simeq 2^{77.3}\) MAs on average with an efficient data structure. We obtain about \(2^{2s+15-96} = 2^{2s-81} = 2^{43.2}\) remaining pairs.

  • Each of the \(2^{43.2}\) pairs allows us to filter keys. Since there are 37 bit conditions, each pair filters \(2^{104-37} = 2^{67}\) keys on average, which are obtained from \(\mathcal {H} _0\) and \(\mathcal {H} _1\) with two MAs each and removed from \(\mathcal {K} \).

  • Our attack aims at recovering 104 bits of \(\widetilde{K}^{8}\), \(\widetilde{K}^{9}\), and \(\widetilde{K}^{14}\). So, the final term for recovering the 64 remaining key bits of \(\widetilde{K}^{14}\) can be estimated by \(2^{64}\) encryptions.

The time complexity can be bounded by about \(2^{75.3} + 2^{64.3} + 2^{70.1} + 2^{67.5} + 2^{70.1} + 2^{64} \simeq 2^{75.4}\) encryptions and \(2 \cdot 2^{70.1} + 2^{77.3} + 2^{43.2} \cdot 2 + 2^{43.2} \cdot 2^{67} \simeq 2^{110.2}\) MAs. The attack needs \(2^{80} \cdot 2^8 \cdot 40\) bits for \(\mathcal {H} _0\), at most \(2^{72}\) bytes for the components of \(\mathcal {H} _1\), \(2^{s+8} \cdot (2 \cdot 16 + 8) < 2^{s+14} = 2^{76.1}\) bytes for \(\mathcal {Q} \), and \(2^{104}\) byte counters (or \(2^{100}\) states) for \(\mathcal {K}\); the latter term dominates the memory complexity.
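The data and time estimates of this section can be reproduced from the stated parameters. The Python sketch below is our own arithmetic check. It computes the exact filtering probability, the number N of pairs needed for \(p_{\mathsf {survive}} \le 2^{-104}\), the resulting s and query count, and the sums of the stated time and MA terms. The memory entries use the sizes given in the text, and the helper `log2_sum` is hypothetical.

```python
from math import log, log1p, log2

# Filtering probability per key and correct pair (values from the text).
p_exact = 2**-32 * 2**-8 * (1 - 2**-8) * 4 * 2
p_filter = 2**-37  # rounded value used in the analysis

# Smallest N with (1 - p_filter)^N <= 2^-104, i.e. N >= 104 ln 2 / -ln(1 - p).
n_pairs = 104 * log(2) / -log1p(-p_filter)

# Each useful pair needs 12 inactive bytes (probability 2^-96), and 2^8 sets
# of 2^s texts give about 2^(2s+15) pairs.
log2_total_pairs = log2(n_pairs) + 96
s = (log2_total_pairs - 15) / 2


def log2_sum(exponents):
    """log2 of a sum of powers of two given by their exponents."""
    return log2(sum(2.0**e for e in exponents))


time_log2 = log2_sum([75.3, 64.3, 70.1, 67.5, 70.1, 64])
ma_log2 = log2_sum([1 + 70.1, 77.3, 1 + 43.2, 43.2 + 67])

# Memory in bytes per component, as stated in the text.
memory = {
    "H0": 2**80 * 2**8 * 40 / 8,     # 2^80 * 2^8 * 40 bits
    "H1": 2**72,                     # upper bound
    "Q": 2**(s + 8) * (2 * 16 + 8),
    "K": 2**104,                     # one byte counter per 13-byte key
}

print(f"log2 p_exact = {log2(p_exact):.3f}")
print(f"log2 N       = {log2(n_pairs):.2f}")
print(f"s = {s:.2f}, queries = 2^{s + 8:.2f}")
print(f"time ~ 2^{time_log2:.2f} encryptions, MAs ~ 2^{ma_log2:.2f}")
print(f"dominant memory term: {max(memory, key=memory.get)}")
```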