Keywords

1 Introduction

Recently, several symmetric schemes have been proposed to reduce the number of field multiplications in their natural algorithmic description, often referred to as the multiplicative complexity. These ciphers fall into two main categories. The first one contains ciphers that minimize the use of multiplications in \( \mathbb {F}_{2} \), for instance, Flip [54], Keyvrium [22], LowMC [4], and Rasta [33]. The second category is comprised of ciphers having a natural description in larger fields, which are mostly binary fields \( \mathbb {F}_{2^n} \) and prime fields \( \mathbb {F}_{p} \). Examples include MiMC [3], GMiMC [2], Jarvis [8], Hades [41], Poseidon [40] and Vision and Rescue [6]. The design of low multiplicative complexity ciphers is motivated by applications such as secure Multi-Party Computation (MPC), Fully Homomorphic Encryption (FHE), and Zero-Knowledge proofs (ZK). These recent ciphers based on specialized designs highly outperform “traditionally” designed ones in these applications. The search of minimizing the multiplicative complexity while providing a sufficient security level is an opportunity to explore and evaluate innovative design strategies.

The sheer number of potentially devastating attacks on recently published designs implies that the design of schemes with low multiplicative complexity has not reached a mature state yet. Indeed, we count numerous attacks on variants of LowMC [32, 59], Flip [35], MiMC [36], GMiMC [15, 19], Jarvis [1], and Starkad/Poseidon [15]. Attacks that are performed on schemes defined for larger fields mostly exploit weaknesses of the algebraic cipher description, e.g., Gröbner bases attacks on Jarvis [1] or higher-order differential attacks on MiMC [36]. Nonetheless, attack vectors such as differential cryptanalysis [17] and linear cryptanalysis [52] do not appear to threaten the security of these designs. Indeed, the latter two techniques seem to be able to attack only a tiny fraction of the rounds compared to algebraic attacks.

Interestingly, the mentioned ciphers working over larger fields are inspired by design strategies proposed in the 1990s to mitigate differential cryptanalysis. For example, MiMC resembles the Knudsen-Nyberg cipher [56], Jarvis claims to be inspired by the design of Rijndael [27, 28], while Hades, Vision, and Rescue take inspiration from Shark [60]. The latter ciphers have a linear layer that consists of the application of a single MDS matrix to the state. An important commonality between all those examples is a non-linear layer that operates on individual field elements, e.g., cubing single field elements or computing their inverse. Furthermore, design strategies naturally working over larger fields easily prevent differential cryptanalysis. However, algebraic attacks seem to be their main threat. Therefore, it is worth exploring different design strategies to increase the resistance against algebraic attacks.

Our Design: Ciminion. In that spirit, Ciminion offers a different design approach in which we do not apply non-linear transformations to individual field elements. Instead, we use the ability of the multiplication to provide non-linear diffusion between field elements. Our cipher is built upon the Toffoli gate [62], which is a simple non-linear bijection of field elements that transforms the triple (abc) into the triple \((a,b,ab+c)\). The binary version of the Toffoli gate is used as a building block in modern ciphers, such as FRIET [61], which inspired our design. In addition to this, the S-box of Xoodoo [26] can also be described as the consecutive application of three binary Toffoli gates. With respect to the linear layer, we learned from ciphers like LowMC [4] that very heavy linear layers can have a considerably negative impact on the performance of applications [31]. Therefore, we decide to pair the Toffoli gate with a relatively lightweight linear layer to construct a cryptographic permutation on triples of field elements. Compared to the designs that use a non-linear bijection of a single field element, e.g., cubing in \( \mathbb {F}_{2^n} \) for odd n, we can define our permutation on any field, and then provide a thorough security analysis for prime fields and binary fields.

Fig. 1.
figure 1

Comparison of a Farfalle construction and a Hades-like scheme.

We do not use a bare primitive in the applications, but we employ primitives in a mode of operation. Indeed, instead of constructing a primitive of low multiplicative complexity, our goal is to provide a cryptographic function of low multiplicative complexity. We achieve this by using a modified version of the Farfalle construction to make it possible to perform stream encryption. Farfalle [12] is an efficiently parallelizable permutation-based construction with a variable input and output length pseudorandom function (PRF). It is built upon a primitive, and modes are employed on top of it. The primitive is a PRF that takes as input a key with a string (or a sequence of strings), and produces an arbitrary-length output. The Farfalle construction involves two basic ingredients: a set of permutations of a b-bit state, and the so-called rolling function that is used to derive distinct b-bit mask values from a b-bit secret key, or to evolve the secret state. The Farfalle construction consists of a compression layer that is followed by an expansion layer. The compression layer produces a single b-bit accumulator value from a tuple of b-bit blocks representing the input data. The expansion layer first (non-linearly) transforms the accumulator value into a b-bit rolling state. Then, it (non-linearly) transforms a tuple of variants of this rolling state which are produced by iterating the rolling function, into a tuple of (truncated) b-bit output blocks. Both the compression and expansion layers involve b-bit mask values derived from the master key.

We slightly modify Farfalle (see Fig. 3) and instantiate it with two different permutations: \(p_C\) for the compression part, and \(p_E\) for the expansion part. Those two permutations are obtained by iterating the same round function, but with a different number of rounds. In our construction, the permutation \(p_C\) takes an input that is the concatenation of a nonce \(\aleph \) and a secret key, and it derives a secret intermediate state from this input. Then, the intermediate state is updated by using a simple rolling function, and fixed intermediate keys. From this intermediate state, the keystream for encrypting the plaintext is derived by using the permutation \(p_E\). In order to prevent backward computation, the outputs of the expansion layers are truncated. Our security analysis that is presented in Sect. 4 shows that \(p_E\) requires a significantly lower number of rounds than \(p_C\). The relatively low number of multiplications that is used per encrypted plaintext element leads to a remarkably overall low multiplicative complexity. The full specification for Ciminion is presented in Sect. 2. A detailed rationale of the choices made during the design process is given in Sect. 3. A reference implementation can be found at https://github.com/ongetekend/ciminion.

Fig. 2.
figure 2

Number of MPC multiplications of several designs over \({ (\mathbb {F}_{p})^{t} }\), with \(p\approx 2^{128}\) and \(t\ge 2\) (security level of 128 bits).

A Concrete Use Case: Multi-party Computation. The primary motivation of our design is to explore the limits on the use of non-linear operations in cipher design, while limiting the use of linear operations, and ensuring a secure design. The main body of our paper is thus dedicated to cryptanalysis which is accompanied by one specific use-case, namely Secure Multi-Party Computation.

MPC is a subfield of cryptography that aims to create methods for parties to jointly compute a function over their inputs, without exposing these inputs. In recent years, MPC protocols have converged to a linearly homomorphic secret sharing scheme, whereby each participant is given a share of each secret value. Then, each participant locally adds shares of different secrets to generate the shares of the sum of the secrets. In order to get data securely in and out of a secret-sharing-based MPC system, an efficient solution is to directly evaluate a symmetric primitive within such system. In this setting, “traditional” PRFs based on, e.g., AES or SHA-3 are not efficient. Indeed, they were designed with different computing environments in mind. Hence, they work over data types that do not easily match the possible operations in the MPC application. As developed in [43], “traditional” PRFs like AES and SHA-3 are rather bit/byte/word-oriented schemes, which complicate their representation using arithmetic in \({ \mathbb {F}_{p} }\) or/and \({ \mathbb {F}_{2^n} }\) for large integer n, or prime p.

From a theoretical point of view, the problem of secure MPC is strongly connected to the problem of masking a cryptographic implementation. This observation has been made in [45, 46]. The intuition behind is that both masking and MPC aim to perform computations on shared data. In more detail, the common strategy behind these techniques is to combine random and unknown masks with a shared secret value, and to perform operations on these masked values. Only at the end of the computation, the values are unmasked by combining them, in a manner that is defined by the masking scheme. In our scheme, we use a linear sharing scheme, because affine operations (e.g., additions, or multiplications with a constant) are non-interactive and resource efficient, unlike the multiplications that require some communication between the parties. The number of multiplications required to perform a computation is a good estimate of the complexity of an MPC protocol.

However, in practice, other factors influence the efficiency of a design. For instance, while one multiplication requires one round of communication, a batch of multiplications can be processed into a single round in many cases. In that regard, Ciminion makes it possible to batch several multiplications due to the parallel execution of \(p_E\). Another alternative to speed up the processing of messages is to execute some communication rounds in an offline/pre-computation phase before receiving the input to the computation. This offline phase is cheaper than the online rounds. For example, in the case of Ciminion, precomputing several intermediate states is possible by applying \(p_C\) to different nonces \(\aleph \). As a result, for the encryption of arriving messages, those intermediate states only have to be expanded, and processed by \(p_E\) to encrypt the plaintext.

Section 5 demonstrates that our design Ciminion has a lower number of multiplications compared to several other schemes working over larger fields. The comparison of the number of multiplications in MPC applications to the ciphers that are presented in the literature, is shown in Fig. 2, when working over a field \({ (\mathbb {F}_{p})^{t} }\) with \(p\approx 2^{128}\) and \(t\ge 1\), and with a security level of 128 bits (which the most common case in the literature). It indicates that our design needs approximately \(t + 14 \cdot \lceil t/2\rceil \approx 8 \cdot t\) multiplications compared to \(12 \cdot t\) multiplications that are required by HadesMiMC, or \(60\cdot t\) multiplications that is needed by Rescue. These two schemes that have recently been proposed in the literature are our main competitors. Additionally, our design employs a low number of linear operations when compared with other designs present in the literature. Indeed, Ciminion grows linearly w.r.t. t, whereas the number of linear operations grows quadratically in HadesMiMC and Rescue. That is because their rounds are instantiated via the multiplication with a \(t\times t\) MDS matrix. Even if the cost of a linear operation is considerably lower than the cost of a non-linear one in MPC applications, it is desirable to keep both numbers as low as possible. Our design has this advantage.

2 Specification

2.1 Mode

In order to create a nonce-based stream-encryption scheme, we propose to work with the mode of operation described in Fig. 3. First, the scheme takes a nonce \(\aleph \) along with two subkey elements \(K_1\) and \(K_2\) as input, and processes these input with a permutation \(p_C\) to output an intermediate state. This intermediate state is then processed by a permutation \(p_E\), and truncated to two elements so that two plaintext elements \(P_1\) and \(P_2\) can be encrypted. If more elements need to be encrypted, the intermediate state can be expanded by repeatedly performing an addition of two subkey elements to the intermediate state, then followed by a call to the rolling function \( rol \). After each call to the rolling function \( rol \), two more plaintext elements \(P_{2i}\) and \(P_{2i+1}\) can be encrypted thanks to the application of \(p_E\) to the resulting state. We consider the field elements as atomic, and therefore, our mode can cope with a different number of elements without the need for padding. The algorithmic description of the mode of operation that is described in Fig. 3, is provided in [34, App. A. I].

Fig. 3.
figure 3

Encryption with Ciminion over \({ \mathbb {F}_{2^n} }\). The construction is similar over \({ \mathbb {F}_{p} }\) (\(\oplus \) is replaced by \(+\), the addition modulo p).

2.2 Permutations

We describe two permutations of the vector space \(\mathbb {F}_{q}^3\). They act on a state of triples \((a, b, c) \in \mathbb {F}_{q}^3\). The first permutation is defined for a prime number \(q = p\) of \(\log _2(p)\approx n\) bits, while the second permutation is specified for \(q=2^n\). Both permutations are the result of the repeated application of a round function. Their only difference is the number of repeated applications that we call rounds. As presented in Fig. 3, we employ two permutations \(p_C\) and \(p_E\) that have respectively N and R rounds.

Round Function. We write \(f_i\) for round i (Fig. 4). It uses four round constants \( RC _\ell \), with \(\ell = i\) for \(p_C\), and \(\ell = i + N - R\) for \(p_E\). We assume that \( RC4 _\ell \notin \{0, 1\}\). For each \(i \ge 1\), \(f_i\) maps a state \((a_{i-1},b_{i-1},c_{i-1})\) at its input to the state \((a_{i},b_{i},c_{i})\) at its output, where the relation between these two states is

$$ \begin{bmatrix} a_{i} \\ b_{i} \\ c_{i} \end{bmatrix} := \begin{bmatrix} 0&{}0&{}1 \\ 1&{} RC4 _{\ell }&{} RC4 _{\ell }\\ 0&{}1&{}1\\ \end{bmatrix} \cdot \begin{bmatrix} a_{i-1} \\ b_{i-1} \\ c_{i-1} + a_{i-1}\cdot b_{i-1} \end{bmatrix} + \begin{bmatrix} RC3 _{\ell } \\ RC1 _{\ell } \\ RC2 _{\ell } \end{bmatrix} \, . $$

2.3 The Rolling Function

Our rolling function \( rol \) is a simple NLFSR as depicted in Fig. 5. The rolling function takes three field elements \(\iota _a\), \(\iota _b\), and \(\iota _c\) at the input. It outputs three field elements: \(\omega _a := \iota _c + \iota _a \cdot \iota _b\), \(\omega _b := \iota _a\), and \(\omega _c := \iota _b\). The latter variables form the input of the permutation \(p_E\) in our Farfalle-like mode Fig. 3.

Fig. 4.
figure 4

Round function \(f_i\).

Fig. 5.
figure 5

\( rol \).

Fig. 6.
figure 6

Key generation.

2.4 SubKeys and Round Constants

SubKeys Generation. We derive the SubKey material \(K_i\) from two master keys \(MK_1\), and \(MK_2\). As a result, the secret is shared in a compact manner, while the expanded key is usually stored on a device, and used when needed. To expand the key, we use the sponge construction [13] instantiated with the permutation \(p_C\) (Fig. 6). The value \( IV _H\) can be made publicly available, and is typically set to one.

Round Constants Generation. We generate the round constants \( RC1 _\ell \), \( RC2 _\ell \), \( RC3 _\ell \), and \( RC4 _\ell \) with Shake-256 [14, 55]. The detail is provided in [34, App. A].

2.5 Number of Rounds and Security Claim for Encryption

In this paper, we assume throughout that the security level of s bits satisfies the condition \(64 \le s \le \lfloor \log _2(q) \rfloor \). This implies that \(q\ge 2^{64}\).

In Table 1, we define three sets of round numbers for each permutation in our encryption scheme:

  • The “standard” set guarantees s bit of security; in the following sections, we present our security analysis that supports the chosen number of rounds for this case.

  • For our MPC application, we propose a number of rounds if the data available to the attacker is limited to \(2^{s/2}\); our security analysis that supports the chosen number of rounds for this case is presented in [34, App. F].

  • Finally, we present a “conservative” number of rounds where we arbitrarily decided to increase the number of rounds by \(50\%\) of the standard instance.

Table 1. Proposed number of rounds based on f. The security level s must satisfy \(64 \le s \le \log _2(q)\), and \(q\ge 2^{64}\), where q is the number of elements in the field.

Since many cryptanalytic attacks become more difficult with an increased number of rounds, we encourage to study reduced-round variants of our design to facilitate third-party cryptanalysis, and to estimate the security margin. For this reason, it is possible to specify toy versions of our cipher, i.e., with \(q < 2^{64}\) which aim at achieving, for example, only 32 bits of security.

3 Design Rationale

3.1 Mode of Operation

In order to provide encryption, our first design choice is to choose between a mode of operation that is built upon a block cipher or a cryptographic permutation. In either case, a datapath design is necessary. However, a block cipher requires an additional key schedule, unlike a cryptographic permutation. If a designer opts for a block cipher, the key schedule can be chosen to be either a non-linear, an affine, or a trivial transformation, where the round keys are equal to the master key apart from round constants. In this case, the designer has to be careful, because a poor key schedule leads to weaknesses and attacks [19]. Considering that the research in low multiplicative complexity ciphers is a relatively new research area, we decided to limit our focus to the essential components of a primitive. Therefore, we opted for permutation-based cryptography.

Fig. 7.
figure 7

Intermediate step in constructing Fig. 3

Since we consider the application of low multiplicative ciphers in areas that have enough resources to profit from parallel processing, we base our mode of operation on the Farfalle construction [12] as depicted in Fig. 1a. The Farfalle construction is a highly versatile construction that provides many functionalities.

A Modified Version of Farfalle. As already mentioned in the introduction, our mode of operation resembles the Farfalle construction. In this section, we explain and support the modifications that we performed on the original Farfalle construction, as depicted in Fig. 1a. The aim of those modifications is to both increase the resistance of the construction against algebraic attacks which are the most competitive ones in our scenario, and to increase its efficiency in our target application scenario, that is to say to minimize the number of multiplications. We focus first on the security aspect, before explaining in further detail how we reach our efficiency goal.

Our first modification is for simplicity. Since the functionality provided by the Farfalle construction to compress information is not needed, we merge \(p_c\) and \(p_d\) to a single permutation \(p_C\).

Our second modification is to truncate the output. This prevents meet-in-the-middle style attacks that require the knowledge of the full output.

The third modification is to manipulate different keys \(K_i\) (see Fig. 7) instead of employing the same key \(k'\) for each output block. Since we aim to have a permutation with a low degree, Gröbner bases are the main threat. For the scheme that is depicted in Fig. 7, an attacker has to exploit equations of the form \(f(x) + K_i = y\) and \(f(x^\prime ) + K_i = y^\prime \), with \(f(x) - f(x^\prime ) = y - y^\prime \) for a Gröbner basis attack. We describe this scenario in more detail in Sect. 4.4.

Our last modification is to move the keys \(K_i\) from the output of \(p_E\) to the input of our rolling function, and hence, effectively to the input of \(p_E\) (Fig. 3). Figure 3 is our final construction, and it provides two main benefits. First, having the keys at the input does not make it possible to easily cancel them by computing the difference of the output as described before. Hence, this adds an additional barrier in mounting successful Gröbner basis attacks. Second, we can use a simple non-linear rolling function, because the addition of the key stream during the rolling function prevents the attacker from easily detecting short cycles within it.

Minimizing the Number of Multiplications. One main reason to use the Farfalle construction is that its three permutations \(p_c\), \(p_d\), and \(p_e\) do not have to provide protection against all possible attack vectors. Indeed, the permutation \(p_e\) alone does not have to provide resistance against higher-order differential attacks [48, 50]. The latter are particular algebraic attacks that exploit the low degree polynomial descriptions of the scheme. Resistance against higher-order differential attacks (higher-order attacks in short) can be provided by the permutations \(p_c\), and \(p_d\), and it inherently depends on the algebraic degree that a permutation achieves. Hence, requiring protection against higher-order attacks provides a lower bound on the number of multiplications that are needed in a permutation. In a nutshell, since \(p_e\) does not have to be secure against higher-order attacks, we can use a permutation with fewer multiplications. This benefits the multiplication count of the scheme, since the permutations \(p_c\) and \(p_d\) are called only once independently of the number of output words.

The Rolling Function. An integral part of the Farfalle construction is the rolling function \( rol \). The permutations \(p_c\) and \(p_e\) (Fig. 1a) in the Farfalle construction are usually chosen to be very lightweight, such that the algebraic degree is relatively low. Hence, to prevent higher-order attacks, the rolling function is chosen to be non-linear. In our modified version, the same is true up to the intermediate construction as depicted in Fig. 7. In this case, \( rol \) has to be non-linear in order to use a permutation \(p_E\) of low degree. For our final construction (Fig. 3), we do not see any straightforward way to exploit higher-order attacks due to the unknown keys at the inputs of \(p_E\). Thus, we could use a linear rolling function \( rol \), but we rather choose to use a simple non-linear \( rol \) for Ciminion. That is because it makes it possible to analyze the security of Fig. 7, and to keep the same conclusion when we opt for the stronger version of Fig. 3. In addition, we present Aiminion in [34, App. B], a version of our design that does not follow this line of reasoning. Aiminion uses a linear rolling function, and nine rounds of \(p_E\). We deem this version to be an interesting target for further analysis that aims to evaluate the security impact of switching from a non-linear to a linear rolling function.

Generating the Subkeys. Instead of sharing all subkeys \(K_i\) directly by communicating parties to encrypt messages, we specify a derivation of the subkeys \(K_i\) from two master keys \(\textit{MK}_1\), and \(\textit{MK}_2\). These subkeys can be generated in a single precomputation step. For the storage of the subkeys, trade-offs can be made to store as many subkeys as needed, and to split messages into lengths that match the stored subkey lengths.

3.2 The Round Function

Our round function is composed of three layers: a non-linear transformation, a linear transformation, and a round constant addition. Like classical designs, we employ the same non-linear and linear transformations for each round, but with different round constant additions. This makes it easier to implement, and to reduce code-size and area requirements. Nonetheless, some primitives that have been designed to lower the multiplicative complexity use a different linear layer for each round, like in LowMC [4].

Non-linear Transformation. Most primitives operating in large fields have a variant of powering field elements, e.g., \(x^3\) or \(x^{-1}\). These map**s became popular to guard against linear and differential cryptanalysis due to their properties [56]. The most popular design that uses such map**s is the AES [28], where \(x^{-1}\) is used as part of its S-box. For ciphers that aim at a low multiplicative complexity, these power map**s are interesting because they often have an inverse of high degree, which provides protection against algebraic attacks. However, they impose some restrictions, e.g., the map \(x\mapsto x^\alpha \) for integer \(\alpha \ge 2\) is a bijection in \( \mathbb {F}_{q} \) if and only if \(\gcd (q-1, \alpha ) = 1\) (e.g., \(x\mapsto x^3\) is a permutation over \( \mathbb {F}_{2^n} \) for odd n only). Hence, one has to consider several power values \(\alpha \) in order for \(x^\alpha \) to stay a permutation for any field. In a design that should make it possible to be instantiated for a wide variety of fields, considering those special cases complicates the design of the cipher.

Instead of a power map**, the non-linear element in our designs is the Toffoli gate [62]. Indeed, algebraic attacks are the main threat against designs aiming to lower the multiplicative complexity, and the multiplications are the main cost factor in our design. It thus seems counter intuitive to spend the non-linear element on simply manipulating a single field element, as is the case for power map**s. Therefore, we choose to multiply two elements of the state, instead of operating on a single state element, in order to increase the non-linear diffusion. Furthermore, the Toffoli gate is a permutation for any field, and therefore we are not restricted to a specific field. We mitigate potential negative effects of the property of the Toffoli gate to provide the same degree in forward and backward direction by mandating its use only in modes that truncate the permutation output, and that never evaluate its inverse using the secret key.

Linear Transformation. We present the linear transformation in its matrix form, the coefficients of which must be carefully chosen. One possibility is to use an MDS matrix. Since an MDS matrix has the highest branch number [24] among all possible matrices, it plays an important role in proving lower bounds on the linear and differential trail weight. However, we do not need to rely on MDS matrices as the field multiplications already have advantageous properties against linear and differential attacks.

Another option is to randomly choose the coefficients of the matrix for each round, and then verify that the matrix is invertible. This strategy was used in one of the first low multiplicative complexity designs, namely LowMC [4]. However, the drawback is that random matrices contribute significantly to the cost of the primitive in some scenarios, and the security analysis becomes more involved. Hence, we have decided to use a much simpler linear layer.

In order to provide sufficient diffusion, complex equation systems, and low multiplicative complexity, the degree of the functions that output equations depending on the input variables must grow as fast as possible. By applying a single multiplication per round, the degree doubles per round in the best scenario. However, this also depends on the linear layer. For instance, this layer could be a simple layer permuting the elements (e.g., the \(3\,\times \,3\) circulant matrix circ(0, 0, 1)), for which the univariate degree of a single element only grows according to a Fibonacci sequence. To ensure that the univariate degree of a single element doubles per round, the result of the previous multiplication has to be reused in the multiplication of the next round. This is also applicable to the inverse of the permutation. Hence, we decided to use the following matrix for the linear layer:

$$\begin{aligned} M= \begin{bmatrix} 0 &{} 0 &{} 1 \\ 1 &{} RC4 &{} RC4 \\ 0 &{} 1 &{} 1\\ \end{bmatrix}\, \qquad \text { (and } \qquad M^{-1}= \begin{bmatrix} 0 &{} 1 &{} - RC4 \\ -1 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0\\ \end{bmatrix}\,), \end{aligned}$$

Here, \(M_{0,2}, M_{1,2}, {M^{-1}}_{0,2}, {M^{-1}}_{1,2} \ne 0\) with \(M_{i,j}\) denoting the element of the matrix M at row i and column j. The use of the round constant \( RC4 \notin \{0, 1\}\) is motivated by aiming to improve the diffusion, and to avoid a weakness with respect to linear cryptanalysis that we discuss in Sect. 4.1.

About Quadratic Functions. In addition to the matrix multiplication, another (semi-)linear transformationFootnote 1 over a binary field \( \mathbb {F}_{2^n} \) is the quadratic permutation \(x\mapsto x^2\). This transformation can be exploited as a component in the round function (e.g., as a replacement of the multiplication by \( RC4 \)) to both increase the diffusion and the overall degree of the function that describes the scheme. However, we do not employ it for several reasons. First, even if the quadratic permutation is linear over \( \mathbb {F}_{2^n} \), its cost in an application like MPC might not be negligible. Indeed, the quadratic permutation costs one multiplication as detailed in [43]. As a result, even if it makes it possible to reduce the overall number of rounds due to a faster growth of the degree, the overall number of multiplicationsFootnote 2 would not change for applications like MPC. Secondly, the quadratic function is not a permutation over \( \mathbb {F}_{p} \) for a prime \(p \ne 2\). Thus, its introduction implies having to work with two different round functions: one for the binary case and one for the prime case. Since our goal is to present a simple and elegant general scheme, we decided not to use it.

Round Constants. The round constants break up the symmetry in the design. They prevent the simplification of the algebraic description of the round function. However, as we manipulate many round constants, and since they influence the rounds in a complex manner, we use an extendable output function to obtain round constant values without an obvious structure. We performed some experiments where we added round constants to one or two state elements. These instances provided simpler algebraic descriptions. Considering the small costs of manipulating dense round constants, we decide to use three round constants to complicate the algebraic description of the cipher, even after a few rounds.

4 Security Analysis

We present our security analysis of Ciminion with respect to “standard” application of the attacks that are found in the literature. This analysis determines the required number of rounds to provide some level of confidence in its security. Due to page limitation, further analysis is presented in the full version of the paper.

First and foremost, the number of rounds that guarantees security up to s bits are computed under the assumption that the data available to the attacker is limited to \(2^s\), except if specified in a different way. Moreover, we do not make any claim about the security against related-key attacks and known- or chosen-key distinguishers (including the zero-sum partitions). The latter are out the scope of this paper.

We observe that the attack vectors penetrating the highest number of rounds are algebraic attacks. On the contrary, traditional attacks, such as differential and linear cryptanalysis, are infeasible after a small number of rounds. As detailed in the following, in order to protect against algebraic attacks and higher-order differential attacks, we increase the number of rounds proportionally to the security level s. A constant number of rounds is added to prevent an adversary from guessing part of the key or the initial or middle state, or to linearize part of the state. Hence, the numbers of rounds for \(p_C\) and \(p_E\) are respectively \(s+6\) and \(\left\lceil \frac{s + 19}{12} + 1.5\right\rceil \) for the standard security level.

4.1 Linear Cryptanalysis

Linear cryptanalysis [52] is a known-plaintext attack that abuses high correlations [25] between sums of input bits and sums of output bits of a cryptographic primitive. However, classical correlation analysis is not restricted to solely primitives operating on elements of binary fields. In this section, we apply the existing theory developed by Baignères et al. [9] for correlation analysis of primitives that operate on elements of arbitrary sets to the permutations defined in Sect. 2.

General Correlation Analysis. An application of the theory to ciphers operating on elements of binary fields is presented by Daemen and Rijmen [29]. In this section, we apply the theory to the more general case of primitives operating on elements of \({ \mathbb {F}_{q} }\) where \(q = p^n\). Henceforth, we suppose that \(f : { (\mathbb {F}_{q})^{l} } \rightarrow { (\mathbb {F}_{q})^{m} }\).

Correlation analysis is the study of characters, and their configuration in the l-dimensional vector space \(L^2({ (\mathbb {F}_{q})^{l} })\) of complex-valued functions \({ (\mathbb {F}_{q})^{l} } \rightarrow \mathbb {C}\). The space \(L^2({ (\mathbb {F}_{q})^{l} })\) comes with the inner product \(\langle g, h \rangle = \sum _{} g(x)\overline{h(x)}\), which defines the norm \(\Vert g \Vert = \sqrt{\langle g,g \rangle } = q^\frac{l}{2}\).

A character is an additive homomorphism from \({ (\mathbb {F}_{q})^{l} }\) into \(S := \{ z \in \mathbb {C} : |z| = 1\}\). It is well-known that any character on \({ (\mathbb {F}_{q})^{l} }\) is of the form

$$\begin{aligned} \chi _u(x) = e^{\frac{2\pi i}{p} {{\,\mathrm{Tr}\,}}^q_p(u^{\top }x)} \,, \end{aligned}$$

for some \(u \in { (\mathbb {F}_{q})^{l} }\). We recall that for \(q = 2\) we have that \(\chi _u(x) = (-1)^{u^{\top }x}\), which appears in classical correlation analysis. Here, \({{\,\mathrm{Tr}\,}}^q_p(x) = x + x^2 + \cdots + x^{p^{l-1}} \in { \mathbb {F}_{p} }\) is the trace function. For this reason, \(u^{\top }x\) is called a vectorial trace parity and u a trace mask vector. We call the ordered pair (uv) a linear approximation of f, where u is understood to be the mask at the input and v to be the mask at the output of f.

We define the vectorial trace parity correlation in the following definition.

Definition 1 (Correlation)

$$\begin{aligned} {{\,\mathrm{C}\,}}_f(u,v) = \frac{\langle \mu _u, \mu _v \circ f \rangle }{\Vert \mu _u \Vert \Vert \mu _v \circ f \Vert } = \frac{1}{q^l}\sum \limits _{x \in { (\mathbb {F}_{q})^{l} }} e^{\frac{2\pi i}{p} {{\,\mathrm{Tr}\,}}^q_p(u^{\top }x - v^{\top }f(x))} \end{aligned}$$

This helps us to define a more general linear probability metric as follows.

Definition 2 (Linear probability)

\({{\,\mathrm{LP}\,}}_f(u,v) = |{{\,\mathrm{C}\,}}_f(u,v)|^2\)

The idea is then to consider the permutation as a circuit made of simple building blocks. Those blocks correspond to the operators that we apply, and for which we attach to each edge a trace mask vector. Importantly, these trace mask vectors are in one-to-one correspondence with characters. The goal of the attacker is to construct a linear trail from the end of the permutation to the beginning, with the goal of maximizing the linear probability of each building block. A list of the linear probabilities of each such building block can be found in [34, App. C.2] to deduce the result of the analysis.

Fig. 8.
figure 8

Mask propagation in f

On Three-Round Linear Trails. Figure 8 illustrates how the linear masks propagate through the round function when the linear probabilities of all building blocks are maximized. In this Figure, \(c_\ell := RC4 _\ell \). The attacker is able to choose uv,  and w freely at the beginning of the first round, and afterwards, a mask at the input of the next round is determined by a mask at the output of the former round. We write \(R_i\) for the i’th round function. Moreover, we use the notation \(c_{ij} := c_ic_j\) and \(c_{ijk} := c_ic_jc_k\), where the subscript refers to the round number. The masks evolve as follows:

$$\begin{aligned}&\begin{pmatrix} u\\ v\\ w \end{pmatrix} \xrightarrow {R_0} \begin{pmatrix} v\\ c_1v+w \\ u+c_1v+w \end{pmatrix} \xrightarrow {R_1} \begin{pmatrix} c_1v+w\\ u + (c_1+c_{12})v + (1+c_2)w\\ u+(1+c_1+c_{12})v+(1+c_2)w \end{pmatrix} \\&\xrightarrow {R_2} \begin{pmatrix} u+(c_1+c_{12})v+(1+c_2)w\\ (1+c_3)u + (1+c_1+c_{12}+c_{13}+c_{123})v + (1+c_2+c_3+c_{23})w\\ (1+c_3)u+(1+2c_1+c_{13}+c_{12}+c_{123})v+(2+c_2 + c_3+c_{23})w \end{pmatrix} \,. \end{aligned}$$

An implicit assumption in both Fig. 8, and the mask derivation above, is that the masks at the output of the multiplication and at the input of the third branch are equal. However, an attacker can only make sure that this assumption is valid if the following system of equations has a non-zero solution:

$$\begin{aligned} \begin{pmatrix} 1 &{} c_1 &{} 1 \\ 1 &{} 1+c_1+c_{12} &{} 1+c_2 \\ 1+c_3 &{} 1+2c_1+c_{13}+c_{12}+c_{123} &{} 2+c_2 + c_3+c_{23} \end{pmatrix} \begin{pmatrix} u \\ v \\ w \end{pmatrix}&= \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \,. \end{aligned}$$

If we denote by A the matrix above, then this happens if and only if the matrix is singular, i.e., if \(\det (A) = c_2c_3+1=0\). If either \(c_2\) or \(c_3\) is equal to zero, then the condition does not hold. If both are non-zero, then the condition is equivalent to requiring that \(c_2 = -c_3^{-1}\). In this case, we can freely choose one value, which determines the other. Hence, the probability that the condition holds is equal to \(\frac{q-1}{q^2} < \frac{1}{q}\). Since \(\log _2(q)\) is the security parameter, this probability is negligible and there exists no three-round trail with a linear probability of 1.

Clustering of Linear Trails. We have \({{\,\mathrm{LP}\,}}_f(u, v) \ge \sum _{Q \in {{\,\mathrm{LT}\,}}_f(u,v)} {{\,\mathrm{LP}\,}}(Q)\), where \({{\,\mathrm{LT}\,}}_f(u,v)\) is the set of linear trails contained in (uv). If we suppose now that an attacker is able to find more than q linear trails, i.e., if \(\left| {{\,\mathrm{LT}\,}}_f(u,v) \right| > q\), then we have \({{\,\mathrm{LP}\,}}_f(u,v) > \frac{1}{q}\). However, \(\log _2(q)\) is the security parameter, therefore the latter condition is not feasible. In a nutshell, three rounds are sufficient to resist against linear cryptanalysis.

Round Constant Multiplication Necessity. If the multiplication by the round constant is not present, or \( RC4 _\ell = 1\), then the masks evolve as follows over a single round:

$$\begin{aligned} \begin{pmatrix} u\\ v\\ w \end{pmatrix}&\xrightarrow {f^{-1}} \begin{pmatrix} v+x\\ v+w+y \\ u+v+w \end{pmatrix} \quad \xrightarrow {\text {if } u = v \text { and } x = y = w = 0} \qquad \begin{pmatrix} v\\ v\\ 0 \end{pmatrix}&\xrightarrow {f^{-1}} \begin{pmatrix} v\\ v \\ 2v \end{pmatrix} \,, \end{aligned}$$

where (xy) is the mask vector at the input of the multiplication function, which, like uv,  and w, can be freely chosen. Hence, if we choose \(u = v\), and \(x = y = w = 0\), and since the characteristic of the field is equal to two, then a one-round approximation with a linear probability of one can be chained indefinitely. This is the reason behind including a multiplication by a non-trivial constant.

4.2 Differential Cryptanalysis

Differential cryptanalysis exploits the probability distribution of a non-zero input difference leading to an output difference after a given number of rounds [17]. As Ciminion is an iterated cipher, a cryptanalyst searches for ordered sequences of differences over r rounds that are called differential characteristics/trails. A differential trail has a Differential Probability (DP). Assuming the independence of the rounds, the DP of a differential trail is the product of the DPs of its one-round differences (Definition 3).

Definition 3 (One-round differential probability)

Let \((\alpha _a,\alpha _b, \alpha _c) \in { (\mathbb {F}_{p})^{3} }\) be the input of the round, and \((\alpha ^{*}_a,\alpha ^{*}_b, \alpha ^{*}_c) \in { (\mathbb {F}_{p})^{3} }\) the chosen non-zero input difference. The probability that an input difference is mapped to an output difference \((\beta ^{*}_a, \beta ^{*}_b, \beta ^{*}_c) \in { (\mathbb {F}_{p})^{3} }\) through one iteration of the round function f is equal to

$$\begin{aligned}&\frac{| f(\alpha ^{*}_a +\alpha _a, \alpha ^{*}_b + \alpha _b, \alpha ^{*}_c + \alpha _c) - f(\alpha _a ,\alpha _b, \alpha _c) = ( \beta ^{*}_a, \beta ^{*}_b, \beta ^{*}_c) |}{|{ (\mathbb {F}_{p})^{3} }|} \,. \end{aligned}$$

The operation \(+\) is replaced by \(\oplus \) in \( \mathbb {F}_{2^n} \).

However, in general, the attacker does not have any information about the intermediate differences of the differential trail. Hence, the attacker only fixes the input and the output differences over r rounds, and works with differentials. A differential is a collection of differential trails with fixed input and output differences, and free intermediate differences. The DP of a differential over r rounds is the sum of all DPs of the differential trails that have the same input and output difference over the same number of rounds as the differential.

In this paper, we perform the differential cryptanalysis by grou** fixed differences in sets. Those sets impose some conditions to satisfy between the differences of the branches of the round, and/or specify that some differences at the input of the branches equal zero. Then, given an input difference, we study the possible sets of output differences after a round, and we determine the DP that an input difference is mapped into an output difference over a round. The goal is to find the longest differential trail with the highest DP.

Toward this end, we build a state finite machine (more details in [34, App. C.3]) that represents all the encountered sets of differences as states associated to their differential probabilities. To construct the graph, we start with a difference of the form \(\{(0, 0, x) | x\ne 0 \}\), and we search for the possible sets of output differences until we have explored all the possibilities from each newly reached set. Hereafter, let us assume that the difference x is not zero. We see that an input difference from \(\{(0, 0, x)\}\) is mapped into an output difference of the form \(\{(x, RC4 _\ell x, x)\}\) after one round with probability one. Indeed, since the input difference goes through the non-linear operation and stays unchanged, the output difference is simply the result of the linear operation applied to the input difference. For the other cases, a non-zero input difference propagates to an output difference over one round with probability equal to \(p^{-1}\) in \( \mathbb {F}_{p} \), or \(2^{-n}\) in \( \mathbb {F}_{2^n} \). From those results, we determine the differential over three rounds with the highest DP.

On Three-Round Differentials. The differential trail in \( \mathbb {F}_{p} \) with the highest DP is

$$\begin{aligned} \{(0, 0, x)\} \xrightarrow {\text {prob. 1}} \{(x, RC4 _\ell x, x)\} \xrightarrow {\text {prob.}\,p^{-1}} \{(- RC4 _\ell x, x, 0) \} \xrightarrow {\text {prob.}\, p^{-1}} \{(0, 0, x)\} \,, \end{aligned}$$

where the fixed input difference x is equal to another fixed value in the following rounds, and satisfies the conditions imposed by the set (for details see [34, App. C.3]). Additionally, this differential trail holds if and only if the round constant \( RC4 _\ell \) introduced by the first round is equal to the round constant \( RC4 _\ell \) of the third round.

In \( \mathbb {F}_{2^n} \), we obtain almost the same state finite machine as in \( \mathbb {F}_{p} \). The only exception is that the set of differences \(\{(- RC4 _\ell x, x, 0)\}\) corresponds to \(\{( RC4 _\ell x, x, 0)\}\), because \(-z\) is equal to z for each \(z\in \) \( \mathbb {F}_{2^n} \). Hence, the differential trail in \( \mathbb {F}_{2^n} \) with the highest DP is

$$\begin{aligned} \{(0, 0, x)\} \xrightarrow {\text {prob. 1}} \{(x, RC4 _\ell x, x) \} \xrightarrow {\text {prob.}\,2^{-n}} \{ ( RC4 _\ell x, x, 0) \} \xrightarrow {\text {prob.}\,2^{-n}} \{(0, 0, x) \}\,, \end{aligned}$$

under the same conditions that in \( \mathbb {F}_{p} \).

In summary, a fixed difference from \(\{(0, 0, x)\}\) is mapped to the difference of the form \(\{(x, RC4 _\ell x, x)\}\) after one round with probability one in \( \mathbb {F}_{2^n} \) and in \( \mathbb {F}_{p} \). Moreover an input difference can be mapped to an output difference of the form \(\{(0, 0, x)\}\) with DP \(p^{-1}\) (resp. \(2^{-n}\)) if and only if this difference is of the form \(\{(- RC4 _\ell x, x, 0)\}\). This means that the only possible differential trail over three rounds with input and output differences of the form \(\{(0,0,x)\}\) are the ones given before. The DP of this differential trail is expressed in the following Lemma.

Lemma 1

A differential trail over three rounds has a probability at most equal to \(p^{-2}\) in \( \mathbb {F}_{p} \) and \(2^{-2n}\) in \( \mathbb {F}_{2^n} \).

The DP of all other differential trails over three round are at most equal to \(p^{-3}\) in \( \mathbb {F}_{p} \) and \(2^{-3n}\) in \( \mathbb {F}_{2^n} \). Since the security level s satisfies \(s\le \log _2(p)\) in \( \mathbb {F}_{p} \) and \(s\le n\) in \( \mathbb {F}_{2^n} \), we therefore conjecture that three rounds are sufficient to guarantee security against “basic” differential distinguishers. We thus choose to have at least six rounds for the permutations \(p_E\) and \(p_C\), which is twice the number of rounds necessary to guarantee security against “basic” differential/linear distinguishers. The minimal number of rounds for the permutations should provide security against more advanced statistical distinguishers.

4.3 Higher-Order Differential and Interpolation Attacks

If a cryptographic scheme has a simple algebraic representation, higher-order attacks [48, 50] and interpolation attack [47] have to be considered. In this part, we only focus on higher-order differential attacks. We conjecture that the number of rounds necessary to prevent higher-order differential attacks is also sufficient to prevent interpolation attacks (see details in [34, App. D]). This result is not novel, and the same applies for other schemes, like MiMC, as further explained in [36].

Background. We recall from Fig. 3 that an attacker can only directly manipulate a single element, and the two other elements are the secret subkeys. We therefore operate with this single element to input value sets, while kee** the two other elements fixed. Each output element is the result of a non-linear function depending on the input element x, and two fixed elements that are the input of the permutation. Thus, we have \(f_N(x) = p(x, const , const )\) in \( \mathbb {F}_{2^n} \), and \(f_p(x)=p(x, const , const )\) in \( \mathbb {F}_{p} \).

A given function \(f_p\) over prime fields \( \mathbb {F}_{p} \) is represented by \(f_p(x) = \sum _{i=0}^{p-1} \kappa _i x^i \) with constants \(\kappa _i \in { \mathbb {F}_{p} }\). The degree of the function \(f_p(x)\) that we denote by \(d_{ \mathbb {F}_{p} }\), corresponds to the highest value i for which \(\kappa _i \ne 0\). The same holds for a function \(f_n\) working over binary extension fields \(\mathbb F_{2^n}\). For the latter, \(f_N(x) = \bigoplus _{i=0}^{d} \kappa _i x^i\) with \(\kappa _i \in { \mathbb {F}_{2^n} }\), and \(d_{ \mathbb {F}_{2^n} }\) is the degree of the function \(f_n(x)\). Like previously, the degree is the highest value i for which \(\kappa _i \ne 0\). In \( \mathbb {F}_{2^n} \), the function can as well be represented by its algebraic norm form (ANF) \(\overrightarrow{f_n}(x_1,\dots ,x_n)\), whose output element j is defined by its coordinate function \(f_{n,j}(x_1,\dots ,x_n) = \bigoplus _{u = (u_1, \dots , u_2)} \kappa _{j,u} \cdot x_1^{u_1} \cdot \ldots \cdot x_n^{u_n}\) with \(\kappa _{j,u} \in { \mathbb {F}_{2} }\). The degree \( d_{\mathbb {F}_2^n }\)of \( \overrightarrow{f_n}\) corresponds to the maximal Hamming weight of u for which \(\kappa _{j,u} \ne 0\), that is to say \({ d_{\mathbb {F}_2^n }}= \max _{i\le d} \{ hw(i) \, | \, \kappa _i \ne 0\}\).

For the last representation, as proved by Lai [50] and in[48], if we iterate over a vector space \(\mathcal V\) having a dimension strictly higher than \( d_{\mathbb {F}_2^n }\), we obtain the following result: \(\bigoplus _{v\in \mathcal V \oplus \nu }f_n(v) = 0\,.\) A similar result has also been recently presented for the prime case in [36, Proposition 2]. More precisely, if the degree of \(f_p(x)\) is \(d_{{ \mathbb {F}_{p} }}\), then iterating over all elements of a multiplicative subgroup \(\mathcal {G}\) of \(\mathbb F_{p}^t\) of size \(|\mathcal {G}| > d_{{ \mathbb {F}_{p} }}\) leads to \(\sum _{x\in \mathcal {G}}f_p(x) = f_p(0)\cdot |\mathcal {G}|\, \). The last sum is equal to zero modulo p since \(|\mathcal G|\) is a multiple of p.

In order to provide security against higher-order differential attacks based on the presented zero-sums, we choose the number of rounds of our permutation to have a function of a degree higher than our security claim.

Overview of our Security Argument. In our construction, we assume that an attacker can choose the nonce \(\aleph \), which is the input of the permutation \(p_C\). For the first call of this permutation, we want to prevent an attacker to input value sets that always result in the same constant after the application of the permutation \(p_C\). This requirement is necessary, since we assume in the remaining analysis that the output values of \(p_C\) are unpredictable by an attacker. We emphasize that if the output of the permutation \(p_C\) is guaranteed to be randomly distributed, then this is sufficient to prevent higher-order differential attacks. That is because the inverse of the final permutations \(p_E\) is never evaluated, and the attacker cannot construct an affine subspace in the middle of the construction.

Estimating the Degree of \(p_C\): Necessary Number of Rounds. We study the evolution of the degrees \(d_{ \mathbb {F}_{p} }\) and \(d_{ \mathbb {F}_{2^n} }\) for the permutation \(p_C\) for which the round function f (Fig. 3) is iterated r times. We conclude that the degree of the permutation \(p_C\) remains unchanged for two rounds, if an input element is present at branch a, and the input at the branch b is zero. For a higher number of rounds, the degree increases. We have chosen the affine layer to ensure that the output of the multiplication can affect both inputs of the multiplication in the next round. This should make it possible for the maximal possible degree of the output functions to increase faster than having affine layers without this property. In the best case, the maximal degree of the function can be doubled per round.

Considering both previous observations, a minimum of \(s+2\) rounds are required to obtain at least \(d_{ \mathbb {F}_{p} }\approx 2^{s}\), or \(d_{ \mathbb {F}_{2^n} }\approx 2^{s}\). As we want to ensure that the polynomial representation of \(p_C\) is dense, it is then advisable to add more rounds as a safety margin. In order to reach this goal, we arbitrarily decided to add four more rounds.

4.4 Gröbner Basis Attacks

Preliminary. To perform a Gröbner basis [21] attack, the adversary constructs a system of algebraic equations that represents the cipher. Finding the solution of those equations makes it possible for the attacker to recover the key that is denoted by the unknown variables \(x_1, ..., x_n\) hereafter. In order to solve this system of equations, the attacker considers the ideal generated by the multivariate polynomials that define the system. A Gröbner basis is a particular generating set of the ideal. It is defined with respect to a total ordering on the set of monomials, in particular the lexicographic order. As a Gröbner basis with respect to the lexicographic order is of the form

$$\begin{aligned} \{ x_1-h_1(x_n), \ldots , x_{n-1}-h_{n-1}(x_n), h_n(x_n) \}, \end{aligned}$$

the attacker can easily find the solution of the system of equations. To this end, one method is to employ the well-known Buchberger’s criterion [21], which makes it possible to transform a given set of generators of the ideal into a Gröbner basis. From a theoretic point of view, state-of-the-art Gröbner basis algorithms are simply improvements to Buchberger’s algorithm that include enhanced selection criteria, faster reduction step by making use of fast linear algebra, and an attempt to predict reductions to zero. The best well-known algorithm is Faugère’s F5 algorithm [11, 37].

Experiments highlighted that computing a Gröbner basis with respect to the lexicographic order is a slow process. However, computing a Gröbner basis with respect to the grevlex order can be done in a faster manner. Fortunately, the FGLM algorithm [38] makes it possible to transform a Gröbner basis with respect to the grevlex order to another with respect to the lexicographic order. To summarize, the attacker adopts the following strategy:

  1. 1.

    Using the F5 algorithm, compute a Gröbner basis w.r.t. the grevlex order.

  2. 2.

    Using the FGLM algorithm, transform the previous basis into a Gröbner basis w.r.t. the lexicographic order.

  3. 3.

    Using polynomial factorization and back substitution, solve the resulting system of equations.

Henceforth, we consider the following setting: let K be a finite field, let \(A = K[x_1,\ldots ,x_n]\) be the polynomial ring in n variables, and let \(I \subseteq A\) be an ideal generated by a sequence of polynomials \((f_1,\ldots ,f_r) \in A^r\) associated with the system of equations of interest.

Cost of the F5 Algorithm. In the best adversarial scenario, we assume that the sequence of polynomials associated with the system of equations is regular.Footnote 3 In this case, the F5 algorithm does not perform any redundant reductions to zero.

Write \(F_{A/I}\) for the Hilbert-Series of the algebra A/I and \(H_{A/I}\) for its Hilbert polynomial. The degree of regularity \(D_{\text {reg}}\) is the smallest integer such that \(F_{A/I}(n) = H_{A/I}(n)\) for all \(n \ge D_{\text {reg}}\). The quantity \(D_{\text {reg}}\) plays an important role in the cost of the algorithm. If the ideal I is generated by a regular sequence of degrees \(d_1,\ldots ,d_r\), then its Hilbert series equals \(F_{A/I}(t) = \frac{\prod _{i=1}^r(1+t+t^2+\cdots +t^{d_i-1})}{(1-t)^{n-r}}\). From this, we deduce that \(\deg (I) = \prod _{i=1}^r d_i\), and \(D_{\text {reg}} = 1 + \sum _{i=1}^r(d_i-1)\).

The main result is that if \(f_1,\ldots ,f_r\) is a regular sequence in \(K[x_1,\ldots ,x_n]\), then computing a Gröbner basis with respect to the grevlex order using the F5 algorithm can be performed within

$$\begin{aligned} \mathcal {O} \left( \left( {\begin{array}{c}n + D_{\text {reg}}\\ D_{\text {reg}}\end{array}}\right) ^\omega \right) \end{aligned}$$

operations in K, where \(2 \le \omega \le 3\) is the matrix multiplication exponent.

Costs of Gröbner Basis Conversion and of Back Substitution. FGLM is an algorithm that converts a Gröbner basis of I with respect to one order, to a Gröbner basis of I with respect to a second order in \(\mathcal {O}(n\deg (I)^3)\) operations in K. Finally, as proved in [39], the cost of factorizing a univariate polynomial in K[x] of degree d over \(\mathbb {F}_{p^n}\) for a prime p is \(\mathcal {O}(d^3n^2+dn^3)\).

Number of Rounds. After introducing the Gröbner Basis attack, we analyze the minimum number of rounds that is necessary to provide security against this attack. However, we first emphasize that:

  • there are several ways to set up the system of equations that describes the scheme. For instance, we could manipulate more equations, and thus more variables, of lower degree. Alternatively, we could work with less equations, and thus less variables, of higher degree. In addition, we could consider the relation between the input and the output, or between the middle state and the outputs, and so on. In the following, we present some of these strategies, that seem to be the most competitive ones;

  • computing the exact cost of the attack is far from an easy task. As largely done in the literature, we assume that the most expensive step is the “F5 Algorithm”. If the cost of such a step is higher than the security level, we conclude that the scheme is secure against the analyzed attack.

A Weaker Scheme. Instead of using the model that is described in Fig. 3, we analyze a weaker model as illustrated in Fig. 7. In the latter, the key is added after the expansion part, instead of before the rolling function application. This weaker model is easier to analyze, and makes it possible to draw a conclusion regarding the security of our scheme. Thus, we conjecture that if the scheme proposed in Fig. 7 is secure w.r.t. Gröbner Basis attack, then the scheme in Fig. 3 is secure. Indeed, in the scheme proposed in Fig. 7, it is always possible to consider the difference between two or more texts to remove the final key addition. For instance, given \(f(x) + K = y\) and \(f(x^\prime ) + K = y^\prime \), it follows that \(f(x) - f(x^\prime ) = y - y^\prime \). As a result, the number of variables in the system of equations to be solved remains constant independently of the number of considered outputs. However, in Fig. 3, given \(g(x+K) = y\) and \(g(x^\prime + K) = y^\prime \), this is not possible except if \(g(\cdot )\) is inverted. Nevertheless, since it is a truncated permutation, this does not seem feasible, unless the part of the output which is truncated is either treated as a variable (that results to have more variables than equations) or guessed by brute force (that results in an attack whose cost is higher than the security level, and \(2^s\le q\)). Such consideration leads us to conjecture that the number of rounds necessary to make the scheme proposed in Fig. 7 secure is a good indicator of the number of rounds necessary to make the scheme in Fig. 3 secure as well.

Input-Output Relation. The number of rounds must ensure that the maximum degree is reached. Based on that, we do not expect that the relation that holds between the input and the output, makes it possible for the attacker to break the scheme. In particular, let N be the nonce, and \(k_1, k_2\) be the secret keys. If we assume that a single word is output, then an equation of degree \(2^r\) can be expressed between each input \((N, k_1, k_2) \in (\mathbb F_{q})^3\), and the output \(T \in \mathbb F_{q}\) with r the number of rounds. Hence, if there are two different initial nonces, then the attacker has to solve two equations in two variables. In that case, \(D_{reg} = 1+2\cdot (2^r-1)\approx 2^{r+1} \). The cost of the attack is thus lower bounded by \( \left[ \left( {\begin{array}{c}2+2^{r+1}\\ 2^{r+1}\end{array}}\right) \right] ^\omega \ge \left[ \frac{(1+2^{r+1})^2}{2}\right] ^\omega \ge 2^{2r+1}\), where \(\omega \ge 2\). Consequently, \(2^{2r+1}\ge 2^{s}\) if the total number of rounds is at least \(\left\lceil \frac{s-1}{2}\right\rceil \) (e.g., 64 for \(s=128\)). Since the number of rounds for \(p_C\) is \(s+6\), this strategy does not outperform the previous attacks as expected.

Finally, we additionally consider a strategy where new intermediate variables are introduced to reduce the degree of the involved polynomials. We concluded that this strategy does not reduce the solving time as it increases the number of variables.

Middle State-Output Relation. There is another attack strategy that exploits the relation between the middle state and the outputs. In this strategy, only \(p_E\) is involved, and several outputs are generated by the same unknown middle state. For a given nonce N, let \((x^N_0, x^N_1,x^N_2)\in (\mathbb F_{q})^3\) be the corresponding middle state. Since the key is added after the permutation \(p_E\), we first eliminate the key by considering two initial nonces, and taking the difference of the corresponding output. This makes it possible to remove all the secret key material at the end, at the cost of having three more unknown variables in the middle.Footnote 4

Hence, independently of the number of outputs that are generated, there are six variables, and thus simply the two middle states. That means that we need at least six output blocks, and an equivalent number of equations. Since two words are output for each call of \(p_E\), we have six equations of degree \(2^{r-1}\) and \(2^r\) for the first two words, \(2^{r}\) and \(2^{r+1}\) for the next two words, and so on. We recall that every call of the rolling function increases the degree by a factor two, while the function that describes the output of a single block has a maximum degree, namely \(2^r\) after r rounds for one word, and \(2^{r-1}\) for the other two words. Hence, \(D_{reg} = 1 + (2^{r-1}-1) + 2 \cdot \sum _{i=0}^1 (2^{r+i}-1) + (2^{r+2}-1)= 21 \cdot 2^{r-1} - 5 \approx 2^{r + 3.4}\), and the cost of the attack is lower bounded by

$$ \left[ \left( {\begin{array}{c}6+2^{r+3.4}\\ 2^{r+3.4}\end{array}}\right) \right] ^\omega \ge \left[ \frac{(1+2^{r+3.4})^6}{6!}\right] ^\omega \ge 2^{12(r+3.4)-19} \,, $$

where \(\omega \ge 2\). Therefore, \(2^{12(r+3.4)-19}\ge 2^{s}\) if the number of rounds for \(p_E\) is at least \(\left\lceil \frac{s + 19}{12} - 3.4\right\rceil \) (e.g., 9 for \(s=128\)). Like previously, potential improvement of the attack (e.g., an enhanced description of the equations) can lead to a lower computational cost. We thus decided to arbitrarily add five rounds as a security margin. We conjecture that at least \(\left\lceil \frac{s + 19}{12} + 1.5\right\rceil \) rounds for \(p_E\) are necessary to provide some security (e.g., 14 for \(s=128\)).

In addition, in order to reduce the degree of the involved polynomials, we studied the consequences of introducing new intermediate variables in the middle, e.g., at the output of the rolling function or among the roundsFootnote 5. In that regard, we did not improve the previous results. Moreover, we also considered a scenario in which the attacker accesses more data, without being able to improve the previous results.

4.5 On the Algebraic Cipher Representation

Algebraic attacks seem to be the most successful attack vector on ciphers that have a simple representation in larger fields, while restricting the usage of multiplications. Until now, we have mainly focused on the growth of the degree to estimate the costs of the algebraic attacks that we considered. However, this is not the only factor that influences the cost of an algebraic attack. It is well known that such attacks (including higher-order, interpolation, and Gröbner basis attacks) can be more efficient if the polynomial that represents the cipher is sparse. Consequently, it is necessary to study the algebraic representation of the cipher for a feasible number of rounds.

To evaluate the number of monomials that we have for a given degree, we wrote a dedicated tool. This tool produces a symbolic evaluation of the round function without considering a particular field or specific round constants. Nevertheless, it considers the fact that each element in \({ \mathbb {F}_{2^n} }\) is also its inverse with respect to the addition. Since we do not instantiate any field and constants, the reported number of monomials might deviate from the real number of monomials here, e.g., due to unfortunate choices of round constants that sum to zero for some monomials. As a result, the entries in the tables are in fact upper bounds, but we do not expect high discrepancies between the numbers reported in the tables and the “real” ones.

Prime Case. First, we consider iterations of the round function f over \( \mathbb {F}_{p} \). In Table 2, we evaluate the output functions at \(a_i\), \(b_i\), and \(c_i\) depending on the inputs \(a_0\), \(b_0\), and \(c_0\) after a certain number of rounds \(i\ge 2\). We count in Table 2 the number of monomials for a certain multivariate degree up to a fixed degree \(d_{ \mathbb {F}_{p} }\). Higher degree monomials might appear, but they are not presented in the table. To report this behavior, we do not input 0 in the table after the highest degree monomial. The column ‘max’ indicates the maximal number of monomials that can be encountered for three variables. As reported in Table 2, the number of monomials increases quite quickly, and we do not observe any unexpected behavior, or missing monomials of a certain degree.

Table 2. Number of monomials of a certain degree for \( \mathbb {F}_{p} \).
Table 3. Number of monomials of a certain degree for \( \mathbb {F}_{2^n} \).

Binary Case. Table 3 provides the number of monomials of a certain degree in \( \mathbb {F}_{2^n} \). We notice that the diffusion is slower than in \( \mathbb {F}_{p} \), and it may be because of the behavior of the addition that is self inverse in \( \mathbb {F}_{2^n} \). More discussions on the algebraic cipher representation in the binary case can be found in [34, App. D].

5 Comparison with Other Designs

In this section, we compare the performance of our design with other designs that are presented in the literature for an MPC protocol using masked operations. We mainly focus on the number of multiplications in an MPC setting, which is often the metric that influences the most the cost in such a protocol. In addition, we discuss the number of online and pre-computation/offline rounds, and we compare those numbers to the ones specified for other schemes. The influence of the last two metrics on the overall costs highly varies depending on the concrete protocol/application, and the concrete environment, in which an MPC protocol is used, e.g., network of computers vs. a system on chip. Finally, we consider the advantages and the disadvantages of our design w.r.t. the other ones.

5.1 MPC Costs: Ciminion and Related Works

We compare the MPC cost of Ciminion with the cost of other designs that are published in the literature with \(q\approx 2^{128}\), and \(s =128\) bits. We assume that the amount of data available to the attacker is fixed to \(2^{s/2} = 2^{64}\), which is the most common case. Due to page limitation, we limit our analysis to Ciminion and HadesMiMC. The latter is the main competitive design currently present in the literature for the analyzed application. The detailed comparison with other designs (including MiMC, GMiMC, Rescue and Vision) is provided in [34, App. G]. A summary of the comparison is given in Table 4 and 5 for the binary and prime case, respectively.

Our design has the lowest minimum number of multiplications w.r.t. all other designs, in both \( \mathbb {F}_{p} \) and \( \mathbb {F}_{2^n} \). In \({ (\mathbb {F}_{q})^{t} }\) for \(q\approx 2^{128}\), our design needs approximately \(t + 14 \cdot \lceil t/2\rceil \approx 8 \cdot t\) multiplications w.r.t. \(12 \cdot t\) multiplications required by HadesMiMC or \(60\cdot t\) by Rescue. Additionally, our design has a low number of linear operations compared to other designs. For instance, for large \(t\gg 1\), our design needs approximately \(50\cdot t\) affine operations (sums and multiplications with constants) while HadesMiMC requires approximately \(12 \cdot t^2 + (157 +4\cdot \max \{32;\lceil \log _3(t)\rceil \}) \cdot t \) affine operations. However, this advantage comes at the price of having more online rounds than the other schemes. In particular, \(104 + \lceil t/2\rceil \) online rounds are required by our design whereas HadesMiMC and Rescue have respectively 78 and 20 online rounds.

Table 4. Comparison on the MPC cost of schemes over \({\mathbb F_{2^n}}^t\) for \(n = 128\) (or 129), and a security level of 128 bits. With the exception of Vision (whose number of offline rounds is equal to \(\max \left\{ 20, 2 \cdot \left\lceil \frac{136 + t}{t}\right\rceil \right\} \)), the number of offline rounds for all other schemes is zero.
Table 5. Comparison on the MPC cost of schemes over \({\mathbb F_{p}}^t\) for \(p \approx 2^{128}\), and a security level of \(\approx 128\) bits. With the exception of Rescue (whose number of offline rounds is equal to \(\max \{30; 6\cdot \left\lceil \frac{32.5}{t}\right\rceil \}\)), the number of offline rounds for all other schemes is zero.

Ciminion. For \(q\approx 2^{128}\), and a security level of 128 bits with data limited to \(2^{64}\), the permutation \(p_C\) counts 90 rounds. In order to output \(2t^\prime -1 \le t \le 2t^\prime \) words, we call \(t^\prime \) times the permutation \(p_E\) that is composed of 14 rounds, and \((t^\prime -1)\) times the rolling function. Therefore, for the binary and the prime case, the cost of Ciminion in MPC applications to generate t words is

$$\begin{aligned} \#\, \text {multiplications:}&\qquad 14 \cdot \lceil t/2 \rceil + (t-1) + 90 \approx 8 \cdot t + 89 \,,\\ \#\, \text {online rounds:}&\qquad 104 + \lceil t/2\rceil \,, \\ \#\, \text {affine operations:}&\qquad 99 \cdot \lceil t/2 \rceil + 629 \approx 50 \cdot t + 629 \,. \end{aligned}$$

The number of online rounds depends on t, because the rolling function is serial. It is noteworthy that the expansion part can be performed in parallel. We emphasize that the number of sums and multiplications with a constantFootnote 6 (denoted as “affine” operations) is proportional to the number of multiplications. That is one of the main differences w.r.t. to the Hades construction as we argue afterwards.

HadesMiMC. HadesMiMC [41] is a block cipher that is proposed over \({ (\mathbb {F}_{p})^{t} }\) for a prime p such that \(\gcd (p-1,3)=1\), and \(t\ge 2\). It combines \(R_F = 2R_f\) rounds with a full S-box layer (\(R_f\) at the beginning, and \(R_f\) at the end), and \(R_P\) rounds with a partial S-box layer in the middle. Each round is defined with \(R_i(x) = k_i + M \times S(x)\), where M is a \(t\times t\) MDS matrix, and S is the S-box layer. This layer is defined as the concatenation of t cube S-boxes in the rounds with full layer, and as the concatenation of one cube S-Box and \(t-1\) identity functions in the rounds with partial layer.

In addition, hash functions can be obtained by instantiating a Sponge construction with the Hades permutation, and a fixed key, like Poseidon & Starkad [40]. In [15], the authors present an attack on Starkad that exploits a weakness in the matrix M that defines the MixLayer. The attack takes advantage of the equation \(M^2 = \mu \cdot I\). This attack can be prevented by carefully choosing the MixLayer (we refer to [44] for further detail). There is no attack that is based on an analogous strategy that has been proposed for the cipherFootnote 7.

In order to guarantee some security, \(R_F\) and \(R_P\) must satisfy a list of inequalities [41]. There are several combinations of \((R_F, R_P)\) that can provide the same level of security. In that regard, authors of [41] present a tool that makes it possible to find the best combination that guarantees security, and minimizes the computational cost. For a security level of approximately \(\log _2(p)\) bits, and with \(\log _2(p) \gg t\), the combination \((R_F, R_P)\) minimizing the overall number of multiplications is

$$ (R_F, R_P) \! = \! \left( \! 6, \max \left\{ \left\lceil \frac{\log _3(p)}{2}\right\rceil + \lceil \log _3(t) \rceil ; \lceil \log _3(p)\rceil - 2\lfloor \log _3(\log _2(p))\rfloor \right\} - 2 \!\right) . $$

In MPC applications (\(p \approx 2^{128}\) and \(s=128\) bits), the cost of HadesMiMC is

$$\begin{aligned} \#\, \text {multiplications:}&\qquad 2\cdot (t\cdot R_F + R_P) = 12 t + \max \{78 +\lceil \log _3(t^2)\rceil \, ;142\} \,, \\ \#\, \text {online rounds:}&\qquad R_F + R_P = \max \{45 + \lceil \log _3(t)\rceil ; 77\} \,, \\ \#\, \text {affine operations:}&\qquad 2\cdot t^2 \cdot R_F + (4\cdot R_P +1) \cdot t - 2 \cdot R_P \\&\qquad \approx 12 \cdot t^2 + (157 +4\cdot \max \{32;\lceil \log _3(t)\rceil \}) \cdot t \,. \end{aligned}$$

Parallel S-boxes can be computed in a single online roundFootnote 8. To compute the number of affine operations, we considered an equivalent representation of the cipher in which the MixLayer of the rounds, with a partial S-box layer, is defined by a matrix. In this matrix, only \(3t - 2\) entries are different from zero, that is to say the ones in the first column, in the first row, and in the first diagonal. (A \((t-1)\times (t-1)\) submatrix is an identity matrix.) The details are presented in [41, App. A]. Therefore, the total number of affine operations required grows quadratically w.r.t. the number of rounds with full S-box layer, and thus w.r.t. the number of multiplications.

Finally, we highlight that the number of multiplications is minimized when HadesMiMC takes as input the entire message. Indeed, let us assume that the input message is split into several parts, and that HadesMiMC is used in CTR mode (as suggested by the designers). In the analyzed case in which the security level is of the same order of the size of the field p, the number of rounds is almost constant, and independent of the parameter \(t\ge 2\). It follows that using HadesMiMC in CTR mode would require more multiplications, because every process requires the computation of the rounds with a partial S-box layer, whereas this computation is needed only once when the message size equals the block size. We stress that a similar conclusion holds for Rescue/Vision, for which the total number of multiplications would barely change when they are used in CTR mode, rather than when the message size is equal to the block size.

5.2 Ciminion Versus Hades: Advantages and Similarities

The previous comparison highlights that the two most competitive designs for MPC applications with a low multiplicative complexity are Ciminion and HadesMiMC. Referring to Fig. 1, we further develop the similarities and advantages between a block cipher based on a Hades design, and a cipher based on Farfalle. We present a brief comparison between our new design and the “ForkCipher” design that is proposed in [7] in [34, App. G.2].

Similarities: Distribution of the S-Boxes. We focus our attention on the distribution of the S-boxes, or more generally, the non-linear operations. Both strategies employ a particular parallelization of the non-linear operations/S-boxes to their advantage, in order to minimize the number of non-linear operations. More precisely, each step is composed of t parallel non-linear operations in the external rounds, i.e., the rounds at the end and at the beginning. Furthermore, each step is composed of a single non-linear operation in the internal rounds.

Both strategies take advantage of an attacker that cannot directly access the state in the middle rounds, because the state is masked both by the external rounds or phases, and by the presence of a key. In a Farfalle design, the attacker knows that each output of the expansion phase always employs the same value at the input, without accessing those inputs. In a Hades design, the attacker is able to skip some rounds with a partial S-box layer by carefully choosing the texts (see [15]). However, they cannot access the texts without bypassing the rounds with the full S-box layer that depends on the key.

Having middle rounds with a single S-box makes it possible to reduce the overall number of non-linear operations. In addition, they ensure some security against algebraic attacks. Indeed, even a single S-box makes it possible to increase the overall degree of the scheme. For a concrete example, let \((R_c, R_m, R_e)\) be the rounds for respectively the compression part, middle part and expansion part of Farfalle. Like previously, let \((R_F, R_P)\) be the number of rounds with respectively a full and a partial S-box layer in Hades. The number of multiplications is respectively \((R_c + R_e) \cdot t + R_m\) and \(R_F \cdot t + R_P\). If \(R_P \gg R_F\) and \(R_m \gg R_c + R_e\). For a similar number of round, i.e., proportional to \(\approx R_P + R_F\) or/and \(\approx R_m + R_c + R_e\), it is then necessary to reach the maximum degree. Our number of multiplications is lower compared to a classical design where the rounds have a full S-box layer.

Advantages. There are major differences between Farfalle-like designs and Hades-like designs, because of their primary intention. The Farfalle-like design aims to behave like a Pseudo-Random Function (PRF), and the Hades-like design like a Pseudo-Random Permutation (PRP). The latter is used as a PRF in the Counter mode (CTR).Footnote 9 Under the assumption that affine operations are cheaper than non-linear ones, designers of Hades defined the MixLayer as the multiplication with a \(t\times t\) MDS matrix. Consequently, each round with full S-box layer counts \(t^2\) multiplications with constants. However, when \(t\gg 1\), linear operations cannot be considered as free anymore, and their presences influence the overall performance.

This problem is not present in a Farfalle-like design. Indeed, by construction, in the first \(R_c\) and the last \(R_e\) rounds, the MixLayer is not required. That implies that the first three words are never mixed with the following ones. On the contrary, the elements are simply added together to generate the input of the compression phase. In addition, the expansion part’s input is generated through a non-linear rolling function whose cost grows linearly with t. Finally, since invertibility is not required, the number of input words can be lower than the number of output words to design a function from \({ (\mathbb {F}_{q})^{3} }\) to \({ (\mathbb {F}_{q})^{t} }\) for any \(t\ge 1\). Thus, independently of the number of output words, one multiplication per round is present in the compression phase, contrary to \(\mathcal O(t)\) of a Hades-like scheme.