
1 Introduction

Algebraic lattices are a powerful tool in cryptography, enabling many sophisticated constructions such as digital signatures [6, 36], zero-knowledge proofs [38, 42], FHE [25] and others. Among the applications of main interest are public-key encryption (PKE) schemes [37, 43] and key encapsulation mechanisms (KEMs).

The computational problems defined over lattices are believed to be hard to solve, even with access to large-scale quantum computers, and hence many of these constructions are considered to be quantum-safe. As industry takes its first steps toward the concrete development of small quantum computers, the US National Institute of Standards and Technology (NIST) began an open standardization effort, with the aim of selecting quantum-safe schemes for public-key encryption and digital signatures [40]. At the time of writing, the process is in its second round, and 9 out of 17 candidates for PKE or KEM base their security on problems related to lattices, with or without special structure.

One commonly occurring characteristic of lattice-based PKE or KEM schemes is the lack of perfect correctness. This means that sometimes, ciphertexts generated honestly using a valid public key may lead to decryption failures under the corresponding private key. Throughout this paper we’ll refer to such ciphertexts as “failures”, “decryption failures”, or “failing ciphertexts”. While in practice schemes are parametrised in such a way that decryption failures do not undermine overall performance, these failures can be leveraged as a vehicle for key recovery attacks against the key pair used to generate them. Such an attack was described by Jaulmes and Joux [30] against NTRU, after which it was extended in [29] and [24]. A similar attack on Ring-LWE based schemes was later presented by Fluhrer [22] and extended by Băetu et al. [5].

However, the aforementioned attacks all use specially crafted ciphertexts and can therefore be prevented with a transformation that achieves chosen ciphertext security. This can for example be obtained by means of an off-the-shelf compiler [23, 28] that stops the adversary from being able to freely malleate honestly generated ciphertexts.

The NIST Post-Quantum Standardization Process candidate Kyber [8] noted that it was possible to search for ciphertexts with higher failure probability than average. D’Anvers et al. [16] extended this idea to an attack called “failure boosting”, where ciphertexts with higher failure probability are generated to speed up the search for decryption failures, and provided an analysis of the effectiveness of the attack on several NIST candidates. At the same time, Guo et al. [27] described an adaptive attack against the IND-CCA secure ss-ntru-pke variant of NTRUEncrypt [10], which used an adaptive search for decryption failures exploiting information from previously collected ciphertexts.

Our Contributions. In this paper, we present a novel attack technique called “directional failure boosting”, aimed at enhancing the search for decryption failures in public-key encryption schemes based on the protocol by Lyubashevsky et al. [37], in the single-target setting. Our technique is an improvement of the “failure boosting” technique of D’Anvers et al. [16].

We consider a simple (but realistically parametrized) scheme based on the Mod-LWE problem as a case study and make some necessary orthogonality and independence assumptions that are reasonable in our range of parameters. We show that in this setting, the work and number of decryption queries needed to obtain multiple failing ciphertexts is only marginally larger than those necessary to obtain the first decryption failure. For example, obtaining 30 decryption failures requires only \(25\%\) more quantum work and only \(58\%\) more queries than obtaining one decryption failure. As previously shown in [16] and [27], we recall that having many decryption failures enables more efficient lattice reduction, which leads to key recovery attacks. As a result, we conclude that when protecting against decryption failure attacks in the single-target setting, designers should make sure that an adversary cannot feasibly obtain even a single decryption failure.

Our attack outperforms previously proposed attacks based on decryption failures. In particular, it improves over the multitarget attack of Guo et al. [27] on ss-ntru-pke, lowering the attack’s quantum complexity from \(2^{139.5}\) to \(2^{96.6}\).

Paper Outline. In Sect. 2, we introduce some preliminaries about notation and structures. In Sect. 3, we describe the general idea of lattice-based encryption and how decryption failures are generated. In Sect. 4, we recall the original failure boosting technique from [16]. In Sect. 5, we describe our directional failure boosting technique. In Sect. 6, we show how this method impacts the total work and queries overhead. Finally, in Sect. 7, we discuss the results by comparing them with the literature and conclude with possible future work.

2 Preliminaries

Let \(\mathbb {Z}_q\) be the ring of integers modulo q. For N a power of 2, we define \(R_q\) the ring \(\mathbb {Z}_q[X]/(X^N+1)\), and \(R_q^{l_1 \times l_2}\) the ring of \(l_1 \times l_2\) matrices over \(R_q\). Vectors and polynomials will be indicated with bold lowercase letters, e.g. \(\mathbf {v}\), while matrices will be written in bold uppercase letters, e.g. \(\mathbf {M}\). Denote with \(\lfloor \cdot \rfloor \) flooring to the nearest lower integer, and with \(\lfloor \cdot \rceil \) rounding to the nearest integer. These operations are extended coefficient-wise for vectors and polynomials. Throughout, we abuse notation and identify elements in \(\mathbb {Z}_q\) with their representatives in \([ -q/2, q/2 )\), and elements in \(R_q\) with their representatives of degree \(< N\), with index i indicating the coefficient of \(X^i\). This allows us to define the \(\ell _2\)-norm \(\Vert \mathbf {x}\Vert _2\) of a polynomial \(\mathbf {x} \in R_q\), so that \(\Vert \mathbf {x}\Vert _2 = \sqrt{\sum _i \mathbf {x}_i^2}\) where \(\mathbf {x}_i \in [ -q/2, q/2 )\), and extend this to vectors of polynomials \(\mathbf {y} \in R_q^{l \times 1}\) as \(\Vert \mathbf {y}\Vert _2 = \sqrt{\sum _i \Vert \mathbf {y}_i \Vert _2^2}\). Identically, we define and extend the \(\ell _{\infty }\)-norm.

Let \(x \leftarrow X\) denote sampling x according to the probability distribution X. We extend this notation for coefficient-wise sampling of a vector \(\mathbf {x} \in R^{l \times 1}_q\) as \(\mathbf {x} \leftarrow X(R^{l \times 1}_q)\), and similarly for a matrix. We denote with \(\mathbf {x} \leftarrow X(R^{l \times 1}_q; r)\) sampling \(\mathbf {x} \in R^{l \times 1}_q\) pseudorandomly from the seed r with each coefficient following the distribution X. In algorithms, we also use \(x \leftarrow Alg()\) to mean that the value x is assigned to be the output of a probabilistic algorithm Alg.

Let \(\mathcal {U}\) be the uniform distribution over \(\mathbb Z _q\) and let \(\mathcal {N}_{\mu , \sigma }\) be the normal distribution with mean \(\mu \) and standard deviation \(\sigma \), so that the probability density function of \(x \leftarrow \mathcal {N}_{\mu , \sigma }\) is defined as:

$$\begin{aligned} f_{\mathcal {N}_{\mu , \sigma }}(x) = \frac{1}{\sigma \sqrt{2\pi }}\, e^{-\frac{(x-\mu )^2}{2\sigma ^2}}. \end{aligned}$$
(1)

The discrete Gaussian distribution \(\mathcal {D}_{\mu , \sigma }\) is a discrete restriction to \(\mathbb {Z} _q\) of \(\mathcal {N}_{\mu , \sigma }\), so that an integer x is sampled with a probability proportional to \(e^{-(x-\mu )^2/2\sigma ^2}\) and its remainder modulo q in \([ -q/2, q/2 )\) is returned.
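For concreteness, the following is a minimal Python sketch of such a sampler, tabulating weights proportional to \(e^{-(x-\mu )^2/2\sigma ^2}\) on a truncated support; the truncation width of \(12\sigma \) and all parameter values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sample_discrete_gaussian(mu, sigma, q, size, rng, tail=12):
    # tabulate weights proportional to exp(-(x - mu)^2 / (2 sigma^2))
    xs = np.arange(int(mu - tail * sigma), int(mu + tail * sigma) + 1)
    w = np.exp(-((xs - mu) ** 2) / (2 * sigma ** 2))
    samples = rng.choice(xs, size=size, p=w / w.sum())
    # return the remainder modulo q, represented in [-q/2, q/2)
    return (samples + q // 2) % q - q // 2

rng = np.random.default_rng(0)
print(sample_discrete_gaussian(mu=0, sigma=2.0, q=8192, size=8, rng=rng))
```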

For an event A we define P[A] as its probability. For an element which does not correspond to an event, a ciphertext ct for example, we abusively write P[ct] to denote the probability of the event \(ct'= ct\) where \(ct'\) is drawn from a distribution which will be clear in the context. We will denote with \(\mathbb {E}[A]\) the expected value of a variable drawn from a distribution A.

Security Definitions. Let \(\varPi = (\mathsf {KeyGen},\mathsf {Enc},\mathsf {Dec})\) be a public-key encryption scheme, with message space \(\mathcal {M}\), and let \(K = (\mathsf {KeyGen},\mathsf {Encaps},\mathsf {Decaps})\) be a key encapsulation mechanism (KEM). When a decapsulation or a decryption oracle is provided, we assume that the maximum number of ciphertexts that can be queried to it for each key pair is \(2^K\); in practice, \(K=64\) is often considered [40, §4.A.2]. In this work, we keep the maximum number of queries as a parameter with no specific value, in order to provide better granularity in the security assessment. Indeed, to mount an attack, the adversary trades off the number of queries against the amount of work.

Definition 1

(IND-CPA\(_{A,\varPi }(k)\) game [33]). Let A be an adversary and \(\varPi = (\mathsf {KeyGen},\mathsf {Enc},\mathsf {Dec})\) be a public-key encryption scheme. The experiment \(\text {IND-CPA}_{A, \varPi }(1^k)\) runs as follows:

  1. \((pk, sk) \leftarrow \mathsf {KeyGen}(1^k)\)

  2. A is given pk. After evaluating \(\mathsf {Enc}(pk, \cdot )\) as desired, it outputs \((m_0, m_1) \in \mathcal {M} \times \mathcal {M}\).

  3. A random bit \(b \leftarrow _{\$} \{0, 1\}\) is sampled, and \(c \leftarrow \mathsf {Enc}(pk, m_b)\) is passed to A.

  4. A keeps evaluating \(\mathsf {Enc}(pk, \cdot )\) as desired, until it returns a bit \(b'\).

  5. The experiment outputs 1 if \(b = b'\) and 0 otherwise.

Definition 2

(IND-CCA\(_{A,K}(k)\) game [33]). Let A be an adversary and \(K = (\mathsf {KeyGen},\mathsf {Encaps},\mathsf {Decaps})\) be a key encapsulation mechanism. The experiment \(\text {IND-CCA}_{A, K}(1^k)\) runs as follows:

  1. \((pk, sk) \leftarrow \mathsf {KeyGen}(1^k)\)

  2. \((c, k) \leftarrow \mathsf {Encaps}(pk)\)

  3. \(b \leftarrow _{\$} \{0,1\}\). If \(b=0\), set \(\hat{k} = k\), else let \(\hat{k} \leftarrow \{0, 1\}^n\).

  4. A is given \((pk, c, \hat{k})\), and access to a decapsulation oracle \(\mathsf {Decaps}(sk, \cdot )\). After evaluating \(\mathsf {Encaps}(pk, \cdot )\) and querying \(\mathsf {Decaps}(sk, \cdot )\) as desired (except for decapsulation queries on c), it returns \(b' \in \{0, 1\}\).

  5. The experiment outputs 1 if \(b = b'\) and 0 otherwise.

Definition 3

(PKE and KEM security [23]). A public-key encryption scheme \(\varPi \) (resp. a key encapsulation mechanism K) is \((t, \epsilon )\)-GAME secure if for every t-time adversary A, we have that

$$ \left| \Pr [GAME_{A, \varPi }(k) = 1] - \frac{1}{2}\right| \le \epsilon \quad \left( \text { resp. } \left| \Pr [GAME_{A, K}(k) = 1] - \frac{1}{2}\right| \le \epsilon \right) $$

For a security parameter \(1^k\), we usually mean \(t \approx poly(k)\) and \(\epsilon \le negl(k)\). If GAME is IND-CPA (resp. IND-CCA) we say that \(\varPi \) (resp. K) is \((t, \epsilon )\)-secure against chosen-plaintext attacks (resp. \((t, \epsilon )\)-secure against adaptive chosen-ciphertext attacks).

3 Lattice-Based Encryption

The Module-LWE (or Mod-LWE) problem [34] is a mathematical problem that can be used to build cryptographic primitives such as encryption [7, 13], key exchange [13] and signatures [20]. It is a generalization of both the Learning With Errors (or LWE) problem [43], and the Ring-LWE problem [37, 47].

Definition 4

(Mod-LWE [34]). Let \(n,\,q,\,k\) be positive integers, \(\chi \) be a probability distribution on \(\mathbb Z \) and \(\mathbf {s}\) be a secret module element in \(R_q^k\). We denote by \(\mathcal {L}\) the probability distribution on \(R_q^k \times R_q\) obtained by choosing \(\mathbf {a} \in R_q^k\) uniformly at random, choosing \(e \in R\) by sampling each of its coefficients according to \(\chi \) and considering it in \(R_q\), and returning \((\mathbf {a},c) = (\mathbf {a},\langle \mathbf {a},\mathbf {s}\rangle + e) \in R_q^k \times R_q\).

Decision-Mod-LWE is the problem of deciding whether pairs \((\mathbf {a}, c) \in R_q^k \times R_q\) are sampled according to \(\mathcal {L}\) or the uniform distribution on \(R_q^k \times R_q\).

Search-Mod-LWE is the problem of recovering \(\mathbf {s}\) from \((\mathbf {a}, c)=(\mathbf {a},\langle \mathbf {a},\mathbf {s}\rangle + e) \in R_q^k \times R_q\) sampled according to \(\mathcal {L}\).

3.1 Passively and Actively Secure Encryption

Lyubashevsky et al. [37] introduced a simple protocol to build passively secure encryption from the Ring-LWE problem, inspired by Diffie-Hellman key exchange [19] and ElGamal public-key encryption [21]. Naturally, the protocol can also be adapted to work based on plain and Module LWE assumptions. A general extension of the protocol for all aforementioned assumptions is described in Algorithms 1, 2, and 3, where \(r \in \mathcal {R} = \{0,1\}^{256}\), and where the message space is defined as \(\mathcal {M} = \{\text {polynomials in } R_q \text { with coefficients in } \{0,1\}\}\).

In order to obtain active security, designers usually use an off-the-shelf CCA compiler, usually a (post-quantum) variant [18, 28, 31, 44, 48] of the Fujisaki-Okamoto transform [23] (FO). These come with proofs of security in the (quantum) random oracle model, with explicit bounds on the loss of security caused by the transformation. We show such transformed KEM Decapsulation and Encapsulation in Algorithms 4 and 5.

[Algorithms 1, 2 and 3: the \(\mathsf {KeyGen}\), \(\mathsf {Enc}\) and \(\mathsf {Dec}\) routines of the protocol]
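To make the structure of the protocol concrete, here is a minimal Python sketch of the three routines, assuming illustrative toy parameters and a centered binomial distribution standing in for \(\chi \); it omits the seed r, the pseudorandom generation of \(\mathbf {A}\), and all encodings, and is not an implementation of any concrete candidate:

```python
import numpy as np

N, l, q = 256, 3, 8192        # illustrative toy parameters, not those of Table 1

def poly_mul(a, b):
    """Multiplication in Z_q[X]/(X^N + 1) (negacyclic convolution)."""
    res = np.zeros(2 * N, dtype=np.int64)
    for i in range(N):
        res[i:i + N] += a[i] * b
    return (res[:N] - res[N:]) % q

def mat_vec(A, v):
    """Multiply an l x l matrix of ring elements by a length-l vector."""
    return np.array([sum(poly_mul(A[i][j], v[j]) for j in range(l)) % q
                     for i in range(l)])

def sample_small(shape, rng):
    """Centered binomial noise standing in for the distribution chi."""
    bits = rng.integers(0, 2, (4,) + shape, dtype=np.int64)
    return (bits[0] + bits[1] - bits[2] - bits[3]) % q

def keygen(rng):                                        # cf. Algorithm 1
    A = rng.integers(0, q, (l, l, N), dtype=np.int64)
    s, e = sample_small((l, N), rng), sample_small((l, N), rng)
    return (A, (mat_vec(A, s) + e) % q), s              # pk = (A, b), sk = s

def enc(pk, m, rng):                                    # cf. Algorithm 2
    A, b = pk
    s1, e1 = sample_small((l, N), rng), sample_small((l, N), rng)
    e2 = sample_small((N,), rng)
    b1 = (mat_vec(A.transpose(1, 0, 2), s1) + e1) % q   # b' = A^T s' + e'
    v1 = (sum(poly_mul(b[i], s1[i]) for i in range(l))  # v' = b^T s' + e'' + (q/2) m
          + e2 + (q // 2) * m) % q
    return b1, v1

def dec(sk, ct):                                        # cf. Algorithm 3
    b1, v1 = ct
    x = (v1 - sum(poly_mul(b1[i], sk[i]) for i in range(l))) % q
    return ((x > q // 4) & (x < 3 * q // 4)).astype(np.int64)

rng = np.random.default_rng(0)
pk, sk = keygen(rng)
m = rng.integers(0, 2, N, dtype=np.int64)
assert np.array_equal(dec(sk, enc(pk, m, rng)), m)
```

The decoder maps a noisy coefficient to 1 exactly when it is closer to \(q/2\) than to 0, which fails precisely when the error term analysed in Sect. 3.2 exceeds q/4 in absolute value.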

In the case of FO for lattice-based schemes, the randomness used during the encryption is generated by submitting the message (and sometimes also the public key) to a random oracle. As this procedure is repeatable with knowledge of the message, one can check the validity of ciphertexts during decapsulation. Hence, an adversary wanting to generate custom ephemeral secrets \(\mathbf {s}', \mathbf {e}', \mathbf {e}''\) in order to fabricate weak ciphertexts would need to know a preimage of the appropriate random coins for the random oracle. Therefore, their only option is to mount a (Grover’s) search by randomly generating ciphertexts corresponding to different messages \(\mathbf {m}\), and testing if their predicted failure probability is above a certain threshold.
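As an illustration, the following sketch derandomizes the toy enc above to obtain an FO-style encapsulation with a re-encryption check; the hash choices, byte encodings and implicit-rejection behaviour are all simplifying assumptions, and real candidates differ in many details:

```python
import hashlib
import numpy as np

def H(tag: bytes, data: bytes) -> bytes:
    return hashlib.sha3_256(tag + data).digest()

def enc_det(pk, m, coins: bytes) -> bytes:
    # derandomized encryption: the coins deterministically seed enc() above
    rng = np.random.default_rng(int.from_bytes(coins, "big"))
    b1, v1 = enc(pk, m, rng)
    return b1.tobytes() + v1.tobytes()

def encaps(pk, rng):
    m = rng.integers(0, 2, N, dtype=np.int64)           # random message
    ct = enc_det(pk, m, H(b"coins", m.tobytes()))       # coins from the message
    return ct, H(b"key", m.tobytes())                   # shared secret

def decaps(sk, pk, ct: bytes):
    b1 = np.frombuffer(ct[:8 * l * N], dtype=np.int64).reshape(l, N)
    v1 = np.frombuffer(ct[8 * l * N:], dtype=np.int64)
    m = dec(sk, (b1, v1))
    if enc_det(pk, m, H(b"coins", m.tobytes())) != ct:  # re-encryption check
        return H(b"reject", ct)                         # implicit rejection
    return H(b"key", m.tobytes())

ct, k = encaps(pk, rng)
assert decaps(sk, pk, ct) == k
```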

Remark 1

Several lattice-based candidates submitted to the NIST Post-Quantum Cryptography Standardization Process use a variant of the protocol by Lyubashevsky et al. [37]. Deviating from the original design, most candidates perform an additional rounding of the ciphertext \(\mathbf {v}'\), in order to reduce bandwidth. The designers of New Hope [3] and LAC [35] choose to work directly over rings (or equivalently, they choose a module of rank \(l=1\)) and add error correction on the encapsulated message, while the designers of Kyber [7] and Saber [13] choose a module of rank \(l>1\) and perform an additional rounding of \(\mathbf {b}'\) (and \(\mathbf {b}\) in case of Saber). We here focus on the basic version given in Algorithms 1 to 3 and leave the study of the effect of compression to further work.

We selected the parameters of the studied encryption scheme to ensure a failure probability and security similar to Kyber and Saber. These parameters can be found in Table 1. The security estimates are generated using the Core-SVP methodology [3] and the LWE estimator [2], while the failure probability of Kyber and Saber is given as reported in their respective NIST Round 2 documentations [14, 46]. The failure probability of our chosen parameters is determined by calculating the variance of the error term and assuming the distribution to be Gaussian.

Table 1. Comparison between our target scheme and Saber and Kyber 768, as parametrised in Round 2 of the NIST PQC standardization process. The classical (resp. quantum) security is evaluated using the Core-SVP [3] methodology, assuming the cost of BKZ with block size \(\beta \) to be \(2^{0.292\beta }\) (resp. \(2^{0.265\beta }\)).

Remark 2

We do not consider the case of “plain” LWE based schemes like FrodoKEM [39] or Round5 [4]. Nonetheless, we believe that the attack methodology would easily translate to the LWE setting as the failure condition and the failure probabilities are similar to the investigated case.

3.2 Decryption Failures

Following the execution of the protocol, both messages \(\mathbf {m}'\) and \(\mathbf {m}\) are the same if the coefficients of the error term \(\mathbf {e}^T\mathbf {s}' - \mathbf {s}^T \mathbf {e}' + \mathbf {e}''\) are small enough; more precisely, if \(\Vert \mathbf {e}^T\mathbf {s}' - \mathbf {s}^T \mathbf {e}' + \mathbf {e}''\Vert _\infty \le q/4\). This expression can be simplified by defining the vector \(\mathbf {S}\) as the vertical concatenation of \(-\mathbf {s}\) and \(\mathbf {e}\), the vector \(\mathbf {C}\) as the concatenation of \(\mathbf {e}'\) and \(\mathbf {s}'\), and by replacing \(\mathbf {e}''\) with \(\mathbf {G}\), as shown below:

$$\begin{aligned} \mathbf {S} = \begin{bmatrix} - \mathbf {s} \\ \mathbf {e} \end{bmatrix} \quad \mathbf {C} = \begin{bmatrix} \mathbf {e}' \\ \mathbf {s}' \end{bmatrix}\quad \mathbf {G}=\mathbf {e}''. \end{aligned}$$
(2)

Here, \(\mathbf {S}\) contains the secret elements of the secret key, and \(\mathbf {C}\) and \(\mathbf {G}\) consist of elements used to construct the ciphertexts. Using these vectors, the error expression can be rewritten: a failure occurs when \(\Vert \mathbf {S}^T\mathbf {C}+\mathbf {G}\Vert _\infty > q/4\).

The standard deviation of the terms in the polynomial \(\mathbf {S}^T\mathbf {C}\) equals \(\sqrt{2N} \sigma _s\sigma _e\), versus a standard deviation of \(\sigma _e\) for the terms of \(\mathbf {G}\). Therefore, the influence of \(\mathbf {G}\) on the failure rate is limited, i.e. \(\Vert \mathbf {S}^T\mathbf {C} + \mathbf {G}\Vert _\infty \approx \Vert \mathbf {S}^T\mathbf {C}\Vert _\infty \). Letting \(q_t := q/4\) denote the failure threshold, we will use

$$\begin{aligned} \Vert \mathbf {S}^T\mathbf {C}\Vert _\infty > q_t \end{aligned}$$
(3)

as an approximation of the failure condition throughout our analysis. However, with some extra work, one can replace Eq. 3 with the more accurate condition \(\Vert \mathbf {S}^T\mathbf {C}\Vert _\infty > q_t - \Vert \mathbf {G}\Vert _\infty \) and, instead of considering \(q_t\) to be fixed, take the distribution of \(q_t - \Vert \mathbf {G}\Vert _\infty \) as shown in [16]. For ease of implementation, and due to the low influence of \(\mathbf {G}\) on the failure rate, we prefer to stick with Eq. 3. We now introduce a more convenient way of writing the failure condition (Eq. 3) using only vectors in \(\mathbb {Z} _{q}\).

Definition 5

(Coefficient vector). For \(\mathbf {S} \in R^{l\times 1}_{q}\), we denote by \(\overline{\mathbf {S}} \in \mathbb {Z} _{q}^{lN \times 1}\) the representation of \(\mathbf {S}\) where each polynomial is decomposed as a list of its coefficients in \(\mathbb {Z} _q\).

Definition 6

(Rotations). For \(r\in \mathbb {Z} \) and \(\mathbf {C} \in R^{l\times 1}_{q}\), we denote by \(\mathbf {C}^{(r)}\in R^{l\times 1}_{q}\), the following vector of polynomials

$$\begin{aligned} \mathbf {C}^{(r)} := X^{r} \cdot \mathbf {C}(X^{-1}) \mod X^N +1. \end{aligned}$$

Correspondingly, \(\overline{\mathbf {C}^{(r)}}\in \mathbb {Z} _{q}^{lN \times 1}\) denotes its coefficient vector.

It is easy to show that \(\overline{\mathbf {C}^{(r)}}\) is constructed so as to ensure that for \(r \in [0, N-1]\), the \(r^{th}\) coordinate of \(\mathbf {S}^T\mathbf {C}\) is given by the scalar product \(\overline{\mathbf {S}}^T \overline{\mathbf {C}^{(r)}}\). In other words, one is now able to decompose \(\mathbf {S}^T\mathbf {C}\) as a sum of scalar products:

$$\begin{aligned} \mathbf {S}^T\mathbf {C} = \sum _{r\in [0,N-1]} \overline{\mathbf {S}}^T \overline{\mathbf {C}^{(r)}}\cdot X^r. \end{aligned}$$
(4)

One can observe that this construction is only valid for the ring structure modulo \(X^N+1\), but it could be adapted to other ring structures. Note that for any \(r\in \mathbb {Z} \), \(\mathbf {C}^{(r+N)} = - \mathbf {C}^{(r)}\) and \(\mathbf {C}^{(r + 2N)} = \mathbf {C}^{(r)}\). Besides, taking into account the extension of the norms to vectors of polynomials (defined in Sect. 2), one can make the following remark.

Remark 3

Note that for any \(r\in \mathbb {Z} \), \(\Vert \overline{\mathbf {C}^{(r)}} \Vert _2 = \Vert \overline{\mathbf {C}} \Vert _2 = \Vert \mathbf {C} \Vert _2\) and \(\Vert \overline{\mathbf {C}^{(r)}} \Vert _{\infty } = \Vert \overline{\mathbf {C}} \Vert _{\infty } = \Vert \mathbf {C} \Vert _{\infty }\).

The decomposition in Eq. 4 will allow a geometric interpretation of the failures, as will be shown in the rest of the paper. First, let us introduce a brief example to illustrate Definitions 5 and 6.

Example 1

For a secret \(\mathbf {S}\) and a ciphertext \(\mathbf {C}\) in \(\mathbb {Z}^{2 \times 1}_q[X]/(X^3+1)\):

$$\begin{aligned} \mathbf {S} = \begin{bmatrix} s_{0,0} + s_{0,1} X + s_{0,2} X^2 \\ s_{1,0} + s_{1,1} X + s_{1,2} X^2 \end{bmatrix}, \quad \mathbf {C} = \begin{bmatrix} c_{0,0} + c_{0,1} X + c_{0,2} X^2 \\ c_{1,0} + c_{1,1} X + c_{1,2} X^2 \end{bmatrix} \end{aligned}$$
(5)

we get the following vectors:

$$\begin{aligned} \overline{\mathbf {S}} = \begin{bmatrix} s_{0,0} \\ s_{0,1} \\ s_{0,2} \\ s_{1,0} \\ s_{1,1} \\ s_{1,2} \end{bmatrix}, \quad \overline{\mathbf {C}^{(0)}} = \begin{bmatrix} c_{0,0} \\ - c_{0,2} \\ - c_{0,1} \\ c_{1,0} \\ - c_{1,2} \\ - c_{1,1} \end{bmatrix} \quad \overline{\mathbf {C}^{(1)}} = \begin{bmatrix} c_{0,1} \\ c_{0,0} \\ - c_{0,2} \\ c_{1,1} \\ c_{1,0} \\ - c_{1,2} \end{bmatrix} \quad \overline{\mathbf {C}^{(2)}} = \begin{bmatrix} c_{0,2} \\ c_{0,1} \\ c_{0,0} \\ c_{1,2} \\ c_{1,1} \\ c_{1,0} \end{bmatrix} \quad \overline{\mathbf {C}^{(3)}} = \begin{bmatrix} - c_{0,0} \\ c_{0,2} \\ c_{0,1} \\ - c_{1,0} \\ c_{1,2} \\ c_{1,1} \end{bmatrix} \dots \end{aligned}$$

In case of a failure event, \(\mathbf {S}^T\mathbf {C}\) satisfies Eq. 3. Therefore, at least one element among all the coefficients

$$\begin{aligned} \overline{\mathbf {S}}^T \cdot \overline{\mathbf {C}^{(0)}}\, ,\, \dotsc \, ,\, \overline{\mathbf {S}}^T \cdot \overline{\mathbf {C}^{(2N-1)}} \end{aligned}$$

is larger than \(q_t\).
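The identity of Eq. 4, together with the sign rule \(\mathbf {C}^{(r+N)} = -\mathbf {C}^{(r)}\), can be checked mechanically; the following sketch does so on random small polynomials in an illustrative toy ring (the dimensions are assumptions chosen only for readability):

```python
import numpy as np

N, l, q = 8, 2, 3329          # tiny illustrative ring
rng = np.random.default_rng(1)

def poly_mul(a, b):
    """Multiplication in Z_q[X]/(X^N + 1)."""
    res = np.zeros(2 * N, dtype=np.int64)
    for i in range(N):
        res[i:i + N] += a[i] * b
    return (res[:N] - res[N:]) % q

def rotate(C, r):
    """Coefficient vector of C^{(r)} = X^r * C(X^{-1}) mod X^N + 1 (Def. 6)."""
    xr = np.zeros(N, dtype=np.int64)
    xr[r % N] = 1 if (r // N) % 2 == 0 else q - 1       # X^{r+N} = -X^r
    out = []
    for c in C:
        cinv = np.concatenate(([c[0]], -c[:0:-1])) % q  # c(X^{-1}) mod X^N + 1
        out.append(poly_mul(xr, cinv))
    return np.concatenate(out)                          # flattened Cbar^{(r)}

S = rng.integers(-2, 3, (l, N))
C = rng.integers(-2, 3, (l, N))
StC = sum(poly_mul(S[i] % q, C[i] % q) for i in range(l)) % q

for r in range(2 * N):
    inner = int(S.flatten() @ rotate(C % q, r))         # <Sbar, Cbar^{(r)}>
    expected = StC[r] if r < N else (-StC[r - N]) % q   # sign rule for r >= N
    assert inner % q == expected
```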

Definition 7

(Failure event). A failure event will be denoted with F, while we use S to indicate a successful decryption.

More precisely, for \(r\in [0, 2N-1]\), we denote by \(F_{r}\) the failure event where

$$\begin{aligned} \overline{\mathbf {S}}^T \cdot \overline{\mathbf {C}^{(r)}} > q_t. \end{aligned}$$

The event \(F_r\) provides two pieces of information: the location of the failure in the \(\mathbf {S}^T\mathbf {C}\) polynomial, and the sign of the coefficient that caused the failure.

An Assumption on the Failing Ciphertexts. In the rest of the paper, in order to predict the results of our attack, we will make the following orthogonality assumption.

Assumption 1

Let \(n \ll 2Nl\), and \(\mathbf {C}_0, \cdots , \mathbf {C}_n\) be ciphertexts that lead to failure events \(F_{r_0}, \cdots , F_{r_n}\). The vectors \(\overline{\mathbf {C}_0^{(r_0)}}, \cdots , \overline{\mathbf {C}_n^{(r_n)}}\) are considered orthogonal when projected on the hyperplane orthogonal to \(\overline{\mathbf {S}}\).

This assumption is an approximation that is supported by the fact that vectors in high dimensional space have a strong tendency towards orthogonality, as can be seen in Fig. 2.

4 Failure Boosting Attack Technique

Failure boosting is a technique introduced in [16] to increase the failure rate of (Ring/Mod)-LWE/LWR based schemes by honestly generating ciphertexts and only querying weak ones, i.e. those that have a failure probability above a certain threshold \(f_t >0\). This technique is especially useful in combination with Grover’s algorithm [26], in which case the search for weak ciphertexts can be sped up quadratically. Failure boosting consists of two phases: a precomputation phase, and a phase where the decryption oracle is queried.

Precomputation Phase. The adversary does an offline search for weak ciphertexts with the following procedure:

  1. Generate a key encapsulation \(ct=(\mathbf {C}, \mathbf {G})\).

  2. If \(P[F \, | \, ct ] \ge f_t\), keep ct in a weak ciphertext list; otherwise go to Step 1.

In Step 2, \(P[F \, | \, ct ]\) is defined as the failure probability given a certain ciphertext ct. It is computed as follows.

$$\begin{aligned} P[F \, | \, ct ] := \sum _{\mathbf {S}} P\left[ \left\| \mathbf {S}^T\mathbf {C} + \mathbf {G}\right\| _\infty > q_t\ |\ \mathbf {S}\right] \cdot P[\mathbf {S}] \end{aligned}$$
(6)

Given the probability of generating ciphertexts \(P[ct] = P[\mathbf {C}, \mathbf {G}]\), the probability of finding such a weak ciphertext can be expressed as follows:

$$\begin{aligned} \alpha _{f_t} = \sum _{\forall ct: P[ F | ct ] > f_t} P[ ct ]. \end{aligned}$$
(7)

An adversary thus needs to perform on average \(\alpha ^{-1}_{f_t}\) work to obtain one weak ciphertext, or \(\sqrt{\alpha ^{-1}_{f_t}}\) assuming Grover’s search achieves a full speed-up.

Decryption Oracle Query Phase. After the precomputation phase, an adversary has a probability \(\beta _{f_t}\) that a weak ciphertext results in a failure, where \(\beta _{f_t}\) can be calculated as a weighted average of the failure probabilities of weak ciphertexts:

$$\begin{aligned} \beta _{f_t} = \frac{\sum _{\forall ct: P[ F | ct ]> f_t} P[ct] \cdot P[ F | ct] }{\sum _{\forall ct: P[ F | ct ] > f_t} P[ct]}. \end{aligned}$$
(8)

Thus to obtain one decryption failure with probability \(1-e^{-1}\), an adversary needs to perform approximately \(\beta ^{-1}_{f_t}\) queries and therefore \(\alpha ^{-1}_{f_t} \beta ^{-1}_{f_t}\) work (or \(\sqrt{\alpha ^{-1}_{f_t}} \beta _{f_t}^{-1}\) using a quantum computer).
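To give a rough, purely illustrative feel for this trade-off, the following sketch models \(P[F \, | \, ct]\) with a union bound over the 2N one-sided failure events, each Gaussian with standard deviation \(\sigma \Vert \overline{\mathbf {C}}\Vert _2\) given the ciphertext, and Monte Carlo samples the ciphertext norms; all parameter values and the Gaussian/union-bound model are assumptions, not the analysis of [16]:

```python
import numpy as np
from scipy.stats import norm

N, l, sigma, q = 256, 3, 2.0, 8192      # illustrative assumptions
qt = q // 4
rng = np.random.default_rng(2)

# Monte Carlo over ||Cbar||_2 for honestly generated ciphertexts (2Nl coeffs)
norms = np.linalg.norm(rng.normal(0, sigma, (200_000, 2 * N * l)), axis=1)

# union bound over 2N one-sided events: P[F | ct] ~ 2N * Q(q_t / (sigma ||Cbar||))
pfail = 2 * N * norm.sf(qt / (sigma * norms))

for ft in np.quantile(pfail, [0.5, 0.9, 0.99]):
    weak = pfail > ft
    alpha, beta = weak.mean(), pfail[weak].mean()
    print(f"f_t={ft:.2e}  1/alpha={1/alpha:6.2f}  1/beta={1/beta:.2e}  "
          f"work=1/(alpha*beta)={1/(alpha * beta):.2e}")
```

In this toy model, raising \(f_t\) increases the precomputation work \(\alpha ^{-1}_{f_t}\) while decreasing the expected number of queries \(\beta ^{-1}_{f_t}\), which is exactly the trade-off of Eqs. 7 and 8.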

The better an adversary can predict P[F|ct], the more efficient failure boosting will be. Having no information about the secret except its distribution, an adversary is bound to standard failure boosting, where the failure probability is estimated based on \(\Vert \mathbf {C}\Vert _2\) and \(\Vert \mathbf {G}\Vert _2\). For a graphical intuition, a two-dimensional toy example is depicted in Fig. 1a below, where the red arrow represents the secret vector \(\overline{\mathbf {S}}\). Ciphertexts with \(\overline{\mathbf {C}}\) that lie in the dashed area will provoke a failure as the inner product with \(\overline{\mathbf {S}}\) will exceed the threshold \(q_t\). The blue circle is a circle of ciphertexts that have a certain failure probability \(f_t\) as estimated by an adversary who does not know the secret. During the failure boosting procedure, we generate random ciphertexts and only select those with a failure probability higher than \(f_t\), i.e. those outside the blue circle. One can see graphically in Fig. 1a that these ciphertexts have a higher failure probability and a higher norm. We refer to [16] for a full description of the failure boosting technique. Note that Fig. 1a is an oversimplified two-dimensional example that does not take into account the polynomial structure and the high dimensionality of the space.

Fig. 1. Simplified diagram trying to provide an intuition on the effect of directional failure boosting. The red arrow represents the secret vector \(\overline{\mathbf {S}}\). Ciphertexts with \(\overline{\mathbf {C}}\) that lie in the dashed area will provoke a failure as the inner product with \(\overline{\mathbf {S}}\) will exceed the threshold \(q_t\). Ciphertexts outside the blue circle are considered weak. (Color figure online)

5 Directional Failure Boosting

Once \(n\ge 1\) decryption failures \(\mathbf {C}_0, \dotsc , \mathbf {C}_{n-1}\) are found, additional information about the secret key \(\mathbf {S}\) becomes available, and can be used to refine the failure estimation for new ciphertexts and thus speed up failure boosting. We now introduce an iterative two-step method to perform directional failure boosting.

Step 1. An estimate, denoted \(\mathbf {\overline{E}}\), of the ‘direction’ of the secret \(\overline{\mathbf {S}}\) in \(\mathbb {Z} _q^{lN}\) is obtained from \(\mathbf {C}_0, \dotsc , \mathbf {C}_{n-1}\).

Step 2. The estimate \(\mathbf {\overline{E}}\) is used to inform the search for weak ciphertexts and improve the failure probability prediction for a new ciphertext \(\mathbf {C}_{n}\). One is able to refine the criterion \(P[F \, | \, ct ] \ge f_t\) by computing \(P[F \, | \, ct, \mathbf {\overline{E}} ] \ge f_t\) instead.

Once new failing ciphertexts are found in step 2, one can go back to step 1 and improve the estimate \(\mathbf {\overline{E}}\) and thus bootstrap the search for new failures.

To give an intuition, a two-dimensional toy representation can be found in Fig. 1b. Like in the classical failure boosting technique, the red arrow depicts the secret \(\overline{\mathbf {S}}\), while the additional blue arrow marks the estimate \(\mathbf {\overline{E}}\) (as calculated in step 1, see Sect. 5.2). Using this estimate, we can refine the acceptance criterion to the depicted ellipse to better reflect our knowledge about the secret (step 2, see Sect. 5.3). Ciphertexts outside this ellipse will be flagged as weak ciphertexts; while the probability of finding such a ciphertext is the same, the failure probability of weak ciphertexts is now higher, as more of the blue zone lies in the dashed area.

5.1 Distributions

We now introduce some probability distributions that will be useful in the following sections.

Scaled \(\chi \)-distribution. The scaled \(\chi \)-distribution \(\chi _{n, \sigma }\) is the distribution of the \(\ell _2\)-norm of a vector with n coefficients, each following the normal distribution \(\mathcal {N}_{0,\sigma }\). Denoting with \(\varGamma \) the gamma function, the probability density function of \(\chi _{n, \sigma }\) is given by:

$$\begin{aligned} f_{\chi _{n, \sigma }}(x) = \dfrac{\left( \frac{x}{\sigma }\right) ^{n-1}e^{-\frac{x^2}{2\sigma ^2}}}{2^{\left( \frac{n}{2}-1\right) }\varGamma \left( \frac{n}{2}\right) } \quad \text {for } x\ge 0, \end{aligned}$$
(9)

which has mean [32, §18.3] \(\mathbb {E}_\chi [x] = \sqrt{2}\frac{\varGamma ((n+1)/2)}{\varGamma (n/2)}\sigma \approx \sqrt{n}\sigma \).
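As a quick numeric check of the \(\approx \sqrt{n}\sigma \) approximation (with illustrative n and \(\sigma \); the gamma quotient is evaluated in log-space to avoid overflow):

```python
import numpy as np
from scipy.special import gammaln

n, sigma = 1536, 2.0   # illustrative: dimension 2Nl and standard deviation
exact = np.sqrt(2) * np.exp(gammaln((n + 1) / 2) - gammaln(n / 2)) * sigma
print(exact, np.sqrt(n) * sigma)   # ~78.37 vs ~78.38
```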

We will approximate the probability distribution of \(\Vert \mathbf {x}\Vert _2\) where \(\mathbf {x} \leftarrow \mathcal {D}_{0,\sigma }(R^{l \times 1}_q)\) with a discretized version of the \(\chi _{(l\cdot N),\sigma }\)-distribution, which will be denoted with \(\chi ^D_{(l\cdot N), \sigma }\). Using this distribution, the probability density function of \(\Vert \mathbf {x}\Vert _2\) is calculated as:

$$\begin{aligned} P[\Vert \mathbf {x}\Vert _2 = x] = C \cdot \left( \frac{x}{\sigma }\right) ^{l\cdot N-1}e^{-\frac{x^2}{2\sigma ^2}}\quad \text {for } x \in \left\{ 0,\dots ,\left\lfloor \frac{q}{2}\sqrt{lN} \right\rfloor \right\} , \end{aligned}$$
(10)

with C a normalization constant.

Angle Distribution. The distribution of angles between n-dimensional vectors in \(\mathbb R ^n\) with coefficients drawn from a normal distribution \(\mathcal {N}_{0, \sigma }\) can be modelled using the following probability density function [9]:

$$\begin{aligned} f_{\varTheta _n}(\theta ) = \sin ^{n-2}(\theta ) / \int _0^\pi \sin ^{n-2}(t) dt,\quad \text {for } \theta \in [0,\pi ]. \end{aligned}$$
(11)

Due to the high dimensionality of the vector space used in this paper, vectors will have a very strong tendency towards orthogonality, i.e. \(\theta \) is close to \(\pi /2\), as can be seen in Fig. 2.

Fig. 2. Probability density function (pdf) of the angle between two random vectors in 1536-dimensional space. As the dimension increases, the pdf tends to the Dirac delta function centered at \(\frac{\pi }{2}\).

For computational reasons, we will use a discretized version \(\varTheta ^D_n\) of this distribution to model the distribution of the angles between discrete vectors, if no extra directional information is present. Given a uniformly spaced list of angles between 0 and \(\pi \), we assign to each angle a probability

$$\begin{aligned} P[\theta ] = C \sin ^{n-2}(\theta ) \end{aligned}$$
(12)

with C a normalization constant. The higher the number of angles in this list, the better this distribution approximates the continuous distribution \(\varTheta _n\).
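A minimal sketch of this discretization, where the grid size is an arbitrary choice:

```python
import numpy as np

def discrete_theta(n, points=10_000):
    """Discretized Theta_n of Eq. 12: P[theta] = C * sin(theta)^(n-2)."""
    thetas = np.linspace(0, np.pi, points)
    p = np.sin(thetas) ** (n - 2)
    return thetas, p / p.sum()          # p.sum() plays the role of 1/C

thetas, p = discrete_theta(n=1536)
print(thetas[np.argmax(p)])             # mass concentrates around pi/2
```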

Order Statistics. The maximal order statistic of a distribution X, for n samples, is the distribution of the maximum of n samples drawn from this distribution. We will denote this distribution with M(X, n). For a discrete distribution X, the probability mass function of M(X, n) can be computed as:

$$\begin{aligned} f_{M(X,n)}(x) = \,&P[x \ge y | y \leftarrow X]^n - P[x > y | y \leftarrow X]^n \end{aligned}$$
(13)
$$\begin{aligned} \approx \,&n \cdot P[x = y | y \leftarrow X] \cdot P[x > y | y \leftarrow X]^{n - 1}, \end{aligned}$$
(14)

where the latter approximation gets better for smaller probabilities.
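For a discrete distribution given as a probability mass function over a support grid, Eq. 13 translates directly into code:

```python
import numpy as np

def max_order_statistic(pmf, n):
    """pmf of the maximum of n i.i.d. draws from a discrete X (Eq. 13)."""
    cdf = np.cumsum(pmf)       # P[y <= x]
    strict = cdf - pmf         # P[y <  x]
    return cdf ** n - strict ** n

# example: distribution of the maximum of 5 fair six-sided dice
print(max_order_statistic(np.full(6, 1 / 6), 5))
```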

5.2 Step 1: Estimating the Direction \(\overline{\mathbf {E}}\)

Informally, \(\mathbf {\overline{E}}\) should be a vector that has approximately the same direction as \(\overline{\mathbf {S}}\). Denoting the angle between \(\mathbf {\overline{E}}\) and \(\overline{\mathbf {S}}\) as \(\theta _{ES}\), the bigger \(|\cos (\theta _{ES})|\), the closer our estimate is to \(\pm \overline{\mathbf {S}}\) and the better our estimate of failure probability will be. Since we focus on estimating the direction of \(\overline{\mathbf {S}}\), \(\mathbf {\overline{E}}\) will always be normalized.

In this section, we derive an estimate \(\mathbf {\overline{E}}\) of the direction of the secret \(\overline{\mathbf {S}}\) given \(n\ge 1\) ciphertexts \(\mathbf {C}_0, \dotsc , \mathbf {C}_{n-1}\). Our goal is to find \(\mathbf {\overline{E}}\) such that \(|\cos (\theta _{ES})|\) is as big as possible. We will first discuss the case where the adversary has one ciphertext, then the case where she has two, followed by the more general case where she has n ciphertexts.

One Ciphertext. Assume that a unique failing ciphertext \(\mathbf {C}\) is given. For a failure event \(F_r\), \(\mathbf {\overline{E}} = \overline{\mathbf {C}^{(r)}} / \left\| \overline{\mathbf {C}^{(r)}}\right\| _2\) is a reasonable choice as \(\cos (\theta _{ES})\) is bigger than average. This can be seen as follows:

$$\begin{aligned} \left| \cos (\theta _{ES})\right| = \frac{\left| \overline{\mathbf {S}}^T \cdot \overline{\mathbf {E}}\right| }{\left\| \overline{\mathbf {S}}\right\| _2\left\| \overline{\mathbf {E}}\right\| _2}=\frac{\left| \overline{\mathbf {S}}^T \cdot \overline{\mathbf {C}^{(r)}}\right| }{\left\| \overline{\mathbf {S}}\right\| _2\left\| \overline{\mathbf {C}^{(r)}}\right\| _2} > \frac{q_t}{\left\| \overline{\mathbf {S}}\right\| _2\left\| \overline{\mathbf {C}^{(r)}}\right\| _2}. \end{aligned}$$
(15)

Keep in mind that the cosine of the angle between random vectors strongly tends to zero in high-dimensional space, so that even a relatively small value of \( \left| \cos (\theta _{ES})\right| \) might be advantageous.

One can argue that it is not possible to compute \( \overline{\mathbf {C}^{(r)}}\) without knowledge of r, and in the general case the failure location is unknown. However, \(\mathbf {\overline{E}} = \overline{\mathbf {C}^{(0)}} / \left\| \overline{\mathbf {C}^{(0)}} \right\| _2\) is an equally good estimate regardless of the value of r. Indeed, \(\overline{\mathbf {C}^{(0)}}\) approximates a rotation of the secret \(\overline{\mathbf {S'}} := \overline{X^{-r}\cdot \mathbf {S}}\) instead of \(\overline{\mathbf {S}}\), which can be seen using the equality \(\overline{\mathbf {A}}^T \cdot \overline{\mathbf {B}}=\overline{X^i\mathbf {A}}^T \cdot \overline{X^i\mathbf {B}}\):

$$\begin{aligned} \overline{\mathbf {S}}^T \cdot \overline{\mathbf {C}^{(r)}} = \overline{X^{-r}\cdot \mathbf {S}}^T \cdot \overline{X^{-r}X^r\mathbf {C}^{(0)}} = \overline{X^{-r}\cdot \mathbf {S}}^T \cdot \overline{\mathbf {C}^{(0)}}. \end{aligned}$$
(16)

Furthermore, multiplying a polynomial in \(R_q\) with a power of X does not change its infinity norm, as the multiplication only results in the rotation or negation of coefficients. Thus, using an estimate of the direction of \(\overline{X^{-r}\cdot \mathbf {S}}\) is as good as an estimate of the direction of \(\overline{\mathbf {S}}\) when predicting the failure probability of ciphertexts, and we can use \(\mathbf {\overline{E}} = \overline{\mathbf {C}^{(0)}} / \left\| \overline{\mathbf {C}^{(0)}} \right\| _2\).

Two Ciphertexts. Now, assume that two linearly independent failing ciphertexts \(\mathbf {C}_0\) and \(\mathbf {C}_1\), resulting from failure events \(F_{r_0}\) and \(F_{r_1}\) respectively, are given. Taking \(\mathbf {\overline{E}}\) as the normalized version of an average \(\overline{\mathbf {C}}_\mathrm{av} = \left( \overline{\mathbf {C}_0^{(0)}} + \overline{\mathbf {C}_1^{(0)}}\right) /2\) may not necessarily result in a good estimate. For example, if \(\mathbf {C}_0\) comes from a failure event \(F_0\) and \(\mathbf {C}_1\) from a failure event \(F_N\), the two directions cancel each other out as the ciphertexts \(\overline{\mathbf {C}_0^{(0)}}\) and \(\overline{\mathbf {C}_1^{(0)}}\) are in opposite directions.

Keeping the convention that \(\overline{\mathbf {C}_0^{(0)}}\) approximates a rotation of the secret \(\overline{\mathbf {S}'} = \overline{X^{-r_0}\cdot \mathbf {S}}\), we will compute the relative error position \(\delta _{1,0} = r_1-r_0\) and show that it is enough to build a correct estimate \( \mathbf {\overline{E}} = \overline{\mathbf {C}_\mathrm{rav}} / \left\| \overline{\mathbf {C}_\mathrm{rav}}\right\| _2\) where:

$$\begin{aligned} \overline{\mathbf {C}_\mathrm{rav}} := \left( \overline{\mathbf {C}_0^{(0)}} + \overline{\mathbf {C}_1^{(\delta _{1,0})}}\right) / 2. \end{aligned}$$
(17)

The reason why such \(\mathbf {\overline{E}}\) is a good estimator of \(\overline{\mathbf {S}'}\) can be seen as follows:

$$\begin{aligned} \cos (\theta _{ES'}) &= \frac{1}{2\left\| \overline{\mathbf {C}_\mathrm{rav}}\right\| _2\left\| \overline{\mathbf {S'}}\right\| _2} \cdot \left( \overline{X^{-r_0}\cdot \mathbf {S}}^T \cdot \overline{\mathbf {C}_0^{(0)}}+ \overline{X^{-r_0}\cdot \mathbf {S}}^T \cdot \overline{X^{r_1-r_0}\mathbf {C}_1^{(0)}} \right) \\ &= \frac{1}{2\left\| \overline{\mathbf {C}_\mathrm{rav}}\right\| _2\left\| \overline{\mathbf {S'}}\right\| _2} \cdot \left( \overline{\mathbf {S}}^T \cdot \overline{\mathbf {C}_0^{(r_0)}}+ \overline{\mathbf {S}}^T \cdot \overline{\mathbf {C}_1^{(r_1)}} \right) > \frac{q_t}{\left\| \overline{\mathbf {C}_\mathrm{rav}}\right\| _2\left\| \overline{\mathbf {S'}}\right\| _2}. \end{aligned}$$

Remark 4

In practice ciphertexts with smaller norm will on average be better aligned with the secret, as \(\cos (\theta _{CS'}) > q_t / (\left\| \overline{\mathbf {C}}\right\| _2\left\| \overline{\mathbf {S'}}\right\| _2)\). Therefore they carry more information than ciphertexts with larger norm. To compensate for this effect we will calculate \(\overline{\mathbf {C}_\mathrm{rav}} := \left( \overline{\mathbf {C}_0^{(0)}}/\left\| \overline{\mathbf {C}_0^{(0)}}\right\| _2 + \overline{\mathbf {C}_1^{(\delta _{1,0})}}/\left\| \overline{\mathbf {C}_1^{(\delta _{1,0})}}\right\| _2\right) /2\). While it is possible to further refine the calculation of \(\overline{\mathbf {E}}\) using extra directional information, this heuristic is good enough for our purposes.

Computation of the Relative Position \(\delta _{1,0}\). One can use the fact that both \(\overline{\mathbf {C}_0^{(0)}}\) and \(\overline{\mathbf {C}_1^{(\delta _{1,0})}}\) are expected to be directionally close to \(\overline{\mathbf {S'}}\). Thus, the cosine of the angle between \(\overline{\mathbf {C}_0^{(0)}}\) and \(\overline{\mathbf {C}_1^{(\delta _{1,0})}}\) should be larger than usual. Therefore, \(\delta _{1,0}\) can be estimated with the following distinguisher:

$$\begin{aligned} \delta '_{1,0} := \underset{r\in [0, 2N-1]}{\mathrm {argmax}} \mathcal {C}(r) \text { where } \mathcal {C}(r) := \frac{ \overline{\mathbf {C}^{(0)}_0}^{T} \cdot \overline{\mathbf {C}^{(r)}_1 }}{\left\| \overline{\mathbf {C}^{(0)}_0}\right\| _2 \left\| \overline{\mathbf {C}^{(r)}_1}\right\| _2 }. \end{aligned}$$
(18)

The next paragraph estimates the efficiency of using Eq. 18 as a distinguisher for deriving \(\delta _{1,0}\). We will show that, for Table 1 parameters, we expect

$$\begin{aligned} P[\delta '_{1,0} =\delta _{1,0}] \approx 89\%. \end{aligned}$$
(19)

Experiments run by simulating the sampling of \(10^4\) failing ciphertexts (refer to the full version of our paper [15] for the generation technique), and using Eq. 18 to find \(\delta _{1,0}\) between pairs of them, return \(P_\mathrm{Exp}[\delta '_{1,0} =\delta _{1,0}] \approx 84.8\%\), in sufficiently good agreement.
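A sketch of the distinguisher of Eq. 18, reusing the toy-ring helpers poly_mul and rotate (and their globals N and q) from the sketch after Example 1, and comparing centered representatives:

```python
import numpy as np

def centered(x):
    """Representatives in [-q/2, q/2) of a mod-q vector."""
    return (x + q // 2) % q - q // 2

def guess_delta(C0, C1):
    """delta'_{1,0} = argmax_r C(r), the normalized inner product of Eq. 18."""
    u = centered(rotate(C0, 0)).astype(float)
    u /= np.linalg.norm(u)
    scores = []
    for r in range(2 * N):
        v = centered(rotate(C1, r)).astype(float)
        scores.append(float(u @ v) / np.linalg.norm(v))
    return int(np.argmax(scores))
```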

To obtain the value (19), the idea is to estimate the distribution of a correct guess \(\mathcal {C}(\delta _{1,0})\) and of an incorrect guess \(\max _{r \ne \delta _{1,0}} \mathcal {C}(r)\), and to quantify the discrepancy. First, we decompose the ciphertexts into a component parallel to \(\overline{\mathbf {S'}}\), denoted with \(\parallel \), and a component orthogonal to it, denoted with \(\perp \), and rewrite \(\mathcal {C}(r)\) as follows:

$$\begin{aligned} \mathcal {C}(r)&= \frac{ \overline{\mathbf {C}^{(0)}_{0, \parallel }} \cdot \overline{\mathbf {C}^{(r)}_{1, \parallel }}+ \overline{\mathbf {C}^{(0)}_{0, \perp }} \cdot \overline{\mathbf {C}^{(r)}_{1, \perp }} }{\left\| \overline{\mathbf {C}^{(0)}_0}\right\| _2 \left\| \overline{\mathbf {C}^{(r)}_1}\right\| _2 } \end{aligned}$$
(20)

In the first term, the scalar product of two parallel elements equals the product of their norms (up to sign). For the second term, we apply the scalar product definition and introduce \(t\) as the angle between \(\overline{\mathbf {C}^{(0)}_{0, \perp }}\) and \(\overline{\mathbf {C}^{(r)}_{1, \perp }}\):

$$\begin{aligned} \mathcal {C}(r)&= \pm \frac{ \left\| \overline{\mathbf {C}^{(0)}_{0, \parallel }}\right\| _2}{\left\| \overline{\mathbf {C}^{(0)}_0}\right\| _2 } \cdot \frac{\left\| \overline{\mathbf {C}^{(r)}_{1, \parallel }}\right\| _2}{\left\| \overline{\mathbf {C}^{(r)}_1}\right\| _2 } \pm \frac{\left\| \overline{\mathbf {C}^{(0)}_{0, \perp }}\right\| _2}{\left\| \overline{\mathbf {C}^{(0)}_0}\right\| _2} \cdot \frac{\left\| \overline{\mathbf {C}^{(r)}_{1, \perp }}\right\| _2 }{\left\| \overline{\mathbf {C}^{(r)}_1}\right\| _2 } \cdot \cos (t) \end{aligned}$$
(21)
$$\begin{aligned}&= \cos \left( \theta _{S'C^{(0)}_0}\right) \cos \left( \theta _{S'C^{(r)}_1}\right) + \sin \left( \theta _{S'C^{(0)}_0}\right) \sin \left( \theta _{S'C^{(r)}_1}\right) \cos (t) \end{aligned}$$
(22)

The vectors \(\overline{\mathbf {C}^{(0)}_{0, \perp }}\) and \(\overline{\mathbf {C}^{(r)}_{1, \perp }}\) are orthogonal to \(\overline{\mathbf {S'}}\). This means that they live in the \(2Nl-1\) dimensional space orthogonal to \(\overline{\mathbf {S'}}\). The high dimension of the space will strongly drive the vectors towards orthogonality as can be seen in Fig. 2. Using Assumption 1, the angle \(t\) between \(\overline{\mathbf {C}^{(0)}_{0, \perp }}\) and \(\overline{\mathbf {C}^{(r)}_{1, \perp }}\) is then assumed to follow the distribution of random angles between vectors in a \(2Nl-1\) dimensional space (See Eq. 11).

Now, let us study the distribution of \(\mathcal {C}(r)\) depending on the value \(r\in [0, 2N -1]\). One can refer to Fig. 3 for a graphical interpretation based on the parameters of Table 1.

  • If \(r = \delta _{1,0}\), the expected value of \(\mathcal {C}(r)\) will be higher than average. Indeed, by definition of \(F_{r_1}\) and \(F_{r_0}\) the cosines forming the first term are positive. The distribution of \(\mathcal {C}(r)\) can then be estimated using Eq. 22 (blue curve).

  • If \(r = \delta _{1,0} + N \mod 2N\), the distribution of \(\mathcal {C}(r)\) is equal to the distribution of \(-\mathcal {C}(\delta _{1,0})\) and will be closer to \(-1\) (orange curve).

  • If \(r \ne \delta _{1,0} \mod N\), \(\mathcal {C}(r)\) can be assumed to follow the distribution of random angles in a 2Nl dimensional space \(\varTheta _{2Nl}\), as given in Eq. 11 (green curve).

  • The pdf of \(\max _{r \ne \delta _{1,0}} \mathcal {C}(r)\) is then calculated as \(M(\varTheta _{2Nl}, 2N-1)\) by definition of the maximal order statistic (red curve).

Fig. 3. Distributions used for finding \(\delta _{1,0}\) (Color figure online)

Figure 3 assembles the probability density functions of the above distributions in a plot. The probability of selecting the correct \(\delta '_{1,0}\) using \(\underset{r\in [0, 2N-1]}{\mathrm {argmax}} \mathcal {C}(r)\), can then be computed as:

$$ P[\delta '_{1,0} =\delta _{1,0}] = P\left[ \max _{r \ne \delta _{1,0}} \mathcal {C}(r) < \mathcal {C}(\delta _{1,0})\right] . $$

For our toy scheme’s parameters, this results in Eq. 19.

Multiple Ciphertexts. In this section, we assume that n linearly independent failing ciphertexts \(\mathbf {C}_0, \dots , \mathbf {C}_{n-1}\), resulting from failure events \(F_{r_0}, \dots , F_{r_{n-1}}\) respectively, are given. We introduce a generalized method to recover the relative positions \(\delta _{1,0}, \dots , \delta _{n-1,0}\), based on “loopy belief propagation” [41]. Once these relative positions are found, they can be combined in an estimate \(\overline{\mathbf {E}}\) with \(\mathbf {\overline{E}} = \overline{\mathbf {C}_\mathrm{rav}} / \left\| \overline{\mathbf {C}_\mathrm{rav}}\right\| _2\) where

$$\begin{aligned} \overline{\mathbf {C}_\mathrm{rav}} := \left( \overline{\mathbf {C}_0^{(0)}} / \left\| \overline{\mathbf {C}_0^{(0)}}\right\| _2 + \sum _{i \in [1,n-1]}\overline{\mathbf {C}_i^{(\delta _{i,0})}} / \left\| \overline{\mathbf {C}_i^{(\delta _{i,0})}}\right\| _2\right) /n. \end{aligned}$$
(23)

To find the correct rotations, we construct a weighted graph that models the probability of different relative rotations, and we will use loopy belief propagation to obtain the most probable set of these rotations:

  • The nodes represent the obtained failing ciphertexts: \((\mathbf {C}_i)_{i\in [0,n-1]}\). In total, there are n nodes.

  • Each node \(\mathbf {C}_i\) with \(i\ne 0\) is associated with a list of 2N probability values called beliefs, denoted \((b_i(0), \cdots , b_i(2N-1))\), that together define a probability distribution over \([0, 2N-1]\). The \(r^\text {th}\) item in the list represents our belief that the correct relative position \(\delta _{i,0}\) equals r. The rotation of the \(0^\text {th}\) node is fixed to 0 (i.e. \(b_0(0) = 1\) and \(b_0(i) = 0\) for all other i) as only the relative rotations of the ciphertexts are important. These node weights are initialized as follows:

    $$ b_i(r) := P\left[ \delta _{i,0} = r \right] \quad \left( = P\left[ F_{r} \text { for }\mathbf {C}_i | F_{0} \text { for }\mathbf {C}_0 \right] \right) $$
  • For two nodes \(\mathbf {C}_i\) and \(\mathbf {C}_j\), the value on the edge between them, called a message, models the influence of the belief in rotation s of node j on the belief in rotation r of node i, which can be formalized as follows:

    $$ m_{i,j}(r, s) := P\left[ \delta _{i,j} = r - s \right] \quad \left( = P\left[ F_{r-s} \text { for }\mathbf {C}_i | F_{0} \text { for }\mathbf {C}_j \right] \right) $$
Fig. 4. Example of the graph for finding the relative rotations where \(N=3\) and \(n=3\). The beliefs are in the rectangles, the circles represent the nodes and some messages are represented between the nodes. (Color figure online)

Loopy belief propagation tries to find the node values r for each node, so that the probabilities over the whole graph are maximized. This is done in an iterative fashion by updating the node beliefs according to the messages coming from all other nodes. Our goal is eventually to find \(r = \delta _{i,0}\) for each node i.

Example 2

For example, with \(N=3\) and \(n=3\), the graph contains the nodes \(\mathbf {C}_0\), \(\mathbf {C}_1\), and \(\mathbf {C}_2\). In Fig. 4, we represent what such a graph could look like, where we arbitrarily instantiate the messages and beliefs. We can see that if one chooses \(r_i = \mathrm {argmax}_r b_i(r)\) for each node, one would choose \(r_1 = 1\) and \(r_2 = 3\). Nevertheless, we notice that the influence of the other probabilities allows for a better choice (underlined in blue in the figure): \(r_1 = 2, r_2 = 3\).

Vertex Probabilities. As discussed, the edge between two nodes \(\mathbf {C}_i\) with rotation r and \(\mathbf {C}_j\) with rotation s is weighted with \(P\left[ \delta _{i,j} = r - s \right] \). This probability can be computed using a generalization of the distinguisher technique used for two ciphertexts as detailed in the full version of our paper [15].

[Algorithm 6: loopy belief propagation for recovering the relative rotations]
Table 2. Probability of finding the correct relative rotations and thus building the correct estimate \(\mathbf {\overline{E}}\) with the knowledge of 2, 3, 4 and 5 failing ciphertexts.

Loopy Belief Propagation. This technique looks for the best set \((r_1, \dots , r_{n-1})\) by iteratively correcting the beliefs using messages from other nodes. This procedure is detailed in Algorithm 6, where \(\mathtt {normalize}(b)\) normalizes the list \(b(\cdot )\) so that \(\sum _{x\in supp(b)} b(x) = 1\). In each iteration, the belief of each node \(\mathbf {C}_i\) is updated according to the messages of the other nodes \(\mathbf {C}_j\). For each i the belief is updated as follows:

$$\begin{aligned} b_i(r) = \prod _{j=0, j \ne i}^{n-1} \mathsf {infl}_{ji}(r) \end{aligned}$$
(24)

where \(\mathsf {infl}_{ji}(r)\) captures the influence of the value of the node \(\mathbf {C}_j\) on node \(\mathbf {C}_i\). This influence can be calculated as \(\mathsf {infl}_{ji}(r) \leftarrow C \sum _{x} m_{i,j}(r, x) \cdot {b_j(x)}\), with C a normalization constant.
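A sketch of this update loop, assuming the messages have been precomputed as \(2N \times 2N\) arrays with \(m[i][j][r, s] = P[\delta _{i,j} = r - s]\) and the initial beliefs \(b[i]\) as arrays of length 2N (both hypothetical inputs):

```python
import numpy as np

def loopy_bp(b, m, iterations=3):
    """Refine the beliefs b_i(r) with Eq. 24; return argmax rotation per node."""
    n, twoN = len(b), len(b[0])
    b = [bi / bi.sum() for bi in b]
    b[0] = np.zeros(twoN); b[0][0] = 1.0          # pin node 0 to rotation 0
    for _ in range(iterations):
        for i in range(1, n):
            belief = np.ones(twoN)
            for j in range(n):
                if j == i:
                    continue
                infl = m[i][j] @ b[j]             # sum_x m_ij(r, x) * b_j(x)
                belief *= infl / infl.sum()       # normalized influence
            b[i] = belief / belief.sum()
    return [int(np.argmax(bi)) for bi in b]
```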

Experimental Verification. With the parameters of Table 1, we obtained the correct values \(r_i = \delta _{i,0}\) for all \(i \in [1, n-1]\) after 3 iterations, with the probabilities reported in Table 2; for each number of ciphertexts, the experiment of finding the correct values of the \(r_i\) was repeated 1000 times.

Remark 5 (Consistency with the previous section)

Note that this procedure also incorporates the setting where one has only 2 failing ciphertexts, which would yield exactly the same results as in the previous paragraph.

Finally, once all the rotations are found, recall that the estimate is obtained by \(\mathbf {\overline{E}} = \overline{\mathbf {C}_\mathrm{rav}} / \left\| \overline{\mathbf {C}_\mathrm{rav}}\right\| _2\) where

$$\begin{aligned} \overline{\mathbf {C}_\mathrm{rav}} = \left( \overline{\mathbf {C}_0^{(0)}}/\left\| \overline{\mathbf {C}_0^{(0)}}\right\| _2 + \sum _{i \in [1,n-1]}\overline{\mathbf {C}_i^{(r_i)}}/\left\| \overline{\mathbf {C}_i^{(r_i)}}\right\| _2\right) /n. \end{aligned}$$
(25)
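Given the recovered rotations, building this estimate is a normalized average; a sketch, again reusing the rotate and centered helpers from the earlier sketches:

```python
import numpy as np

def estimate_direction(ciphertexts, rotations):
    """Ebar = C_rav / ||C_rav||_2, with C_rav the normalized average of Eq. 25."""
    acc = 0.0
    for C, r in zip(ciphertexts, rotations):
        v = centered(rotate(C, r)).astype(float)
        acc = acc + v / np.linalg.norm(v)
    return acc / np.linalg.norm(acc)
```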

5.3 Step 2: Finding Weak Ciphertexts

In this section, we are given an estimate \(\mathbf {\overline{E}}\) and we refine the acceptance criterion: instead of accepting a ciphertext if \(P[F \, | \, ct ] \ge f_t\), the condition is slightly changed.

  1. Generate a key encapsulation \(ct=(\mathbf {C}, \mathbf {G})\) with derived key K.

  2. If \(P[F \, | \, \overline{\mathbf {E}}, ct ] \ge f_t\), keep ct in a weak ciphertext list; otherwise go to Step 1.

In Step 2, \(P[F \, | \, \overline{\mathbf {E}}, ct ]\) is defined as the failure probability, given a certain ciphertext ct and a certain estimate \(\overline{\mathbf {E}}\). In the following, we explain a way to compute it.

First, for \(r\in [0,2N-1]\), we will estimate the probability that a ciphertext leads to an error in the \(r^{\text {th}}\) location. Decomposing the vectors \(\overline{\mathbf {S}}\) and \(\overline{\mathbf {C}}\) in a component orthogonal to \(\mathbf {\overline{E}}\), denoted with subscript \(\perp \), and a component parallel to \(\mathbf {\overline{E}}\), denoted with subscript \(\parallel \), we obtain the failure expression:

$$\begin{aligned} P[F_r \, | \, \mathbf {\overline{E}}, \mathbf {C} ] =\,&P[\overline{\mathbf {S}}^T\cdot \overline{\mathbf {C}^{(r)}}> q_t \, | \, \mathbf {\overline{E}}, \mathbf {C}] = P[\overline{\mathbf {S}_\parallel }^T \cdot \overline{\mathbf {C}^{(r)}_\parallel } + \overline{\mathbf {S}_\perp }^T \cdot \overline{\mathbf {C}^{(r)}_\perp }> q_t \, | \, \mathbf {\overline{E}}, \mathbf {C}] \\ = \,&P\,\left[ \left( \begin{matrix}\left\| \overline{\mathbf {S}}\right\| _2 \left\| \overline{\mathbf {C}^{(r)}}\right\| _2 \cos (\theta _{SE}) \cos (\theta _{C^{(r)}E})\ + \\ \Vert \overline{\mathbf {S}}\Vert _2 \left\| \overline{\mathbf {C}^{(r)}}\right\| _2 \sin (\theta _{SE}) \sin (\theta _{C^{(r)}E}) \cos (t) \end{matrix}\right)> q_t \, | \, \mathbf {\overline{E}}, \mathbf {C}\right] \\ = \,&P\,\left[ \cos (t) > \frac{q_t - \left\| \overline{\mathbf {S}}\right\| _2 \left\| \overline{\mathbf {C}^{(r)}}\right\| _2 \cos (\theta _{SE}) \cos (\theta _{C^rE})}{\left\| \overline{\mathbf {S}}\right\| _2 \left\| \overline{\mathbf {C}^{(r)}}\right\| _2 \sin (\theta _{SE}) \sin (\theta _{C^rE})} \, | \, \mathbf {\overline{E}}, \mathbf {C}\right] \end{aligned}$$

where \(\theta _{SE}\) and \(\theta _{C^{(r)}E}\) are the angles of \(\overline{\mathbf {S}}\) and \(\overline{\mathbf {C}^{(r)}}\) with the estimate \(\mathbf {\overline{E}}\) respectively, and where \(t\) is the angle between \(\overline{\mathbf {S}}_\perp \) and \(\overline{\mathbf {C}^{(r)}_\perp }\). We assume no other knowledge about the direction of the secret apart from the directional estimate \(\mathbf {\overline{E}}\). In this case, using Assumption 1, \(t\) can be estimated as a uniform angle in a \(2Nl - 1\) dimensional space. Then \(t\) is assumed to follow the probability distribution \(\varTheta _{2Nl - 1}\) (defined in Eq. 11).

The values \(\mathbf {\overline{E}}\), \(\Vert \mathbf {C}\Vert _2\) and \(\cos (\theta _{C^{(r)}E})\) are known, while the values \(\Vert \mathbf {S}\Vert \) and \(\theta _{SE}\) can be modelled using their probability distributions. Thus, we can approximate \(P[F_r \, | \, \mathbf {\overline{E}}, \mathbf {C} ]\) with \(P[F_r \, | \, \mathbf {\overline{E}}, \Vert \mathbf {\overline{C}}\Vert _2, \cos (\theta _{C^{(r)}E})]\).

Assumption 2

We assume that failures at different locations are independent.

Assumption 2 is a valid assumption for schemes without error correcting codes, as discussed in [17]. We can then calculate the failure probability of a certain ciphertext as:

$$\begin{aligned} P[F \, | \, \mathbf {\overline{E}}, \mathbf {C} ] = 1- \prod ^{2N-1}_{r=0} \left( 1-P[F_r \, | \, \mathbf {\overline{E}}, \Vert \mathbf {\overline{C}}\Vert _2, \cos (\theta _{C^{(r)}E})] \right) \end{aligned}$$
(26)

As this expression gives us a better prediction of the failure probability of ciphertexts by using the information embedded in \(\mathbf {\overline{E}}\), we can more accurately (Grover) search for weak ciphertexts and thus reduce the work to find the next decryption failure. Moreover, the better \(\mathbf {\overline{E}}\) approximates the direction of \(\mathbf {\overline{S}}\), the easier it becomes to find a new decryption failure.
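The following sketch evaluates Eq. 26 numerically, assuming point estimates for \(\Vert \overline{\mathbf {S}}\Vert _2\) and \(\theta _{SE}\) rather than their full distributions, and modelling \(t\) through \(\varTheta _{2Nl-1}\) on a grid; all parameter values are illustrative:

```python
import numpy as np

N, l, sigma, qt = 256, 3, 2.0, 2048     # illustrative assumptions
dim = 2 * N * l - 1
thetas = np.linspace(1e-6, np.pi - 1e-6, 20_000)
dtheta = thetas[1] - thetas[0]
pdf_t = np.sin(thetas) ** (dim - 2)
pdf_t /= pdf_t.sum() * dtheta           # pdf of Theta_{2Nl-1} (Eq. 11)

def p_cos_t_above(c):
    """P[cos(t) > c] for t following Theta_{2Nl-1}."""
    return float(pdf_t[np.cos(thetas) > c].sum() * dtheta)

def p_fail_directional(norm_C, cos_CE, norm_S, cos_SE):
    """Eq. 26 with point estimates; norm_C[r] = ||Cbar^{(r)}||_2 and
    cos_CE[r] = cos(theta_{C^{(r)}E}); assumes |cos_CE[r]|, |cos_SE| < 1."""
    sin_SE = np.sqrt(1 - cos_SE ** 2)
    p_success = 1.0
    for nc, ce in zip(norm_C, cos_CE):  # one factor per rotation r
        se = np.sqrt(1 - ce ** 2)
        thr = (qt - norm_S * nc * cos_SE * ce) / (norm_S * nc * sin_SE * se)
        p_success *= 1 - p_cos_t_above(thr)
    return 1 - p_success
```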

5.4 Finalizing the Attack with Lattice Reduction

Once multiple failures are found, the secret key can be recovered with lattice reduction techniques as presented in [17, §4] and in [27, Step 3 of the attack]. This section simply outlines how their technique transposes to our framework. As shown in Sect. 5, we are provided with an estimate \(\mathbf {\overline{E}}\) of the direction of a rotated version \(\overline{\mathbf {S}'} = \overline{X^{-r}\cdot \mathbf {S}}\) of the secret, with an unknown value r. Therefore, similarly to [27], an attacker can obtain an estimation of \(\overline{\mathbf {S}'}\) (and not only its direction) by rescaling

$$\begin{aligned} \mathbf {\overline{E'}}:= \mathbf {\overline{E}}\cdot nq_{t} \cdot \left( \left\| \overline{\mathbf {C}_0^{(0)}} + \sum _{i \in [1,n-1]}\overline{\mathbf {C}_i^{(r_i)}} \right\| _2 \right) ^{-1}, \end{aligned}$$

using the approximation \(\mathbf {\overline{E'}}^T \cdot 1/n\left( \overline{\mathbf {C}_0^{(0)}} + \sum _{i \in [1,n-1]}\overline{\mathbf {C}_i^{(r_i)}}\right) \approx q_t\).

Then, for each possible \(r\in [0, 2N-1]\), an attacker can perform lattice reduction and recover candidates for \(\mathbf {s}, \mathbf {e}\) that are accepted if they satisfy \(\mathbf {b} = \mathbf {As} + \mathbf {e}\). One caveat is that an attacker may have to run a lattice reduction up to 2N times. Since \(\mathbf {\overline{E'}} - \overline{\mathbf {S}'}\) is small, the attacker can construct an appropriate lattice basis encoding \(\mathbf {\overline{E'}} - \overline{\mathbf {S}'}\) as a unique shortest target vector, and solve the corresponding Unique-SVP problem with the BKZ algorithm [1, 3, 11, 45]. The block size of BKZ will depend on the accuracy of the estimate \(\mathbf {\overline{E}}\). Indeed, the standard deviation of \(\mathbf {\overline{E'}}_i - \overline{\mathbf {S}'}_i \) is of the order of \(\sigma _s \cdot \sin (\theta _{S'E})\) (assuming that \(\theta _{S'E}\) is small and \(\Vert \overline{\mathbf {S}'}\Vert _2 \approx \Vert \overline{\mathbf {E}'}\Vert _2\)). Thus, when many decryption failures are available, \(\sin (\theta _{S'E})\) gets very small and the complexity of this step is dominated by the work required for constructing \(\mathbf {\overline{E}}\). For example, in the case of our toy scheme, if \(\cos (\theta _{S'E}) > 0.985\), using [2], the BKZ block size becomes lower than 363, which leads to less than \(2^{100}\) quantum work (in the Core-SVP [3] \(0.265\beta \) model). As we will see in Sect. 6.3, this is less than the work required to find the first failure.

Remark 6

One might expect the failures obtained by directional failure boosting not to be fully independent, since the failing ciphertexts roughly follow the same direction. However, under Assumption 1, in high dimensions and for a reasonable number n of failures (\(n\ll 2lN\)), the hypercone in which the failures lie is large enough that linear dependence occurs only with very low probability.

6 Efficiency of Directional Failure Boosting

In this section, we experimentally verify the efficiency of the directional failure boosting technique. We first quantify the accuracy of the estimate \(\overline{\mathbf {E}}\) computed according to Sect. 5.2. We then derive the work required to run directional failure boosting and the optimal number of queries. For the rest of this section, we focus on minimizing the total work for finding failures and assume there is no upper limit on the number of decryption queries.

Our key takeaway is that, for the parameters of Table 1, the more failing ciphertexts have been found, the easier it becomes to obtain the next one, so that most of the effort is concentrated in finding the first failure. The final work and query overheads are reported in Table 4.

6.1 Accuracy of the Estimate

Let \(\mathbf {C}_0, \ldots, \mathbf {C}_{n-1}\) be n previously found failing ciphertexts, and take the estimate defined in Eq. 25. As in Sect. 5.2, we define \(\overline{\mathbf {S}'} = \overline{X^{-r_0}\cdot \mathbf {S}}\) as the secret vector corresponding to an unknown failure \(F_{r_0}\). To quantify the accuracy of \(\mathbf {\overline{E}}\), we compute \(\cos (\theta _{S'E}) = \frac{\overline{\mathbf {S'}}^T \cdot \mathbf {\overline{E}}}{\left\| \overline{\mathbf {S'}}\right\| _2}= \frac{\overline{\mathbf {S'}}^T \cdot \overline{\mathbf {C}_\mathrm{rav}}}{\left\| \overline{\mathbf {S'}}\right\| _2 \left\| \overline{\mathbf {C}_\mathrm{rav}}\right\| _2}\) as

$$\begin{aligned} \cos (\theta _{S'E})&= \frac{\overline{\mathbf {S'}}^T \cdot \left( \frac{\overline{\mathbf {C}^{(0)}_0}}{\left\| \overline{\mathbf {C}^{(0)}_0}\right\| _2} + \sum _{i=1}^{n-1} \frac{\overline{\mathbf {C}^{(r_i)}_i}}{\left\| \overline{\mathbf {C}^{(r_i)}_i}\right\| _2} \right) }{n\left\| \overline{\mathbf {S'}}\right\| _2\left\| \overline{\mathbf {C}_\mathrm{rav}}\right\| _2} \end{aligned}$$
(27)
$$\begin{aligned}&= \frac{\cos \left( \theta _{C_0^{(0)}S'}\right) + \sum _{i=1}^{n-1} \cos \left( \theta _{C_i^{(r_i)}S'}\right) }{\left\| \frac{\overline{\mathbf {C}^{(0)}_0}}{\left\| \overline{\mathbf {C}^{(0)}_0}\right\| _2} + \sum _{i=1}^{n-1} \frac{\overline{\mathbf {C}^{(r_i)}_i}}{\left\| \overline{\mathbf {C}^{(r_i)}_i}\right\| _2}\right\| _2} \end{aligned}$$
(28)

First, we make the following approximation.

Approximation 1

We approximate each cosine with the secret \(\overline{\mathbf {S}'}\) by its expected value, denoted \(\cos (\theta _{CS'}) := \mathbb {E}\left[ \cos \left( \theta _{C_i^{(r_i)}S'}\right) \right] \). In other words, for all \(i\in [1, n-1]\) we assume

$$\begin{aligned} \cos (\theta _{CS'})=\cos \left( \theta _{C_i^{(r_i)}S'}\right) = \cos \left( \theta _{C_0^{(0)}S'}\right) . \end{aligned}$$

To estimate the denominator of Eq. 28, we split each ciphertext into a component \(\overline{\mathbf {C}^{(r_i)}_{i,\parallel }}\) parallel to the secret and a component \(\overline{\mathbf {C}^{(r_i)}_{i,\perp }}\) orthogonal to it. Following Assumption 1, we assume orthogonality between the various \(\overline{\mathbf {C}^{(r_i)}_{i,\perp }}\). As the norm of a sum of parallel vectors is the sum of their norms, and the norm of a sum of orthogonal vectors follows from Pythagoras' theorem, we can approximate \(\cos (\theta _{S'E})\) as follows:

$$\begin{aligned} \cos (\theta _{S'E})&\approx \frac{n \cos (\theta _{CS'})}{\sqrt{n^2\cos (\theta _{CS'})^2 + n \sin (\theta _{CS'})^2}} = \frac{\cos (\theta _{CS'})}{\sqrt{\cos (\theta _{CS'})^2 + \sin (\theta _{CS'})^2/n}} \end{aligned}$$
(29)

One can see from this equation that \(\cos (\theta _{S'E})\) approaches 1 as n increases.
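Equation 29 can also be checked with a simplified geometric simulation (not the failing-ciphertext simulation of [15]): sample n unit vectors at a fixed cosine to a random "secret" direction, average them as in Eq. 25, and compare the resulting cosine with the prediction. A minimal sketch, where the dimension and \(\cos (\theta _{CS'})\) are illustrative values rather than the Table 1 parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_at_angle(s_hat, c):
    """Unit vector with cosine exactly c to the unit vector s_hat,
    uniform over the orthogonal directions."""
    v = rng.standard_normal(s_hat.shape[0])
    v -= v.dot(s_hat) * s_hat            # remove the component along s_hat
    v /= np.linalg.norm(v)
    return c * s_hat + np.sqrt(1 - c * c) * v

def simulated_cos(n, d=1536, c=0.05, trials=100):
    """Monte-Carlo estimate of cos(theta_{S'E}) when the estimate E is the
    average of n directions at fixed cosine c to the secret."""
    acc = []
    for _ in range(trials):
        s = rng.standard_normal(d)
        s /= np.linalg.norm(s)
        e = sum(sample_at_angle(s, c) for _ in range(n))
        acc.append(s.dot(e) / np.linalg.norm(e))
    return float(np.mean(acc))

def predicted_cos(n, c=0.05):
    return c / np.sqrt(c * c + (1 - c * c) / n)   # Eq. 29

for n in (1, 3, 10, 30):
    print(n, round(simulated_cos(n), 3), round(predicted_cos(n), 3))
```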

Experimental Verification. The first line of Table 3 gives the expected values of \(\cos (\theta _{S'E})\) for various n, according to Eq. 29, with \(\cos (\theta _{CS'})\) set to \(q_t / \left( \mathbb {E}[\Vert \overline{\mathbf {S}}\Vert ]\, \mathbb {E}[\Vert \overline{\mathbf {C}^{(0)}}\Vert ]\right) \). This is a good approximation of \(\cos (\theta _{CS'})\), since \(\cos (\theta _{CS'}) > q_t / \left( \Vert \overline{\mathbf {S}}\Vert \Vert \overline{\mathbf {C}^{(0)}}\Vert \right) \) and angles tend to orthogonality in high-dimensional space.

Then, to verify the theory, we implemented a method to simulate the distribution of random failing ciphertexts; this technique is described in the full version of our paper [15]. Once simulated failing ciphertexts are obtained, we combine them using their correct rotations to build \(\overline{\mathbf {E}}\), and we compute \(\cos (\theta _{S'E})\). This experiment was repeated 100 times and the average values are reported in the second line of Table 3.

Table 3. Accuracy of the estimate derived from several failures: expected (Eq. 29) and experimentally simulated values of \(\cos (\theta _{S'E})\). The closer to 1, the more accurate \(\overline{\mathbf {E}}\) is.

6.2 Estimating \(\alpha _{i, f_t}\) and \(\beta _{i, f_t}\)

To estimate the effectiveness of directional failure boosting given a certain number i of previously collected failing ciphertexts, we need to find the optimal weak-ciphertext threshold \(f_t\) for each i. This amounts to balancing the time spent on one precalculation, \(\sqrt{\alpha _{i, f_t}^{-1}}\), against the average failure probability \(\beta _{i, f_t}\) of weak ciphertexts after the precalculation. Let us recall the definitions of \(\alpha _{i, f_t}\) and \(\beta _{i, f_t}\), derived from Eqs. 7 and 8, where \(\mathbf {C}_0, \ldots, \mathbf {C}_{i-1}\) are the i previously found failing ciphertexts.

$$\begin{aligned} \alpha _{i, f_t} = \sum _{\forall ct: P[ F | ct, \mathbf {C}_0, ..., \mathbf {C}_{i-1} ] > f_t} P[ct] \end{aligned}$$
(30)
$$\begin{aligned} \beta _{i, f_t} = \frac{\sum _{\forall ct: P[ F | ct, \mathbf {C}_0, ..., \mathbf {C}_{i-1} ]> f_t} P[ct] \cdot P[ F | ct, \mathbf {C}_0, ..., \mathbf {C}_{i-1} ] }{\sum _{\forall ct: P[ F | ct, \mathbf {C}_0, ..., \mathbf {C}_{i-1} ] > f_t} P[ct]}. \end{aligned}$$
(31)

To find the optimal values, we need to calculate Eqs. 30 and 31 as functions of \(f_t\). This requires listing the probability of all ciphertexts \(ct := (\mathbf {C}, \mathbf {G})\) and their failure probability \(P[F | ct, \mathbf {C}_0, ..., \mathbf {C}_{i-1}]\). As discussed in [16], exhaustively computing both values is not practically feasible, and we therefore make some assumptions to obtain an estimate.

A first simplification is to cluster ciphertexts that have similar \(\Vert \mathbf {C}\Vert _2\) and similar angles \(|\theta _{C^{(0)}E}|, \ldots, |\theta _{C^{(N-1)}E}|\), and thus a similar failure probability. To further reduce the list size, we only take into account the largest value of \(|\cos (\theta _{C^{(i)}E})|\), denoted

$$\begin{aligned} \mathrm {maxcos}(\theta _{CE}):= \max _i |\cos (\theta _{C^{(i)}E})|, \end{aligned}$$

which results in a slight underestimation of the effectiveness of the attack. In other words,

$$\begin{aligned} \begin{array}{rll} P[ct] &{}\!\!\!\text { becomes } &{}\!\!\!P\left[ \Vert \mathbf {C}\Vert _2, \mathrm {maxcos}(\theta _{CE})\right] ,\\ P[ F | ct, \mathbf {C}_0, ..., \mathbf {C}_{n-1} ] &{}\!\!\!\text { becomes }&{}\!\!\!P\left[ F \, | \, \Vert \mathbf {C}\Vert _2, \mathrm {maxcos}(\theta _{CE})\right] .\\ \end{array} \end{aligned}$$

Assuming independence between the norm of \(\mathbf {C}\) and its angle with \(\mathbf {\overline{E}}\), \(P\left[ \Vert \mathbf {C}\Vert _2, \mathrm {maxcos}(\theta _{CE})\right] \) can be estimated using the distributions defined in Eqs. 10 and 13 as follows:

$$\begin{aligned} P\left[ \Vert \mathbf {C}\Vert _2, \mathrm {maxcos}(\theta _{CE})\right] = \underbrace{P[\Vert \mathbf {C}\Vert _2]}_{\chi _{N l, \sigma }}\cdot \underbrace{P[\mathrm {maxcos}(\theta _{CE})]}_{M(\varTheta _{2 N l}, 2N)}. \end{aligned}$$
(32)
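The factor \(M(\varTheta _{2 N l}, 2N)\) is the distribution of a maximum over 2N (approximately independent) cosines. A minimal sketch of how its CDF could be evaluated, using the standard fact that the cosine of the angle between a fixed vector and a uniformly random direction in dimension d has density proportional to \((1-t^2)^{(d-3)/2}\); the dimensions in the example are illustrative, not the Table 1 values:

```python
from scipy.integrate import quad

def abs_cos_cdf(x, d):
    """CDF of |cos(theta)| for a uniformly random direction in dimension d:
    the cosine has density proportional to (1 - t^2)^((d-3)/2)."""
    f = lambda t: (1.0 - t * t) ** ((d - 3) / 2.0)
    num, _ = quad(f, 0.0, x)
    den, _ = quad(f, 0.0, 1.0)
    return num / den

def maxcos_cdf(x, d, m):
    """CDF of maxcos(theta_CE) over m rotations, treated as independent."""
    return abs_cos_cdf(x, d) ** m

# Illustrative: dimension 2Nl = 1536, m = 2N = 512 rotations.
print(maxcos_cdf(0.1, 1536, 512))
```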

Denoting by r the position of the largest cosine, we can rewrite \(P\left[ F \, | \, \Vert \mathbf {C}\Vert _2, \mathrm {maxcos}(\theta _{CE})\right] \) as follows:

$$\begin{aligned}&P\left[ F \, | \, \Vert \mathbf {C}\Vert _2, \mathrm {maxcos}(\theta _{CE})\right] = 1 - \prod _i \Big ( 1- P\big [F_i \, | \, \, \Vert \mathbf {C}\Vert _2, |\cos (\theta _{C^{(r)}E})|\big ] \Big ), \end{aligned}$$
(33)
$$\begin{aligned}&= 1 - \left( \begin{array}{l} \Big ( 1- P\big [F_r \, | \, \, \Vert \mathbf {C}\Vert _2, |\cos (\theta _{C^{(r)}E})|\big ] \Big ) \cdot \\ \prod _{i\ne r} \Big ( 1- P\big [F_{i} \, | \, \, \Vert \mathbf {C}\Vert _2, |\cos (\theta _{C^{(i)}E})| \le |\cos (\theta _{C^{(r)}E})|\big ] \Big ) \end{array} \right) , \end{aligned}$$
(34)

where \(P\big [F_r \, | \, \Vert \mathbf {C}\Vert _2, |\cos (\theta _{C^{(r)}E})|\big ]\) can be estimated as in Eq. 26, and \(P\big [F_i \, | \, \Vert \mathbf {C}\Vert _2, |\cos (\theta _{C^{(i)}E})| \le |\cos (\theta _{C^{(r)}E})|\big ]\) by integrating over Eq. 26. The estimated failure probability of ciphertexts given \(\Vert \mathbf {C}\Vert _2\) and \(\cos (\theta _{CE})\) for the parameters of Table 1 is depicted in Fig. 5a.

Verification Experiment. We verified these results experimentally by generating \(5 \cdot 10^6\) failing ciphertexts and \(5 \cdot 10^6\) successful ciphertexts, and calculating their norms and their angles with 1000 estimates (in this case, other ciphertexts). The failing ciphertexts were produced using the methodology detailed in the full version of our paper [15]; their failure probability is then estimated with a procedure also detailed there. We combined these results into Fig. 5b. These experimental results confirm the theoretical estimates of Fig. 5a.

Fig. 5. Failure probability of ciphertexts as a function of \(\Vert \mathbf {C}\Vert _2\) and \(\cos (\theta _{CE})\). A zoomed version of (a) for easier comparison can be found in the full version of our paper [15].

With the estimates of \(P\left[ F \, | \, \Vert \mathbf {C}\Vert _2, \mathrm {maxcos}(\theta _{CE})\right] \) and \(P[\Vert \mathbf {C}\Vert _2, \mathrm {maxcos}(\theta _{CE})]\), \(\alpha _{i,f_t}\) and \(\beta _{i,f_t}\) can be computed as functions of i and \(f_t\). Let us now define the optimal threshold \(f_t\) as a function of i as:

$$\begin{aligned} f_i := \mathrm{argmin} _{f_t} \ \left( \sqrt{\alpha _{i,f_t}} \cdot \beta _{i,f_t}\right) ^{-1}. \end{aligned}$$
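Given a discretized list of ciphertext classes with their prior probabilities (Eq. 32) and failure probabilities (Eq. 34), the optimal threshold follows from a simple sweep. A minimal sketch, where the two input arrays are assumed to be precomputed:

```python
import numpy as np

def optimal_threshold(p_ct, p_fail):
    """Sweep the threshold f_t over all ciphertext classes and return the
    f_i minimizing the expected work per failure, (sqrt(alpha) * beta)^-1.
    p_ct[j]  : prior probability of ciphertext class j      (cf. Eq. 32)
    p_fail[j]: failure probability of ciphertext class j    (cf. Eq. 34)"""
    p_ct = np.asarray(p_ct, dtype=float)
    p_fail = np.asarray(p_fail, dtype=float)
    order = np.argsort(p_fail)[::-1]          # most failure-prone first
    p_ct, p_fail = p_ct[order], p_fail[order]
    alpha = np.cumsum(p_ct)                   # Eq. 30: P[ciphertext is weak]
    beta = np.cumsum(p_ct * p_fail) / alpha   # Eq. 31: E[P[F] | weak]
    work = 1.0 / (np.sqrt(alpha) * beta)      # expected work per failure
    j = int(np.argmin(work))
    return p_fail[j], work[j]                 # optimal threshold f_i, its cost
```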

6.3 Total Amount of Work and Queries

In this section, we derive the optimal work and queries an adversary should invest in order to obtain n failing ciphertexts with probability \(1-e^{-1}\). We introduce the following notation: to find the \((i+1)^{\text {th}}\) failing ciphertext, the adversary performs \(Q_i\) queries. Modelling the number of failures in \(Q_i\) queries with a Poisson distribution, the probability of finding the \((i+1)^{\text {th}}\) failing ciphertext within \(Q_i\) queries is \(1-e^{-Q_i\beta _{i, f_i}}\). The probability of obtaining n failures can then be calculated as the product of the success probabilities of finding failures 0 to \(n-1\):

$$\begin{aligned} P_n = \prod _{i=0}^{n-1}(1-e^{-Q_i\beta _{i, f_i}}). \end{aligned}$$
(35)

This is a slight underestimation of the success probability of the attack: if the adversary finds a failing ciphertext in fewer than \(Q_i\) queries, she can spend the remaining queries in the subsequent stages \(i+1, \dotsc , n-1\). This effect is small, however, due to the large values of \(Q_i\).

The total quantum precomputation work and the total number of queries needed to obtain the n failing ciphertexts, performing \(Q_i\) queries for the \((i+1)^{\text {th}}\) one, can be expressed as

$$\begin{aligned} \mathcal {W}^\mathrm{tot}_n := \sum _{i=0}^{n-1} \underbrace{\frac{Q_i}{\sqrt{\alpha _{i, f_i}}}}_{:= W_i}\quad \quad \quad \mathcal {Q}^\mathrm{tot}_n := \sum _{i=0}^{n-1} Q_i. \end{aligned}$$
(36)

Recall that, for now, we assume there is no upper limit on the number of decryption queries and focus on minimizing the amount of work. The values of \(Q_i\) that minimize the total quantum work \(\mathcal {W}^\mathrm{tot}_n\), subject to finding n failures with probability \(1-e^{-1}\) under the above probability model, can be found using the following Lagrangian:

$$\begin{aligned} L(Q_0, \cdots , Q_{n-1}, \lambda ) = \sum _{i=0}^{n-1} \frac{Q_i}{\sqrt{\alpha _{i, f_i}}} + \lambda \left( (1-e^{-1}) - \prod _{i=0}^{n-1}(1-e^{-Q_i\beta _{i, f_i}}) \right) \end{aligned}$$
(37)

Setting the partial derivatives of L with respect to \(Q_0, \cdots , Q_{n-1}\) and \(\lambda \) to zero and solving the resulting system of equations yields the optimal values of \(Q_0, \cdots , Q_{n-1}\) for our attack.
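The constrained minimization can also be carried out numerically. A minimal sketch using scipy, for hypothetical per-stage values of \(\alpha _{i, f_i}\) and \(\beta _{i, f_i}\) (the toy numbers below are illustrative only):

```python
import numpy as np
from scipy.optimize import minimize

def optimal_queries(alpha, beta, p_target=1.0 - np.exp(-1.0)):
    """Minimize W_tot = sum_i Q_i / sqrt(alpha_i) (Eq. 36) subject to
    prod_i (1 - exp(-Q_i * beta_i)) = p_target, as in Eq. 37."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    work = lambda q: float(np.sum(q / np.sqrt(alpha)))
    success = {'type': 'eq',
               'fun': lambda q: np.prod(-np.expm1(-q * beta)) - p_target}
    q0 = 2.0 / beta                    # start near 2 expected failures per stage
    res = minimize(work, q0, method='SLSQP', constraints=[success],
                   bounds=[(1e-12, None)] * len(beta))
    return res.x, work(res.x)

# Hypothetical toy values of alpha_i and beta_i for stages i = 0, 1, 2:
q, w = optimal_queries([1e-2, 1e-1, 0.5], [1e-3, 1e-2, 1e-1])
print(q, w)
```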

The resulting total work and queries for obtaining n ciphertexts using directional failure boosting are given in Table 4 and Fig. 6. One can see that the majority of the work lies in obtaining the first failing ciphertext, and that obtaining additional ones costs less than double the work and queries, i.e. less than one extra bit of complexity. For schemes with a lower failure probability, failing ciphertexts are more strongly correlated with the secret, so the directional information is higher and directional failure boosting is even more effective.

In conclusion, the security of a scheme with low failure probability under a single-target decryption failure attack can be approximated by the amount of work and queries an adversary needs to obtain the first decryption failure. We emphasize that obtaining many failures at low overhead threatens the security of the scheme (see Sect. 5.4).

Table 4. Quantum work \(\mathcal {W}^\mathrm{tot}_n\) and queries \(\mathcal {Q}^\mathrm{tot}_{n}\) required to find n failing ciphertexts with probability \(1-e^{-1}\). Finding the first ciphertext requires the most computation; after the third failing ciphertext is found, the following ones come essentially for free.
Fig. 6. Quantum work \(W_i\) and number of decryption queries \(Q_i\) required to find a new failing ciphertext, given the i failing ciphertexts found previously.

7 Discussion and Variants

7.1 Comparison with D’Anvers et al. [16]

In Fig. 7, the total work and queries needed to obtain n failing ciphertexts with probability \(1-e^{-1}\) are plotted for both traditional failure boosting and our directional failure boosting approach. For a fair comparison, we adapted the traditional failure boosting of [16] to the same method for estimating the total work and queries with success probability \(1-e^{-1}\); for details, we refer to the full version of our paper [15].

Fig. 7. Quantum work \(\mathcal {W}^\mathrm{tot}_n\) and number of decryption queries \(\mathcal {Q}^\mathrm{tot}_n\) required to obtain n failing ciphertexts with probability \(1-e^{-1}\), given the number of previously found failing ciphertexts.

7.2 Minimizing the Number of Queries Instead

In case a maximal number of decryption queries is imposed, say \(2^K\), the same attack strategy can be followed. However, to keep the number of queries \(\mathcal {Q}^\mathrm{tot}_n\) within the limit, a stronger precomputation \(\sqrt{\alpha _{i, f_t}^{-1}}\) might be necessary to raise the failure probability \(\beta _{i, f_t}\) of weak ciphertexts above \(2^{-K}\). The only change required is to select the threshold \(f_t\) for each i appropriately, as sketched below. Note that for most practical schemes (e.g. Kyber, Saber, New Hope), raising the failure probability \(\beta _{0, f_t}\) above \(2^{-K}\) is not practically feasible or would require too much precomputation \(\sqrt{\alpha _{0, f_t}^{-1}}\).
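Selecting \(f_t\) under a query budget is a small change to the threshold sweep of Sect. 6.2: restrict the candidate thresholds to those whose \(\beta _{i, f_t}\) meets the budget. A minimal sketch, reusing the conventions of the earlier sketch (input arrays assumed precomputed):

```python
import numpy as np

def threshold_with_query_budget(p_ct, p_fail, beta_min):
    """Variant of the threshold sweep: minimize the work per failure
    (sqrt(alpha) * beta)^-1 subject to beta >= beta_min, where beta_min
    (of the order of 2^-K) enforces the decryption-query budget."""
    p_ct = np.asarray(p_ct, dtype=float)
    p_fail = np.asarray(p_fail, dtype=float)
    order = np.argsort(p_fail)[::-1]
    p_ct, p_fail = p_ct[order], p_fail[order]
    alpha = np.cumsum(p_ct)
    beta = np.cumsum(p_ct * p_fail) / alpha
    work = np.where(beta >= beta_min,
                    1.0 / (np.sqrt(alpha) * beta),
                    np.inf)                  # exclude thresholds over budget
    j = int(np.argmin(work))
    if not np.isfinite(work[j]):
        return None                          # budget unreachable for this scheme
    return p_fail[j], work[j]
```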

Figure 8 depicts the work \(\sqrt{\alpha _{i, f_t}^{-1}}\beta _{i, f_t}^{-1}\) needed to raise the failure probability \(\beta _{i, f_t}\) to a given value (e.g. \(2^{-K}\)) for the parameters of Table 1. The various curves correspond to different numbers of available failing ciphertexts. The figure shows that, in this case too, the work is dominated by finding the first decryption failure, and that the attack becomes much more expensive as the query limit \(2^K\) decreases.

Fig. 8. Quantum work \(\mathcal {W}^\mathrm{tot}_n\) required to find a new failing ciphertext, as a function of the decryption failure probability of a Mod-LWE scheme.

Table 5. Parameters of the ss-ntru-pke [10] scheme.

7.3 Application to ss-ntru-pke and Improvement of Guo et al. [27]

In [27], an adaptive multitarget attack is proposed on the ss-ntru-pke version of NTRUEncrypt [10], a Ring-LWE based encryption scheme that claims security against chosen-ciphertext attacks; its parameters are given in Table 5. The attack performs at most \(2^{64}\) queries on at most \(2^{64}\) targets, with a classical cost of \(2^{216}\) work and a quantum cost of \(2^{140}\) when speeding up the offline phase with Grover's search. We adapt directional failure boosting to this attack model and propose both a single-target and a multitarget attack.

For the single-target attack, the methodology of Subsect. 6.3 would need more than \(2^{64}\) queries to obtain a failing ciphertext. To mitigate this, we increase the precomputation \(\sqrt{\alpha ^{-1}}\) so that the failure probability \(\beta \) of weak ciphertexts rises above a threshold \(f_t\), chosen as \(2^{-57}\) to keep the total number of queries below \(2^{64}\). The effect is more overall computation, but fewer decryption queries. The rest of the attack proceeds as in Subsect. 6.3. The work and queries needed to obtain an additional failing ciphertext, given n failing ciphertexts, are shown in Fig. 9a. The cost of this single-target attack is \(2^{139.6}\), close to the \(2^{139.5}\) cost of the multitarget attack of [27], as can be seen in Table 6.

Table 6. Comparison of costs for different attacks against ss-ntru-pke [10].

In the multitarget case, we can use up to \(2^{64} \cdot 2^{64}\) queries to find the first failing ciphertext, after which we use the single-target methodology to obtain further failing ciphertexts with a limited number of queries; in practice, we stay well below the query limit when finding the first failure. In this case, the work is dominated by finding the second decryption failure, since this must be done within \(2^{64}\) queries. The resulting work to obtain an additional failing ciphertext is depicted in Fig. 9b. The cost of this attack is \(2^{96.6}\), well below the cost of \(2^{139.5}\) reported by Guo et al. [27].

Fig. 9. Quantum work \(\mathcal {W}^\mathrm{tot}_n\) and number of decryption queries \(\mathcal {Q}^\mathrm{tot}_n\) required to find a new failing ciphertext for ss-ntru-pke, given the ones found previously.