Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Side-channel attacks have drawn much attention since being proposed by Kocher et al. [1]. Up to now, many attack methods have been introduced to analyze side-channel information leaked by cryptographic devices, such as correlation power analysis [2, 3], template [4], collision [5, 6], mutual information [7] and fault attack. Differential Fault Analysis (DFA) [8] is one of the most well-known fault attacks. In DFA attack, the ciphertext with fault injected during executing is called a faulty output. The key is recovered from correct outputs and corresponding faulty outputs based on a fault model.

In CHES 2010, Li et al. [9] proposed the Fault Sensitivity Analysis (FSA) based on the fact that the critical paths of some Advanced Encryption Standard (AES) S-box combinational circuits are data dependent. However, a large number of encryptions is needed in an FSA attack. The adversary has to encrypt every plaintext for many times and shorten the glitch cycle gradually, in order to obtain the critical frequency (the fault sensitivity) at which faulty outputs begin to appear. In 2011, Moradi et al. extended FSA to masked AES implementation by combining it with collision attack [10]. Their attack is carried out at a fixed glitch frequency, but it still requires a lot of encryptions to extract the distribution of faulty ciphertexts. In addition, as correlation-based methods, both of these attacks need to enumerate the values of the plaintext, which increase the total number of encryptions. In 2012, Li et al. presented the Clockwise Collision Fault Sensitivity Analysis (CC-FSA) attack on unmasked AES [11]. They pointed out: in an iterative AES implementation, if inputs of two consecutive cycles are identical, the setup time of the second cycle is extremely short, because there are almost no toggles in the combinational circuit. Soon after that, Wang et al. proposed an improved clockwise collision attack called Fault Rate Analysis (FRA) and broke a masked serial AES S-box implementation [12]. The two methods are carried out at fixed glitch frequency, but they both suffer from the inefficiency of detecting clockwise collisions, and need a large number of encryptions.

Our Contribution. In this paper, we propose a new fault attack based on Transient-Steady Effect (TSE attack). Transient-steady effect denotes the phenomenon that the output of a gate turns to a temporal value and keeps steady for a while before it switches to the final steady state. We analyze the circuits of several AES S-box implementations and find out that the path of the key is usually much shorter than other signals. Therefore, soon after the rising edge of the clock, the output turns to a value that is computed from the key in current clock cycle and other data in the last cycle. By injecting a clock glitch, we can capture the temporal value as a faulty output to recover the key. We propose several fault models based on the transient-steady effect and verify TSE attack on both unmasked and masked AES S-boxes. Our attack has the following features:

  • In comparison to the existing works, TSE attack needs less encryptions in the attack stage. We only need to sweep the frequency of clock glitch for one time in the pre-computation stage. Then the attack stage can be conducted at a fixed frequency. Furthermore, the key can be recovered directly from faulty outputs, so we do not need a large amount of encryptions to build a statistical model.

  • TSE attack is verified to be effective to a masked implementation of AES based on tower field. Other masking techniques with obvious transient-steady effect may also be insecure under this attack.

  • TSE attack can break the protection strategy that changes the plaintext for every encryption, because correct outputs are not necessary.

Organization. We organize the rest part of this paper as follows. Related preliminaries are introduced in Sect. 2. The basic idea and attack scenarios are detailed in Sect. 3. We present experimental results and efficiency comparison in Sect. 4. Then we discuss about the application of our attack on parallel AES implementation and WDDL-AES in Sect. 5. Conclusions are given in Sect. 6.

2 Preliminaries

2.1 AES S-box and Masking

AES is a widely used symmetric cryptographic algorithm, which is composed of 10 rounds, and each round includes 16 S-boxes. When the area or the power consumption is limited, serial implementation of the algorithm is preferred. For example, a circuit with 4 S-boxes can accomplish one round in 4 cycles, and each S-box is reused for 4 times [13, 14]. Many low-power and low-area S-boxes have been proposed. For example, Morioka et al. gave a low-power approach [15], and Canright proposed a low-area approach based on tower-field [16].

Masking is a regular countermeasure against power analysis. Mask values randomize sensitive intermediate values and minimize the dependency between data and power consumption. S-box is the only nonlinear operation in AES algorithm, and many masking schemes have been proposed for it, such as the approach based on tower-field [17].

As shown in Fig. 1, a standard masked S-box has one masked output and three inputs: the masked value \(x_m\), the input mask m and the output mask w. We do not show the output mask as an output of S-box in Fig. 1, but it is also recorded for the next round.

Fig. 1.
figure 1

Unmasked S-box (left) and masked S-box (right)

2.2 Fault-Based Clockwise Collision Analysis

CC-FSA attack was presented by Li et al. in 2012 [11]. The attack is based on the fact that if the inputs of a circuit do not change in two consecutive clock cycles, there will be almost no toggles in the second cycle. It is called a clockwise collision, and the setup time of the second clock cycle will be extremely short. They let the target circuit work normally in the first cycle, and insert a clock glitch to create a very short second cycle. If the output is correct, a clockwise collision will be detected.

3 Transient-Steady Effect Attack

In most standard logic designs, the lengths of data paths in combinational circuits are usually different. If we focus on a gate, we can see that after the rising edge of the clock, the inputs of the gate do not necessarily arrive simultaneously. For example, we can assume the path delay of signal a is shorter than that of signal b. Hence, after the switch of a and before the arrival of b, the output turns to a transient illegal value. When the difference of the propagation delay between the two signals is large enough, the output stays at the illegal value for a while before all the propagations are done correctly. This is called the transient-steady effect. Related works have proved that transient-steady effect can lead to a data-dependent power consumption and leak the secret information indirectly [1822]. However, in this paper we show that the temporal value caused by transient-steady effect can be captured and used to retrieve secret information directly.

Based on the transient-steady effect, we propose the TSE attack: We let the target circuit compute normally in the first cycle, and inject a clock glitch to create a very short second cycle. The normal output of the first clock cycle is computed from the short-path data and the long-path data in the first cycle. The faulty output of the second cycle is computed from the short-path data in the second cycle and the long-path data in the first cycle. By combining the outputs of the two consecutive clock cycles, we can recover information of the short-path data.

3.1 Basic Idea

Without loss of generality, we first look at a combinational circuit which computes the output with two inputs, e.g. X and Y. Their propagation delays are denoted as \(t_X\) and \(t_Y\). The output, denoted as \(Z=f(X,Y)\), is captured by a register. As shown in Fig. 2, we assume the propagation delays of the two inputs are different, for example, \(t_Y\gg t_X\). Focusing on two specific clock cycles, we denote the inputs in the first cycle as \(X_1\) and \(Y_1\), and the inputs in the second cycle as \(X_2\) and \(Y_2\). After the rising edge of the second clock, the effects of \(X_2\) and \(Y_2\) begin to propagate along the two data paths, like two ripples with different speeds. After a period of time t (\(t_Y>t>t_X\)), \(X_2\) has impacted all the gates in the circuit, but the ripple of \(Y_2\) has not arrived at the output, so the output Z turns to a value of \(f(X_2,Y_1)\). We assume the difference of path delays, denoted as \(d=t_Y-t_X\), is large enough. Hence, the temporal value \(f(X_2,Y_1)\) keeps steady at the output for a while. As presented in Fig. 3, if a glitch is injected to make the length of the second cycle within the range from \(t_X\) to \(t_Y\), the temporal value can be stored in RegZ.

Fig. 2.
figure 2

An example of circuit with different propagation delays

Fig. 3.
figure 3

The sequence diagram with clock glitch

3.2 Attack Scenario on Unmasked S-box

First, we analyze the unmasked S-box. As mentioned in Sect. 2.1, we consider the serial implementation, where the inputs of different S-boxes are fed to the same combinational circuit in consecutive clock cycles. We assume the circuit executes the S-box operations of the final AES round consecutively. As in Fig. 4, there are two data paths in the circuit: The longer data path is marked with dashed red arrow, and its delay is denoted as \(t_y\). The shorter one is marked with solid green arrow, and its delay is denoted as \(t_k\).

Fig. 4.
figure 4

The data path of unmasked S-box in the final AES round (Color figure online)

As shown in Fig. 5, the output of first cycle \(z_1=\text{ S }(x_1)\oplus k_1\) is stored in register Reg1 at the rising edge of the second cycle. After the duration time of \(t_k\), \(k_2\) propagates through the exclusive-or gate and the output switches to a temporal value \(\tilde{z}_2=\text{ S }(x_1)\oplus k_2\). The temporal value stays for the duration time of \(t_y-t_k\). If we inject a clock glitch after the first clock cycle, and make sure the length of the glitch cycle satisfies \(t_y>t_g>t_k\), \(\tilde{z}_2\) can be stored in Reg2. With \(z_1\) and \(\tilde{z}_2\), we can compute

$$\begin{aligned} \begin{array}{rcl} z_1\oplus \tilde{z}_2 &{}=&{}\text{ S }(x_1)\oplus k_1\oplus \text{ S }(x_1)\oplus k_2\\ &{}=&{}k_1\oplus k_2\\ &{}=&{}\varDelta k_{1,2}\; . \end{array} \end{aligned}$$
(1)
Fig. 5.
figure 5

Sequence diagram of unmasked S-box with clock glitch

Since the data paths’ delays are unknown to us, we conduct the TSE attack practically in the following steps:

  • Step 1: Sweep the frequency of clock glitch, i.e. change the length of the clock glitch cycle gradually. At each frequency point, do encryptions with fixed \(x_1\) and random \(x_2\) for \(N_{pre}\) times and make a record for the faulty outputs.

  • Step 2: Find out the range of glitch frequency in which the faulty outputs keep stable. According to the analysis above, with fixed \(x_1\), the faulty output \(\tilde{z}_2=\text{ S }(x_1)\oplus k_2\) should be a constant value independent of \(x_2\).

  • Step 3: Choose a proper glitch frequency in the range detected in Step 2.

  • Step 4: Do encryptions for \(N_{attack}\) times at the chosen glitch frequency, record \(z_1\), \(\tilde{z}_2\), and compute the attack result \(z_1\oplus \tilde{z}_2\) for every encryption.

  • Step 5: Among all the attack results \(z_1\oplus \tilde{z}_2\), choose the value which has the highest occurrence rate as the value of \(\varDelta k_{1,2}\).

  • Step 6: Repeat Step 4 to 5 for other clock cycles to recover \(\varDelta k_{2,3}\), \(\varDelta k_{3,4}\), etc.

We call Step 1 to Step 3 as the pre-computation stage, which only needs to be done one time for a target circuit. Step 4 to 6, called the attack stage, can be done at a fixed frequency.

Note that there is no specific requirement on the unmasked S-box’s structure, as long as the shortest data path delay of the S-box is sufficiently long.

3.3 Attack Scenario on Masked S-box

Masked S-box has three inputs: the masked value \(x_m=x\oplus m\), the input mask m and the output mask w. The output is masked with w: \(y_w=y\oplus w=\text{ S }(x)\oplus w\). Since w is used to mask the output of S-box, its data path is usually shorter than \(x_m\) and m [12]. Here we focus on the masked S-box based on tower field [17]. As shown in Fig. 6, the data path of \(x_m\) and m, which is marked with dashed red arrow, is much longer than those of others. Similar to the unmasked S-boxes, the normal output of the first clock is captured in Reg1:

$$\begin{aligned} \begin{array}{rcl} z_1 &{}=&{}y_{w_1}\oplus k_1\oplus w_1\\ &{}=&{}\text{ S }(x_1)\oplus w_1\oplus k_1\oplus w_1\\ &{}=&{}\text{ S }(x_1)\oplus k_1\; . \end{array} \end{aligned}$$

Here \(y_{w_1}\) represents the masked S-box output of the first clock cycle. We inject a glitch after the first clock cycle. If the length of the glitch cycle is shorter than the delay of \(x_m\) and m, and longer than that of w and k, the temporal output \(\tilde{z}_2\) can be captured in Reg2:

$$\begin{aligned} \begin{array}{rcl} \tilde{z}_2 &{}=&{}\tilde{y}_{w_2}\oplus k_2\oplus w_2\\ &{}=&{}\text{ S }(x_1)\oplus w_2\oplus k_2\oplus w_2\\ &{}=&{}\text{ S }(x_1)\oplus k_2\;. \end{array} \end{aligned}$$

By combining \(z_1\) and \(\tilde{z}_2\), we have the result similar to unmasked S-box:

$$\begin{aligned} \begin{array}{rcl} z_1\oplus \tilde{z}_2 &{}=&{}\text{ S }(x_1)\oplus k_1\oplus \text{ S }(x_1)\oplus k_2\\ &{}=&{}k_1\oplus k_2\\ &{}=&{}\varDelta k_{1,2}\;. \end{array} \end{aligned}$$
(2)
Fig. 6.
figure 6

Data path of masked S-box in the final AES round (Color figure online)

Note that the attack described in this section is only applicable if the final unmasking is done within the same clock cycle as the final key addition.

4 Experiments and Efficiency

We verify the proposed TSE attack on two unmasked S-boxes [15, 16] and one masked S-box [17] which are implemented on DE2-115 FPGA board with Altera Cyclone IV EP4CE115. We use a RIGOL DG4102 function generator as the input clock. The circuit diagram of attack on masked S-box is shown in Fig. 7, and the setup for unmasked S-box is similar. A PLL is employed in the glitch generator to create clock for the control module and the circuit under attack. The PLL outputs two clock signals. The low frequency signal is used as the normal clock, and high frequency signal is used as the clock glitch. A clock multiplexer is used to switch between the normal and clock glitch. The outputs of two consecutive clock cycles, \(z_1\) and \(\tilde{z}_2\), are stored in the registers Reg1 and Reg2 respectively, and the attack result \(z_1\oplus \tilde{z}_2\) is stored in RAM. As presented in Sect. 3, if no fault is injected, the attack result should be \(\text{ S }(x_1)\oplus k_1\oplus \text{ S }(x_2)\oplus k_2\). If the attack succeeds, it should be \(k_1\oplus k_2\).

Fig. 7.
figure 7

Experimental circuit diagram of masked S-box

4.1 Experiment on Unmasked S-box A

The S-box we attack in this section is presented in [15], We set the key bytes as \(k_1=\text{0xE2 }\) and \(k_2=\text{0x19 }\). If the attack succeeds, the result stored in RAM should be \(\varDelta k_{1,2}=\text{0xFB }\).

Following the steps of TSE attack detailed in Sect. 3.2, we first do the pre-computation stage for S-box A: We choose 80 frequency points from 64 MHz to 480 MHz to sweep the glitch frequency. At every frequency point, the experiment is conducted as follows: We fix the value of \(x_1\) as 0x31, and enumerate the value of \(x_2\). For each \(x_2\), we encrypt it for 256 times. Therefore, at each frequency point, 65536 attack results are stored. As shown in Fig. 8, we count the occurrence rates for all the possible values of \(\varDelta k_{1,2}\) at every frequency point. Within the range from 360 MHz to 430 MHz, the occurrence rate of the correct value of \(\varDelta k_{1,2}\) rises up to nearly 100 %. Obviously, the range is suitable for the TSE attack.

Fig. 8.
figure 8

Results of swee** glitch frequency for S-box A

We also verify that the proper frequency range is valid for all the possible inputs of the S-box. The 256 occurrence rate curves of correct \(\varDelta k_{1,2}\) corresponding to all the 256 values of \(x_2\) are plotted in Fig. 9. Even though the critical timing delay of the S-box depends on the Hamming weight of the inputs [9], we can conclude from Fig. 9 that there is a proper frequency range, i.e. from 360 MHz to 430 MHz, for all the possible inputs.

Fig. 9.
figure 9

The 256 occurrence rate curves of correct \(\varDelta k_{1,2}\) corresponding to 256 values of \(x_2\) for S-box A.

At the attack stage, TSE attack can be done at any glitch frequency within the range from 360 MHz to 430 MHz. However, to illustrate the result more clearly, we conduct attacks on all the frequency points, and the success rate of attack is in Fig. 10. Increasing the number of encryptions used for each attack, i.e. \(N_{attack}\), can slightly widen the range of proper frequency. Even with only 1 encryption, our attack can achieve a success rate of nearly 100 %.

Fig. 10.
figure 10

Success rate vs. frequency with different \(N_{attack}\) for S-box A

4.2 Experiment on Unmasked S-box B

We carry out experiments on a very compact unmasked S-box [16] in the same way of Sect. 4.1. The results are shown in Figs. 1112, and 13.

Fig. 11.
figure 11

Results of swee** glitch frequency for S-box B

Fig. 12.
figure 12

The 256 occurrence rate curves of correct \(\varDelta k_{1,2}\) corresponding to 256 values of \(x_2\) for S-box B.

Fig. 13.
figure 13

Success rate vs. frequency with different \(N_{attack}\) for S-box B

As shown in Fig. 11, the range of proper glitch frequency is from 320 MHz to 580 MHz. From Fig. 12, we can see that inputs of the S-box have little effect on the frequency range.

As shown in Fig. 13, a small peak appears at about 176 MHz when 50 or more encryptions are used. The frequency of the peak is much lower than the range from 320 MHz to 580 MHz, so it may be easier to inject glitch at this frequency. However, the peak is very narrow, so the width of the glitch has to be very accurate to mount a success TSE attack. Moreover, as shown in Fig. 11, there are several peaks at 176 MHz corresponding to different attack results, for example, 0xFB (correct value of \(\varDelta k_{1,2}\)) and 0xF3. Since we always choose the value which has the largest occurrence rate as \(\varDelta k_{1,2}\), the probability of choosing 0xF3 is also very high. Hence, by injecting glitch at 176 MHz, we may not be able to recover \(\varDelta k_{1,2}\) directly, but the key space can still be reduced significantly.

4.3 Experiment on Masked S-box C

S-box C is the masked version of S-box B [17]. We set the inputs of S-box as \(x_1=\text{0x9D }\), \(x_2=\text{0xE6 }\), and the key bytes as \(k_1=\text{0x3F }\) and \(k_2=\text{0x58 }\). With no fault injected, the attack result should be \(\text{ S }(x_1)\oplus k_1\oplus \text{ S }(x_2) \oplus k_2=\text{0xB7 }\). If the attack succeeds, the result should be \(\varDelta k_{1,2}=\text{0x67 }\).

We choose 72 frequency points between 50 MHz to 200 MHz for frequency swee**. At each frequency point, we encrypt the plaintext with random masks m and w for 65536 times. As shown in Fig. 14, when the glitch frequency is lower than 75MHz, the occurrence rate of value 0xB7 is 100 %, namely no fault occurs. When the glitch frequency gets higher, there is only one peak higher than 60 %, which corresponds to the value of \(\varDelta k_{1,2}=\text{0x67 }\). The feasible frequency range for this S-box is from 145 MHz to 150 MHz.

Fig. 14.
figure 14

Results of swee** glitch frequency for S-box C

The results of the attack stage are shown in Fig. 15. With only one encryption for each attack, the success rate of TSE attack reaches to 90 % at the frequency of 150 MHz. With more encryptions, the range of proper glitch frequency is widened obviously. With more than 20 encryptions, our attack can have a success rate higher than 90 % within the frequency range from 142 MHz to 152 MHz.

It is worth noting that the proper glitch frequency for attacking the masked S-box is much lower than unmasked S-box. That is because, as a countermeasure against side-channel attacks, masking usually results in a longer data path delay for \(x_m\) and m, which turns out to be vulnerable to our attack.

Fig. 15.
figure 15

Success rate vs. frequency with different \(N_{attack}\) for S-box C

4.4 Efficiency Comparison

We compare TSE attack with related fault based attacks on AES S-box in Table 1. The comparison is based on the effort to disclose 8-bit information of the key. Our attack has obvious advantages in the memory space, the offline complexity and the number of encryptions needed to recover a key byte. Here \(C_{\rho _n}\) means the complexity of calculating the correlation coefficient of two n-sample vectors. Previous works [9, 10, 12] need many encryptions to obtain the statistical data or to build models of the target circuit’s behavior in each attack. However, our attack puts most workload into the pre-computation stage, i.e. sweeps the glitch frequency for only one time to find a proper frequency range. Then, in the attack stage, it is feasible and efficient to obtain key-related information from the faulty output directly.

Table 1. Comparison with three fault based attacks

The last row in Table 1 denotes the number of encryptions needed in pre-computation stage, the data in this row is estimated. Experienced attackers usually do not need so many encryptions.

5 Further Discussion

5.1 Key Recovery for Parallel AES Implementation

In some AES implementations, 16 S-boxes are implemented in parallel to achieve high throughput [23]. In such implementations, the transient-steady effect still exists. However, the temporal value turns out to be related with two adjacent rounds, rather than two S-boxes in one round. To apply our attack to parallel AES implementations, we focus on a standard structure of unmasked AES shown in Fig. 16. Here we use A to D to denote four 128-bit intermediate states in different stages, and use K to denote the 128-bit key. The index of rounds is denoted by the superscript, and the byte number is denoted by the subscript. For example, \(K^{10}_4\) means the 4th byte of the 10th round key.

Fig. 16.
figure 16

Standard structure of parallel AES implementation (Color figure online)

As in Fig. 16, at the beginning of the 10th round, there are two data paths: the delay of the key is shorter than that of the red dashed path. Consequently, if we shorten the 10th clock cycle to a proper length, we can capture the temporal value \(B^9\oplus K^{10}\) as the faulty ciphertext \(\tilde{D}^{10}\), before the intermediate value \(B^9\) is contaminated. Without fault, the output of the circuit is the correct ciphertext \(D^{10}=B^{10} \oplus K^{10}\). It is worth noting that both the correct output and faulty output are needed to attack a parallel implementation with TSE attack, which is different from the situation in serial implementation.

Fig. 17.
figure 17

Key recovery of parallel AES implementation

As shown in Fig. 17, we can deduce equations of \(K^9\) and \(K^{10}\) as follows once we have \(D^{10}\) and \(\tilde{D}^{10}\):

$$\begin{aligned} \begin{array}{c} \left\{ \begin{array}{l} {{\tilde{D}}^{10}} \oplus {K^{10}} = {B^9}\\ \mathrm{{MixCol}}\left( {{B^9}} \right) = {C^9}\\ {C^9} \oplus {K^9} = {D^9}\\ \mathrm{{Sbox}}\left( {{D^9}} \right) = {A^{10}} = {B^{10}}\\ {B^{10}} \oplus {K^{10}} = {D^{10}} \end{array} \right. \\ \Rightarrow \mathrm{{Sbox}}\left( {\mathrm{{MixCol}}\left( {{{\tilde{D}}^{10}} \oplus {K^{10}}} \right) \oplus {K^9}} \right) \oplus {K^{10}} = {D^{10}} \; . \end{array} \end{aligned}$$
(3)

According to AES key schedule, \(K^9\) can be expressed by \(K^{10}\), so \(K^{10}\) is the only variable in (3). Solving the equation system in (3) is similar to breaking one round AES by algebraic attack, which can be solved by MiniSAT tool.

However, TSE attack on parallel implementation is feasible only if the roundkey is precomputed and stored in registers. Otherwise, the data path of key schedule is comparable to that of S-box.

5.2 Attack Scenario for WDDL-AES

Wave Dynamic Differential Logic (WDDL) is a kind of dual-rail precharge logic. Every signal of WDDL has two complementary wires. Every clock cycle consists of two phases: in precharge phase, both of the wires are precharged to a fixed value, for example, (0, 0); in the evaluation phase, the values of two wires are either (1, 0) or (0, 1). WDDL is believed to be secure against setup violation faults [24]. Because the precharge phase inserts an all-zeros state in every clock, shortening a clock cycle will lead to an all-zeros faulty ciphertext. However, if the delays of different data paths have significant difference, the circuit may not be perfectly secure any more.

Considering a WDDL-AES implementation with the same structure in Fig. 4, we assume that after the rising edge of the clock, the all-zeros state propagates slow enough along the data path of y, so \(k_2\) arrives before the value of \(y_1\) is cleared. As shown in Fig. 18, by shortening the length of the clock, the attacker can store the temporal value and recover the key, which is similar to the analysis in Sect. 3.2.

Fig. 18.
figure 18

Sequence diagram of WDDL-AES with clock glitch. The all-zeros state is denoted by Z (high impedance) state.

TSE attack on WDDL-AES is more difficult than non-WDDL AES implementations, because the all-zeros state usually propagates faster than other states [25, 26]. However, it is noteworthy that unbalanced data path remains a potential vulnerability to our attack.

Fig. 19.
figure 19

Glitch injection experiment with reduced supply voltage

Fig. 20.
figure 20

Results of swee** glitch frequency for S-box A with reduced supply voltage

Fig. 21.
figure 21

Success rate vs. frequency with different \(N_{attack}\) for S-box A with reduced supply voltage

5.3 Glitch Injection

In this section, we discuss about the feasibility of injecting clock glitch externally. The clock glitch required in TSE attack is very short. For example, the width of clock glitch should be no more than 2.8 ns for unmasked S-box A. Such a short glitch may be filtered out when injected externally, even though it is reported in many literatures that the glitch width can be smaller than 3 ns [27, 28].

A straightforward way to bypass the obstacle is to do a semi-invasive attack: cut the clock line and connect it to a external glitch signal. Another option is to slow down the target circuit, so that TSE attack can be carried out with wider glitch.

Under some conditions, the attacker can increase the delay of the target circuits. For example, by reducing the supply voltage, the propagation delay can be increased [29]. We reduce the supply voltage of FPGA chip from 1.5 V to 1.08V and rerun the experiments in Sect. 4.1. As shown in Fig. 19, we cut the power supply of Cyclone IV EP4CE115 and connect it to a DC power supply. The attack results are shown in Figs. 20 and 21. By reducing the voltage to 1.08 V, the feasible glitch frequencies go down to the range from 125 MHz to 136 MHz, which is about 1/3 of the frequency range with normal voltage.

6 Conclusions

In this paper, we propose a new TSE attack based on the transient-steady effect. By injecting glitch in clock signal, the transient-steady value can be captured to recover the key of AES. We conduct experiments on two kinds of unmasked S-boxes and one kind of masked S-box, and all the S-boxes are implemented in serial with an FPGA board. Experimental results show that TSE attack can recover a key byte of the unmasked S-boxes with 1 encryption, and recover a key byte of the masked S-box with less than 20 encryptions. The attack scenarios on parallel AES implementation and WDDL-AES are also discussed.

The foundation of TSE attack is that the path of key is obviously shorter than other data, i.e. the inputs of S-box. Hence, against TSE attack, we recommend the architectures in which the key’s path is sufficiently long, for example, the roundkey is generated simultaneously with the encryption. Countermeasures such as inserting dummy operations into the key’s path are also feasible options, but the throughput may be impacted.