1 Introduction

Data security has become a mandatory requirement as the number of internet users exchanging data keeps increasing [1]. Numerous software-based encryption techniques such as DES [2] and RSA [3] can be employed to transmit data securely over shared channels. Optical methods are also used to encrypt images so that the original images cannot be retrieved without the keys [4, 5]. The essential objective of data hiding is to transfer secret data safely from the transmitter to the receiver. One method used to ensure safe data transmission is steganography [6]. Steganography is a technique that allows data to be transferred safely inside a carrier such as video, audio, text or an image [7, 8]. The word steganography is derived from the Greek words steganos and graphie, and it means concealed writing. Cryptography is the art of encrypting data so that it is hard to understand. In cryptography, the secret message is scrambled or encrypted and therefore not understandable, but unlike in steganography its existence is visible to everyone. In steganography, by contrast, the secret information is not even visible to an eavesdropper or intruder after data hiding, which makes this method safer to use [8].

A steganography technique involves three components: the cover object, the secret data, and the stego object. The cover object is the medium in which the data are hidden. The secret data are hidden as a message in the cover object. After the secret data are hidden in the cover object, the stego object is obtained. The type of steganography is named after the medium used as the cover object. When the cover object is an image, it is called image steganography. Similarly, the technique is called text steganography, video steganography, or sound steganography depending on the type of media used as the cover object [6]. In this study, a lung CT scan is used as the cover object. However, the proposed steganography method for voice hiding can be applied to other types of medical images.

Numerous studies based on information and data encryption have been carried out [9,10,25]. Transform-domain techniques such as discrete wavelet transform (DWT) [26] and discrete Fourier transform (DFT) [7] have also been suggested. In these techniques, transformation processes are performed in order to hide the secret message in the cover object. In this study, on the other hand, an LSB-based method carried out in the spatial domain is proposed because it is simpler than methods performed in the transform domain [27].

The motivation for this work is as follows. The encrypted voice comment of the doctor is hidden in a medical image. A novel encryption scheme is suggested to cipher the voice data. In the previous studies on steganography [9,10,1, 6, 23], differential attack has not been taken into consideration. In contrast, this study proposes a novel algorithm that is resistant to differential attack for use in the steganography method.

The rest of the paper is organized as follows. Section 2 presents a novel steganography algorithm scheme including chaotic system to hide the doctor’s ciphered voice comment. Security analyses which are statistical attacks, differential attack and initial condition sensitivity are given to show the functioning of the suggested steganography algorithm in Sect. 3. The study is concluded in Sect. 4.

2 A new steganography algorithm scheme based on chaos to hide encrypted voice comment

An image steganography technique for hiding encrypted audio data is suggested in this study. The proposed technique is based on the logistic chaotic map. When a chaotic system is realized on a digital computing device with finite precision, it is called a digital chaotic system. The sequence obtained from a digital chaotic system becomes periodic because of the finite precision of the device. This leads to the dynamical degradation of the digital chaotic system [28]. In this study, the delay-introducing logistic chaotic map presented in [29] is used to counteract the effect of dynamical degradation in the digitalization of the chaotic system. The logistic chaotic map realized on a device with finite precision can be written as

$$ x_{i + 1} = {\text{FL}}\left( {ax_{i} \left( {1 - x_{i} } \right)} \right) $$
(1)

where FL represents the precision function and the control parameter \(a\) \(\in\) (3.5699, 4). To counteract the degradation effects, a linear function of delay state \(x_{i - 1}\) given in Eq. (2) is utilized in place of parameter \(a\).

$$ h(x_{i - 1} ) = bx_{i - 1} + 4 - b $$
(2)

where parameter \(b \in\) (0, 0.4) and function \(h(x_{i - 1} ) \in\) (3.5699, 4). Therefore, Logistic chaotic map utilizing delay-introducing method which is realized on a digital computing device with finite precision can be defined as

$$ x_{i + 1} = {\text{FL}}\left( {\left( {bx_{i - 1} + 4 - b} \right)x_{i} \left( {1 - x_{i} } \right)} \right) $$
(3)

where the initial values are \(x_{0}\) = 0.1 and \(x_{1}\) = 0.2. The steganography algorithm requires secret key sequences with values between 0 and 255, since the intensity of a pixel lies between 0 and 255. To map the values of \(x_{i + 1}\) to the range from 0 to 255, the following equation is used.

$$ x_{i + 1} = {\text{floor}}\left( {{\text{mod}}\left( {x_{i + 1} \times 10^{5} ,256} \right)} \right) $$
(4)

where mod represents modulo operation and floor rounds the element to the nearest integer less than or equal to that element.
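Equations (3) and (4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the precision function FL is taken to be ordinary double-precision arithmetic, and the function name `chaotic_key` and its arguments are hypothetical rather than taken from the original implementation.

```python
import math


def chaotic_key(b, length, x0=0.1, x1=0.2):
    """Return `length` key values in 0..255 from the delay-introducing logistic map."""
    prev, cur = x0, x1
    key = []
    for _ in range(length):
        # Eq. (3): the control parameter a is replaced by h(x_{i-1}) = b*x_{i-1} + 4 - b.
        nxt = (b * prev + 4.0 - b) * cur * (1.0 - cur)
        prev, cur = cur, nxt
        # Eq. (4): scale by 10^5, reduce modulo 256, and take the floor.
        key.append(int(math.floor((nxt * 1e5) % 256)))
    return key


# Assumption: double precision plays the role of FL; b is chosen inside (0, 0.4).
X = chaotic_key(0.1, 16)
Y = chaotic_key(0.2, 16)
```

Because the map is deterministic, calling `chaotic_key` again with the same `b`, `x0` and `x1` reproduces the same key, which is what the decoding side relies on.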

The steganography scheme comprises converting audio data to pixel values, pseudorandom pixel placement, logical XOR, XOR for sequential bits, complement and swap, XOR operation with next pixel, and hiding the encrypted audio data in the cover image using the LSB (least significant bit) method. To improve the security of the proposed steganography method, the audio data to be hidden are encrypted. In an algorithm that is robust against differential attack, a minor change in one bit of any pixel in the plain image should completely change the encrypted image [24]. In the introduced algorithm, the XOR operation for sequential bits transfers the value of any bit in a pixel to the other bits of that pixel, so it acts at the bit level. The XOR operation with next pixel, on the other hand, transfers the value of any pixel in the image to the other pixels of the image, so it acts at the pixel level. Together, these two operations convey any slight change in a pixel to the other pixels, and they are necessary for generating an algorithm resistant to differential attack. The analysis in Sect. 3.1 indicates that the encrypted data can resist differential attack. In this paper, a lung angiography dual-energy CT image from [30] is utilized as the cover object. In addition, the bit depth of the audio file to be hidden is chosen as 8 bits. In practical implementations, the length of the voice record depends on the size of the cover image and the quality of the voice record. Increasing the size of the cover image or decreasing the quality of the voice record extends the length of the voice data that can be hidden in the medical image. For example, an audio record of more than 1 h can be hidden by using a cover image of 2048 × 2048 pixels and downsampling the audio record by 8.
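The capacity example above can be checked with a short calculation. The sketch assumes one 8-bit sample per cover pixel and an original sampling rate of 8 kHz (the rate of the test file used in Sect. 3.1); the helper name `max_record_seconds` is hypothetical.

```python
def max_record_seconds(side, sample_rate_hz, downsample):
    """Longest record (in seconds) that fits when each pixel stores one sample."""
    samples = side * side                      # one 8-bit sample per pixel
    effective_rate = sample_rate_hz / downsample
    return samples / effective_rate


# Assumed values: 2048 x 2048 cover image, 8 kHz audio, downsampling factor 8.
seconds = max_record_seconds(2048, 8000, 8)
minutes = seconds / 60
```

Under these assumptions the record length is about 4194 s, or roughly 70 min, which is consistent with the statement that more than 1 h of audio can be hidden.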

  1. (i)

    Converting audio data to pixel value

    1. (1)

      Assume that an 8-bit audio file is named A, its size is N, and each audio sample value A(i) ranges from − 128 to 127, i = 1,2,…,N. A(i) is converted to B(i), which ranges from 0 to 255. B(i) can be written as follows

      $$ B\left( i \right) = s_{7} \times 2^{7} + s_{6} \times 2^{6} + s_{5} \times 2^{5} + s_{4} \times 2^{4} + s_{3} \times 2^{3} + s_{2} \times 2^{2} + s_{1} \times 2^{1} + s_{0} \times 2^{0} $$
      (5)

      Here each bit \(s_{k}\), k = 0,1,2,…,7, belongs to {0,1}.

    2. (2)

      In an image, a pixel value is represented using 8 bits. Using Eq. (5), each integer B(i) is transformed into an 8-bit binary value, which gives an N × 8 bit array named C(N,8). Each row C(p) is written as “s7s6s5s4s3s2s1s0” and represents a value from 0 to 255, p = 1,2,…,N. Each sample of audio data can therefore be presented as one pixel by converting the 8-bit audio data value into a pixel value.
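The conversion in step (i) can be sketched as follows. The text only states the source range (− 128 to 127) and the target range (0 to 255), so the offset of 128 used here is an assumption, and the function names are hypothetical.

```python
def sample_to_bits(a):
    """Map a signed 8-bit sample A(i) to B(i) and its MSB-first bits s7..s0."""
    if not -128 <= a <= 127:
        raise ValueError("expected a signed 8-bit sample")
    b = a + 128  # assumed offset: shifts -128..127 onto 0..255
    bits = [(b >> k) & 1 for k in range(7, -1, -1)]  # s7 first, s0 last
    return b, bits


def bits_to_value(bits):
    """Evaluate Eq. (5): rebuild the integer from MSB-first bits."""
    value = 0
    for s in bits:
        value = (value << 1) | s
    return value
```

For example, under this assumption the sample 22 becomes the pixel value 150, whose bit string is 10010110.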

  2. (ii)

    Pseudorandom pixel placement

    In the literature, random number generators (RNGs) have been utilized in numerous applications [31,32,33,34]. There are two kinds of RNGs: true random number generators (TRNGs) and pseudo random number generators (PRNGs). TRNGs produce random numbers from physical processes such as jitter and thermal noise. However, TRNGs cannot be used in encryption and decryption processes because the same secret key sequence cannot be reproduced in both processes. PRNGs, on the other hand, are deterministic and generate a sequence of values that appears unpredictable [31]. This type of generator is called pseudo because the same sequence is reproduced under the same conditions in the encryption and decryption processes. In other words, the pseudorandom pixel placement algorithm makes pixel placement unpredictable, not random.

    In the introduced algorithm, a blank image whose pixel values are all zero is generated in order to hide the voice comments of the doctor in a cover medical image. Each value of C(p) is placed in this blank image, not sequentially, but according to a pseudorandom coordinates array produced by the chaotic system. For a cover image of \(m\) × \(m\) pixels, the pseudorandom pixel coordinates array is generated using Algorithm 1. Using the row and column coordinates produced by Algorithm 1, the voice comments are placed pseudorandomly in the blank image. X, Y, Z represent the digital values produced by the variable \(x\) of the logistic chaotic map given in Eq. (4). Parameter \(b\) given in Eq. (3) is taken as 0.1, 0.2, 0.3, respectively, to obtain the values of X, Y, Z.

    Algorithm 1: Pseudorandom pixel placement

    s = 0;

    coordinates_array = zeros(0,2);

    XY = xor(X,Y);

    YZ = xor(Y,Z);

    while (size(coordinates_array,1) < N) do

    s = s + 1;

    row = X(s).*Y(s).*XY(s);

    row = mod(row,m) + 1;

    column = Y(s).*Z(s).*YZ(s);

    column = mod(column,m) + 1;

    coordinate = [row,column];

    coordinates_array = unique([coordinates_array; coordinate],'rows','stable');

    end while
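A Python sketch of Algorithm 1 is given below. It assumes that `xor` denotes a bitwise XOR of the integer key values and that duplicate coordinates are skipped while the order of first occurrence is kept. The function name `placement_coordinates` is hypothetical.

```python
def placement_coordinates(X, Y, Z, m, n):
    """Collect n distinct (row, column) pairs, 1-based as in Algorithm 1."""
    coords, seen = [], set()
    for x, y, z in zip(X, Y, Z):
        if len(coords) == n:
            break
        row = (x * y * (x ^ y)) % m + 1      # row = mod(X.*Y.*XY, m) + 1
        col = (y * z * (y ^ z)) % m + 1      # column = mod(Y.*Z.*YZ, m) + 1
        if (row, col) not in seen:           # keep only the first occurrence
            seen.add((row, col))
            coords.append((row, col))
    if len(coords) < n:
        raise ValueError("key sequences too short for n distinct coordinates")
    return coords


# Toy key values chosen for illustration only.
demo = placement_coordinates([1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], m=8, n=3)
# demo == [(7, 7), (7, 5), (5, 5)]
```

Keeping the order of first occurrence matters because the p-th coordinate is where the p-th audio sample is placed, so the decoder must regenerate the same ordering.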

  3. (iii)

    Logical XOR operation

    The logical XOR operation is carried out using Algorithm 2. The key R(p) is produced using the delay-introducing logistic chaotic map.

    Algorithm 2: Logical XOR operation

    XY = xor(X,Y);

    YZ = xor(Y,Z);

    XZ = xor(X,Z);

    R = [XY;YZ;XZ;X;Y;Z];

    C = C ⊕ R;
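Algorithm 2 can be sketched in Python as below. The sketch assumes that the semicolons in R denote concatenation of the six sequences and that only the first N key values are used for N pixels; both function names are hypothetical.

```python
def build_key(X, Y, Z):
    """R = [XY; YZ; XZ; X; Y; Z], with XY, YZ, XZ formed by bitwise XOR."""
    xy = [x ^ y for x, y in zip(X, Y)]
    yz = [y ^ z for y, z in zip(Y, Z)]
    xz = [x ^ z for x, z in zip(X, Z)]
    return xy + yz + xz + list(X) + list(Y) + list(Z)


def xor_with_key(C, R):
    """C = C XOR R, element by element over the pixel values."""
    if len(R) < len(C):
        raise ValueError("key is shorter than the data")
    return [c ^ r for c, r in zip(C, R)]


R = build_key([1, 2], [3, 4], [5, 6])
encrypted = xor_with_key([150, 151, 0], R)   # [148, 145, 6]
```

Since XOR is its own inverse, applying `xor_with_key` with the same key a second time restores the original pixel values.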

  4. (iv)

    XOR for sequential bits

    XOR operation is performed for sequential bits of each C(p) using Algorithm 3. Sequential XOR operation is carried out from LSB to MSB. For example, assume that C(p) is [10010110]. After performing Algorithm 3, new C(p) becomes [01110010].

    Algorithm 3: XOR for sequential bits

    C(p,7) = xor(C(p,7), C(p,8));

    C(p,6) = xor(C(p,6), C(p,7));

    C(p,5) = xor(C(p,5), C(p,6));

    C(p,4) = xor(C(p,4), C(p,5));

    C(p,3) = xor(C(p,3), C(p,4));

    C(p,2) = xor(C(p,2), C(p,3));

    C(p,1) = xor(C(p,1), C(p,2));
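The following Python sketch reproduces Algorithm 3 and the worked example from the text. Bits are stored MSB first, so list index 0 corresponds to C(p,1) and index 7 to C(p,8). The inverse function is an addition for illustration and is not taken from the original.

```python
def sequential_xor(bits):
    """Algorithm 3: XOR each bit with its already-updated lower neighbour."""
    out = list(bits)
    for i in range(6, -1, -1):       # C(p,7) first, C(p,1) last
        out[i] ^= out[i + 1]
    return out


def inverse_sequential_xor(bits):
    """Undo Algorithm 3 by repeating the XORs in the opposite order."""
    out = list(bits)
    for i in range(0, 7):            # C(p,1) first, C(p,7) last
        out[i] ^= out[i + 1]
    return out


example = sequential_xor([1, 0, 0, 1, 0, 1, 1, 0])   # [0, 1, 1, 1, 0, 0, 1, 0]
```

Flipping only the LSB of the input, that is using [10010111], yields [10001101], which differs from the first result in every bit. This shows how a change in the LSB propagates towards the MSB.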

  5. (v)

    Complement and swap

    The bits of each C(p) that are equal to one are counted. The one’s complement operation is performed when the number of one bits is odd; otherwise, the four-bit swapping operation is performed. The complement and swap operations are presented in Algorithm 4. In the one’s complement operation, all zeroes are replaced with ones and all ones are replaced with zeroes. In the four-bit swapping operation, the first 4 bits (counted from MSB to LSB) are swapped with the last 4 bits. For instance, suppose that C(p) is [10010110]. After applying Algorithm 3 and then Algorithm 4, the new C(p) becomes [01110010] and [00100111], respectively. In another example, assume that only the LSB of C(p) is altered from 0 to 1 relative to the previous example, so that C(p) is [10010111]. After applying Algorithm 3 and then Algorithm 4, the new C(p) becomes [10001101] and [11011000], respectively. In these two examples, the decimal values of the new C(p) after applying Algorithm 3 and Algorithm 4 are 39 and 216, respectively. Comparing the two results shows that a change of only one bit in C(p) leads to a large alteration in the new value of C(p) after the XOR for sequential bits operation and the complement and swap operations. A minor change in a pixel of the image containing the doctor’s voice comments as pixel values therefore causes a large change in the encrypted image, which supports the resistance of the suggested method to differential attack.

    Algorithm 4: Complement and swap

    Count the bits equal to one

    counter = 0;

    for i = 1:8 do

    if C(p,i) == 1 do

    counter = counter + 1;

    end if

    end for

    If the number of bits equal to one is odd, perform complement; otherwise carry out the four-bit swapping operation

    parity = mod(counter,2);

    if parity == 1 do

    C(p,1:8) = complement(C(p,1:8));

    else do

    C_temp(p,1:8) = C(p,1:8);

    C(p,1:4) = C_temp(p,5:8);

    C(p,5:8) = C_temp(p,1:4);

    end if
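Algorithm 4 can be sketched in Python as follows, with bits stored MSB first. The helper `to_int` and both function names are hypothetical and serve only to check the decimal values quoted in the text.

```python
def complement_or_swap(bits):
    """Algorithm 4: complement if the number of ones is odd, else swap nibbles."""
    out = list(bits)
    if sum(out) % 2 == 1:
        return [1 - s for s in out]      # one's complement
    return out[4:] + out[:4]             # swap first 4 bits with last 4 bits


def to_int(bits):
    """Interpret MSB-first bits as an unsigned integer."""
    value = 0
    for s in bits:
        value = (value << 1) | s
    return value


first = to_int(complement_or_swap([0, 1, 1, 1, 0, 0, 1, 0]))    # 39
second = to_int(complement_or_swap([1, 0, 0, 0, 1, 1, 0, 1]))   # 216
```

Both branches preserve the parity of the number of ones (complementing 8 bits keeps an odd count odd, and swapping does not change the count), so applying the same function again restores the input. This is what allows the decoding phase to reverse the step.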

  6. (vi)

    XOR operation with next pixel

    The effect of a pixel should be delivered to all pixels to obtain an image steganography scheme resistant against differential attack. Each binary pixel, C(p), is XORed with next pixel, C(p + 1), in order that the encryption scheme can resist differential attack. The result of XOR operation becomes the new value of next pixel. Moreover, before performing this operation, each pixel is also XORed with the key values, S(p), obtained from chaotic system to increase the complexity of the steganography scheme. XOR operation with next pixel is carried out using Algorithm 5.

    Algorithm 5: XOR operation with next pixel.

    XY = xor(X,Y);

    YZ = xor(Y,Z);

    XZ = xor(X,Z);

    S = [X;Y;Z;XY;YZ;XZ];

    C(p) = C(p) ⊕ S(p);

    C(p + 1) = C(p) ⊕ C(p + 1);

    The XOR operation with next pixel is carried out from C(1) to C(N) and then from C(N) to C(1) to make sure that any slight alteration in a pixel can have an effect on all pixels in the image.
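A possible reading of Algorithm 5 is sketched below in Python. Several details are assumptions, since the text does not spell them out: the forward pass runs over p = 1,…,N − 1, the backward pass reuses the same key S and XORs each pixel into the previous one, and the function names `diffuse` and `undiffuse` are hypothetical.

```python
def diffuse(pixels, key):
    """Forward pass C(1)..C(N), then backward pass C(N)..C(1)."""
    out = list(pixels)
    n = len(out)
    for p in range(n - 1):               # forward: mix C(p) into C(p + 1)
        out[p] ^= key[p]
        out[p + 1] ^= out[p]
    for p in range(n - 1, 0, -1):        # backward (assumed mirror of the forward pass)
        out[p] ^= key[p]
        out[p - 1] ^= out[p]
    return out


def undiffuse(pixels, key):
    """Reverse both passes, undoing the most recent step first."""
    out = list(pixels)
    n = len(out)
    for p in range(1, n):                # undo the backward pass
        out[p - 1] ^= out[p]
        out[p] ^= key[p]
    for p in range(n - 2, -1, -1):       # undo the forward pass
        out[p + 1] ^= out[p]
        out[p] ^= key[p]
    return out


# Toy pixel and key values chosen for illustration only.
plain = [39, 216, 0, 255, 17]
S = [5, 200, 33, 7, 90]
restored = undiffuse(diffuse(plain, S), S)
```

The round trip illustrates that the step is reversible with the same key, which the decoding phase requires.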

  7. (vii)

    Data hiding based on LSB method

    In this part of the steganography scheme, the encrypted voice comments of the doctor are embedded in the cover medical image. The audio data to be hidden, which have been converted to an encrypted image, are embedded in the R, G, B components of the cover image. Each pixel of the encrypted image containing audio data is embedded in the corresponding pixel of the cover medical image. From MSB to LSB, the first 3 bits, the next 3 bits and the last 2 bits of C(p) are embedded in the last 3 bits of the R component, the last 3 bits of the G component and the last 2 bits of the B component of the cover image, respectively. Algorithm 6 presents data hiding based on the LSB method. The cover medical image is given as D(N,8,3). Each of the R, G, B components of D(p) takes a value between 0 and 255, p = 1,2,…,N.

    Algorithm 6: Data hiding based on LSB method

    for p = 1:N do

    Embedding encrypted audio data in R component of cover medical image

    D(p,6:8,1) = C(p,1:3);

    Embedding encrypted audio data in G component of cover medical image

    D(p,6:8,2) = C(p,4:6);

    Embedding encrypted audio data in B component of cover medical image

    D(p,7:8,3) = C(p,7:8);

    end for
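The 3-3-2 embedding of Algorithm 6 can be expressed with bit masks, as in the Python sketch below. The function names `embed` and `extract` are hypothetical, and a pixel is represented as an (R, G, B) tuple of 8-bit values.

```python
def embed(cover_rgb, c):
    """Hide the 8-bit value c in the low bits of one RGB pixel (3 + 3 + 2 bits)."""
    r, g, b = cover_rgb
    r = (r & 0b11111000) | (c >> 5)              # first 3 bits of c -> last 3 bits of R
    g = (g & 0b11111000) | ((c >> 2) & 0b111)    # next 3 bits of c -> last 3 bits of G
    b = (b & 0b11111100) | (c & 0b11)            # last 2 bits of c -> last 2 bits of B
    return (r, g, b)


def extract(stego_rgb):
    """Recover the hidden 8-bit value from one stego pixel."""
    r, g, b = stego_rgb
    return ((r & 0b111) << 5) | ((g & 0b111) << 2) | (b & 0b11)


stego = embed((200, 100, 50), 0b10010110)   # (204, 101, 50)
hidden = extract(stego)                     # 150
```

With this layout, the R and G components change by at most 7 and the B component by at most 3 per pixel.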

Figure 1 presents a new chaotic system-based steganography algorithm scheme to provide the hiding of the voice comments of the doctor. The steps of steganography scheme are presented below. These ten steps present the encryption method and voice data hiding method. In the decoding phase, the exact reverse processes of these 10 steps are performed in order to uncover the voice data belonging to the doctor.

Fig. 1 A chaos-based steganography scheme

Step 1: Input 8-bit audio data containing the comments of the doctor.

Step 2: Convert the audio data to pixel values.

Step 3: Perform pseudorandom pixel placement in a blank image with the size of \(m\) × \(m\) pixels using Algorithm 1.

Step 4: Generate a key R(p) from the chaotic system and perform the logical XOR operation presented in Algorithm 2.

Step 5: Perform XOR for sequential bits, complement and swap, and XOR operation with next pixel one after another using Algorithm 3, Algorithm 4, and Algorithm 5. This yields the encrypted audio data and makes the presented steganography method robust to differential attack.

Step 6: Input an RGB cover medical image of \(m\) × \(m\) pixels.

Step 7: The cover image is split into R, G, B components.

Step 8: Hide the audio data in cover medical image using Algorithm 6.

Step 9: Obtain R, G, B components of stego medical image.

Step 10: Recover RGB stego medical image which includes the encrypted voice comments of the doctor.

Figure 2a, b presents the image histograms for the R, G, B components of the cover image and the stego image, respectively. Figure 2c, d shows the time series and histogram for the secret audio data and the obtained audio data, respectively. Figure 2a, b shows that both histograms are similar and that the hidden audio data do not change the cover image dramatically. In addition, Fig. 2c, d shows that the secret audio data can be successfully recovered from the stego image. Figure 3 demonstrates the correlations in the horizontal, vertical and diagonal directions for the stego image and the cover image. Comparing Fig. 3a and b shows that the audio data make only a minor alteration to the stego image.

Fig. 2 The image histograms for R, G, B components: a cover image, b stego image. The time series and histogram for c secret audio data, d obtained audio data

Fig. 3 The correlation coefficients in the diagonal, horizontal and vertical directions: a cover image, b stego image

In image processing theory, there are numerous kinds of quality measurement parameters to determine the similarity between the original image and modified image. Root-mean-squared error (RMSE), mean squared error (MSE), peak-signal-to-noise ratio (PSNR), mean absolute error (MAE), and Structural Similarity Index Metric (SSIM) are utilized as quality measurement parameters in this study [35]. SSIM is given as:

$$ {\text{SSIM}} \left( {{\text{CI}},{\text{ SI}}} \right) = \frac{{\left( {2\mu_{{{\text{CI}}}} \mu_{{{\text{SI}}}} + c_{1} } \right)\left( {2\sigma_{{{\text{CISI}}}} + c_{2} } \right)}}{{\left( {\mu_{{{\text{CI}}}}^{2} + \mu_{{{\text{SI}}}}^{2} + c_{1} } \right)(\sigma_{{{\text{CI}}}}^{2} + \sigma_{{{\text{SI}}}}^{2} + c_{2} )}} $$
(6)
$$ \mu_{{{\text{CI}}}} = \frac{1}{M \times N}\mathop \sum \limits_{i = 1}^{M \times N} {\text{CI}}_{i} $$
(7)
$$ \mu_{{{\text{SI}}}} = \frac{1}{M \times N}\mathop \sum \limits_{i = 1}^{M \times N} {\text{SI}}_{i} $$
(8)
$$ \sigma_{{{\text{CI}}}}^{2} = \frac{1}{M \times N - 1}\mathop \sum \limits_{i = 1}^{M \times N} \left( {{\text{CI}}_{i} - \mu_{{{\text{CI}}}} } \right)^{2} $$
(9)
$$ \sigma_{{{\text{SI}}}}^{2} = \frac{1}{M \times N - 1}\mathop \sum \limits_{i = 1}^{M \times N} \left( {{\text{SI}}_{i} - \mu_{{{\text{SI}}}} } \right)^{2} $$
(10)
$$ \sigma_{{{\text{CISI}}}} = \frac{1}{M \times N - 1}\mathop \sum \limits_{i = 1}^{M \times N} \left( {{\text{CI}}_{i} - \mu_{{{\text{CI}}}} } \right)\left( {{\text{SI}}_{i} - \mu_{{{\text{SI}}}} } \right) $$
(11)

where c1 and c2 are constants. \(\sigma_{{{\text{CISI}}}}\), \(\sigma_{{{\text{SI}}}}^{2}\), \(\sigma_{{{\text{CI}}}}^{2}\), \(\mu_{{{\text{SI}}}}\), \(\mu_{{{\text{CI}}}}\), \({\text{SI}}\) and \({\text{CI}}\) represent the covariance of cover and stego images, the variance of stego image, the variance of cover image, the average of stego image, the average of cover image, stego image and cover image. M and N are the dimensions of the image. MSE, RMSE, MAE, and PSNR are given, respectively, in Eqs. (12)–(15).

$$ {\text{MSE}} = \frac{1}{M \times N}\mathop \sum \limits_{i = 1}^{M} \mathop \sum \limits_{j = 1}^{N} \left( {{\text{CI}}\left( {i,j} \right) - {\text{SI}}\left( {i,j} \right)} \right)^{2} $$
(12)
$$ {\text{RMSE}} = \sqrt {{\text{MSE}}} $$
(13)
$$ {\text{MAE}} = \frac{1}{M \times N}\mathop \sum \limits_{i = 1}^{M} \mathop \sum \limits_{j = 1}^{N} \left| {{\text{CI}}\left( {i,j} \right) - {\text{SI}}\left( {i,j} \right)} \right| $$
(14)
$$ {\text{PSNR}} = 10\log_{10} \frac{{\left( {2^{8} - 1} \right)^{2} }}{{{\text{MSE}}}} $$
(15)
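The quality metrics can be sketched in plain Python as below. Images are passed as flat lists of pixel values of one colour component. The SSIM is the single-window (global) form of Eqs. (6)–(11); the constants `c1` and `c2` are not specified in the text, so commonly used default values are assumed here. PSNR is computed with the conventional MSE denominator, and the function name is hypothetical.

```python
import math


def similarity_metrics(ci, si, peak=255, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Return SSIM, MSE, RMSE, MAE and PSNR for two equally sized pixel lists."""
    n = len(ci)
    diffs = [a - b for a, b in zip(ci, si)]
    mse = sum(d * d for d in diffs) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(d) for d in diffs) / n
    psnr = math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
    mu_c, mu_s = sum(ci) / n, sum(si) / n
    var_c = sum((a - mu_c) ** 2 for a in ci) / (n - 1)
    var_s = sum((b - mu_s) ** 2 for b in si) / (n - 1)
    cov = sum((a - mu_c) * (b - mu_s) for a, b in zip(ci, si)) / (n - 1)
    ssim = ((2 * mu_c * mu_s + c1) * (2 * cov + c2)) / (
        (mu_c ** 2 + mu_s ** 2 + c1) * (var_c + var_s + c2)
    )
    return {"SSIM": ssim, "MSE": mse, "RMSE": rmse, "MAE": mae, "PSNR": psnr}


same = similarity_metrics([10, 20, 30, 40], [10, 20, 30, 40])
shifted = similarity_metrics([10, 20, 30, 40], [11, 21, 31, 41])
```

For identical inputs the sketch returns the ideal values (SSIM of 1, zero errors, infinite PSNR). For images that differ by 1 in every pixel, MSE, RMSE and MAE are all 1 and PSNR is about 48.13 dB.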

Table 1 gives the similarity metrics between the cover image without audio data and the stego image with audio data for 512 × 512 pixel and 1024 × 1024 pixel images. When the cover image is equal to the stego image, the values of SSIM, MSE, RMSE, MAE, and PSNR are 1, 0, 0, 0 and ∞, respectively. The similarity metrics are determined for the R, G, B components of the stego image and the cover image. Table 1 shows that the stego image with the doctor’s voice comments is relatively similar to the plain cover image. As expected, the similarity metrics for the blue component are closer to the ideal values than those for the red and green components, because 2 bits are hidden in the blue component, while 3 bits are hidden in each of the red and green components. Moreover, using a smaller cover image in the steganography method increases the similarity between cover and stego images.

Table 1 Similarity metrics between cover image and stego image in terms of R, G, B components

3 Security analyses

Any introduced steganography method should exhibit good performance, and a good encryption technique should defend against the commonly known security risks. Various security analyses such as differential attack, statistical attacks and initial condition sensitivity must be carried out in order to demonstrate the effectiveness and robustness of the suggested steganography scheme.

3.1 Differential attack analysis

One commonly used security attack is the differential attack. An encryption scheme with the diffusion property is highly resistant to differential attack [36]. By making a minor alteration to the plain image, such as changing the value of one pixel, attackers try to determine significant relationships between the encrypted image and the plain image. When a minor modification in the plain image leads to a large difference in the encrypted image, the encrypted image is considered strong against differential attack. Two well-known measures, number of pixels change rate (NPCR) and unified average changing intensity (UACI), are used to evaluate the effect of changing only one pixel in the plain image on the encrypted image [22, 24, 36,37,38]. These measures are defined as:

$$ {\text{NPCR}} = \frac{1}{W \times H}\mathop \sum \limits_{i,j} D\left( {i,j} \right) \times 100\% $$
(16)
$$ {\text{UACI}} = \frac{1}{W \times H \times 255}\mathop \sum \limits_{i,j} \left| {C_{1} \left( {i,j} \right) - C_{2} \left( {i,j} \right)} \right| \times 100\% $$
(17)

where \(C_{1}\) and \(C_{2}\) indicate the two encrypted images obtained from two plain images that differ by one bit only, and \(H\) and \(W\) are the height and width of the image. \(D\) is an array consisting of 0 and 1. If \(C_{1} \left( {i,j} \right)\) = \(C_{2} \left( {i,j} \right)\), then \(D\left( {i,j} \right)\) is equal to 0, otherwise \(D\left( {i,j} \right)\) is equal to 1. NPCR gives the percentage of pixels that differ between the two images. UACI gives the average intensity change between the two images. When the UACI and NPCR values are large enough, the introduced steganography method is resistant against differential attacks [24, 39]. To perform the differential attack analysis, the value of one sample in the audio data is changed by 1. A one-second audio file with a sample rate of 8 kHz is used as the secret data in this study. For the numerical analysis, the value of the audio sample at position 5649 is increased by 1 and the values of UACI and NPCR are determined. To increase the security of the suggested steganography scheme, Algorithms 3, 4, 5 are utilized. By using these three algorithms, the proposed steganography method can resist differential attack. Table 2 presents the effect of Algorithms 3, 4, 5 on resisting differential attack. As can be seen in Table 2, with the suggested steganography algorithm the values of UACI and NPCR become large enough. However, the NPCR and UACI values are quite small when Algorithms 3, 4, 5 are not utilized in the steganography scheme.
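Equations (16) and (17) can be sketched in Python as follows. The two encrypted images are passed as equally sized 2-D lists of 8-bit values, and the function name `npcr_uaci` is hypothetical.

```python
def npcr_uaci(c1, c2):
    """Return (NPCR, UACI) in percent for two equally sized 8-bit images."""
    total = changed = intensity = 0
    for row1, row2 in zip(c1, c2):
        for a, b in zip(row1, row2):
            total += 1
            changed += 1 if a != b else 0    # D(i, j) in Eq. (16)
            intensity += abs(a - b)          # |C1(i, j) - C2(i, j)| in Eq. (17)
    npcr = 100.0 * changed / total
    uaci = 100.0 * intensity / (255 * total)
    return npcr, uaci


# One of four pixels differs, by 51 grey levels.
result = npcr_uaci([[0, 0], [0, 0]], [[51, 0], [0, 0]])   # (25.0, 5.0)
```

In this toy case, NPCR is 25% because one of four pixels changed, and UACI is 5% because the total intensity change is 51 out of a possible 4 × 255.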

Table 2 The effect of Algorithms 3, 4, 5 on resisting differential attack

The introduced steganography scheme is tested against differential attack. Ten 8-bit samples of the doctor’s voice comments are randomly selected and the values of these samples are changed with a difference of 1. The results of UACI and NPCR for ten samples are presented in Table 3. The average values of ten samples for UACI and NPCR are 33.5688% and 99.8069%, respectively. Table 4 presents a comparative study of the introduced algorithm to previous algorithms in terms of the values of UACI, NPCR, and information entropy.

Table 3 Results of NPCR and UACI tests for ten samples
Table 4 Comparative study of NPCR, UACI and information entropy of the introduced algorithm to previous algorithms

3.2 Statistical attack analyses

Statistical attack analyses such as correlations of two neighboring pixels, histogram, and information entropy are performed in this part of work.

3.2.1 Histogram analysis

The histogram plot shows the distribution of the intensity levels of the pixels in an image. For an ideal cryptosystem, the histogram of the encrypted image must be uniformly distributed, that is flat, to avoid statistical attacks. Figure 4 presents the histogram of the encrypted audio data, containing the comments of the doctor, as an image. Figure 4 shows that the pixels of the encrypted audio data as an image are distributed uniformly, so the encrypted image does not offer any significant information about the plain image. Thus, the steganography algorithm scheme introduced in this paper shows good confusion properties [24, 37, 38].

Fig. 4 The histogram of the encrypted voice comments of the doctor as an image

3.2.2 Correlation coefficient analysis of two neighboring pixels

An image containing meaningful information may have high correlations among its neighboring pixels. For this reason, a powerful image steganography scheme should break the correlations between neighboring pixels of the encrypted image, so that the correlation between two neighboring pixels is nearly zero. If the correlation value of the encrypted image is close to 1, the encrypted image is highly correlated and the encryption scheme fails to defend against statistical attack. The correlation analysis determining the similarity in plain and encrypted images has been carried out for the encrypted audio data as an image along the vertical, horizontal and diagonal directions [24, 36,37,38]. To compute the correlation coefficient between two neighboring pixels, the following procedure is used. First, 2000 pairs of neighboring pixels are randomly selected from the encrypted image in the diagonal, vertical and horizontal directions. Then, the correlation coefficient of the encrypted image is calculated using the equations given below [24, 39].

$$ {\text{corr}}\left( {x,y} \right) = \frac{{{\text{cov}}\left( {x,y} \right)}}{{\sqrt {D\left( x \right)D\left( y \right)} }} $$
(18)
$$ {\text{cov}}\left( {x,y} \right) = \frac{1}{T}\mathop \sum \limits_{i = 1}^{T} \left[ {x_{i} - E\left( x \right)} \right]\left[ {y_{i} - E\left( y \right)} \right] $$
(19)
$$ D\left( x \right) = \frac{1}{T}\mathop \sum \limits_{i = 1}^{T} \left[ {x_{i} - E\left( x \right)} \right]^{2} $$
(20)
$$ E\left( x \right) = \frac{1}{T}\mathop \sum \limits_{i = 1}^{T} x_{i} $$
(21)

where x and y represent the values of two neighboring pixels and T indicates the total number of pairs of neighboring pixels. The correlation values indicate whether only a small correlation exists between two neighboring pixels in the encrypted image [24]. The correlations between two neighboring pixels of the encrypted image, cover image and stego image are presented in Table 5 for 512 × 512 pixel and 1024 × 1024 pixel images. The correlation values of the encrypted image along the horizontal, vertical and diagonal directions are almost zero. The correlation distributions in the encrypted image along the three directions are presented in Fig. 5. Both Fig. 5 and Table 5 indicate that the proposed steganography algorithm using a chaotic system can be used to deliver the encrypted information safely. In addition, utilizing a bigger cover image decreases the correlation coefficients of the encrypted images.
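The sampling procedure and Eqs. (18)–(21) can be sketched in Python as below. The image is a 2-D list of pixel values. The fixed random seed, the function name `neighbor_correlation` and the direction labels are assumptions made for illustration.

```python
import math
import random


def neighbor_correlation(img, direction, pairs=2000, seed=0):
    """Correlation of randomly sampled neighbouring pixel pairs, Eqs. (18)-(21)."""
    dr, dc = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}[direction]
    rng = random.Random(seed)            # assumed seed, for reproducibility only
    rows, cols = len(img), len(img[0])
    xs, ys = [], []
    for _ in range(pairs):
        r = rng.randrange(rows - dr)
        c = rng.randrange(cols - dc)
        xs.append(img[r][c])
        ys.append(img[r + dr][c + dc])
    t = len(xs)
    ex, ey = sum(xs) / t, sum(ys) / t                                   # Eq. (21)
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / t          # Eq. (19)
    dx = sum((x - ex) ** 2 for x in xs) / t                             # Eq. (20)
    dy = sum((y - ey) ** 2 for y in ys) / t
    return cov / math.sqrt(dx * dy)                                     # Eq. (18)


# A smooth ramp image: neighbours differ by a constant, so correlation is 1.
ramp = [[r + 2 * c for c in range(16)] for r in range(16)]
ramp_h = neighbor_correlation(ramp, "horizontal")
```

A smooth image such as `ramp` gives a correlation close to 1 in every direction, whereas a well-encrypted image is expected to give values close to 0.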

Table 5 The correlation coefficient values between two neighboring pixels of the encrypted image, cover image and stego image
Fig. 5 The correlation coefficient of the encrypted voice comments of the doctor as an image in the diagonal, horizontal and vertical directions

3.2.3 Information entropy analysis

Among numerous randomness test standards, the information entropy is used to show uncertainties of the image information. The pixels of a desired encrypted image should be distributed uniformly. The distribution of pixel intensity value in image can be measured by the information entropy. When the entropy of the image is higher, the uncertainty is bigger. It means that the decryption procedure for the image needs more information. On the contrary, the more orderly the encrypted image is, the smaller the information entropy is. The value of ideal entropy is equal to 8 [24, 36, 37, 39, 49]. H(m) which is the information entropy of m can be calculated as

$$ H\left( m \right) = - \mathop \sum \limits_{i = 1}^{L} P\left( {m_{i} } \right)\log_{2} P\left( {m_{i} } \right) $$
(22)

where L indicates the number of grayscale levels and \(P(m_{i})\) denotes the probability of the i-th possible pixel value \(m_{i}\). The entropy is measured in bits because the logarithm is base 2. Using Eq. (22), the entropy is calculated as 7.9993 bits for the encrypted audio data. This value of information entropy indicates that the encrypted image behaves like a random source and that the proposed steganography algorithm is resistant to statistical attacks. In other words, the probability of accidental leakage of the doctor’s comments is quite low [24, 36].
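Equation (22) can be sketched in Python as follows, with probabilities estimated from the relative frequency of each pixel value. The function name `information_entropy` is hypothetical.

```python
import math
from collections import Counter


def information_entropy(pixels):
    """Shannon entropy in bits of a flat list of pixel values, Eq. (22)."""
    counts = Counter(pixels)
    n = len(pixels)
    # Values that never occur contribute nothing to the sum and are skipped.
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


uniform = information_entropy(list(range(256)) * 4)   # 8.0
```

A perfectly uniform 8-bit image reaches the ideal value of 8 bits, while a constant image has an entropy of 0.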

3.3 Initial condition sensitivity analysis

For a good steganography algorithm, it is vital to be sensitive to the initial condition of the chaotic system. Initial condition sensitivity analysis is done to show the functioning of the introduced algorithm [24]. This analysis is performed by using one parameter of the chaotic system with a slight difference. The precision is found as \(10^{-16}\) for the chaotic system used in the steganography scheme. When the initial condition is altered, a different sequence is obtained from the chaotic system. However, if the alteration in the value of the initial condition is smaller than the stated precision, the sequence obtained from the chaotic system remains unchanged. This situation should be considered a constraint of the steganography scheme.

In the initial condition sensitivity analysis, the encryption stage is performed using the original parameter \(b\). In the decryption stage, however, parameter \(b\) is increased by \(10^{-16}\) while all other parameters remain the same, in order to understand the effect of the value of the initial condition on the encryption scheme. Figure 6 shows the initial condition sensitivity analysis. The audio data obtained from the image decrypted with the false parameter \(b\) is given in Fig. 6a. It is clear that the proposed steganography scheme is sensitive to the initial condition. Thus, the introduced algorithm is resistant against exhaustive attack.

Fig. 6 Initial condition sensitivity analysis selecting parameter \(b\) with an increase of \(10^{-16}\)

3.4 Known plaintext and chosen plaintext attacks

Plaintext attacks include known plaintext and chosen plaintext attacks. In a chosen plaintext attack, the attackers have access to the encryption scheme and can produce the encrypted image for any plain image they select. In a known plaintext attack, the attackers also have the encryption scheme, but the pairs of plain and encrypted images available to them are arbitrary rather than chosen by the attackers. The chosen plaintext attack is the stronger of the two, since the attackers can select the plain images and obtain the corresponding encrypted images [22, 38, 50]. As stated in Sect. 3.3, the introduced steganography technique is highly sensitive to the initial condition, which means that a slight alteration in the initial condition causes a great change in the content of the encrypted and decrypted images. In addition, thanks to the enhanced XOR operations (the XOR operation for sequential bits and the XOR operation with the next pixel), the encrypted image content depends not only on the current bit or pixel value but also directly on the next bit or pixel value. For these reasons, the introduced steganography method is resistant to known plaintext and chosen plaintext attacks. Table 6 compares this study with previous studies on chaos theory in terms of performing steganography, employing cryptography, including security analysis and being resistant to differential attack.
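The dependence of each encrypted pixel on the next pixel, as described above, can be sketched as follows. This is a toy chaining rule assumed for illustration only, not the enhanced XOR algorithms of the proposed scheme; the function names `encrypt_chained` and `decrypt_chained`, the key bytes and the pixel values are all hypothetical. It shows only the dependency pattern and says nothing about the security of the actual method.

```python
def encrypt_chained(plain, key):
    """XOR each pixel with its key byte and with the next cipher pixel."""
    cipher = bytearray(len(plain))
    nxt = 0  # the last pixel has no successor, so it is chained with 0
    for i in range(len(plain) - 1, -1, -1):
        cipher[i] = plain[i] ^ key[i] ^ nxt
        nxt = cipher[i]
    return bytes(cipher)


def decrypt_chained(cipher, key):
    """Invert encrypt_chained using the same key bytes."""
    plain = bytearray(len(cipher))
    for i in range(len(cipher)):
        nxt = cipher[i + 1] if i + 1 < len(cipher) else 0
        plain[i] = cipher[i] ^ key[i] ^ nxt
    return bytes(plain)


# Hypothetical key stream and two images that differ only in the last pixel.
key = bytes([0x5A, 0xC3, 0x17, 0x88, 0x2E, 0xF1, 0x64, 0x9B])
image_a = bytes([10, 20, 30, 40, 50, 60, 70, 80])
image_b = image_a[:-1] + bytes([81])

cipher_a = encrypt_chained(image_a, key)
cipher_b = encrypt_chained(image_b, key)
changed = sum(a != b for a, b in zip(cipher_a, cipher_b))

print(changed, "of", len(cipher_a), "cipher pixels changed")
print("round trip ok:", decrypt_chained(cipher_a, key) == image_a)
```

Because every cipher pixel folds in its successor, a change in one plain pixel propagates to all cipher pixels that are computed after it in the processing order.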

Table 6 The comparison between this study and previous studies on chaos theory

4 Conclusion

In this work, a novel image steganography technique has been proposed for hiding encrypted audio data that contain the comments of the doctor. In the steganography scheme, the audio data are first converted to pixel values, and these values are placed randomly in a blank image. The image containing the doctor's comments is then encrypted and embedded in a medical cover image. The histogram of the encrypted voice comments, represented as an image, is extremely uniform. Therefore, the ciphered image with sound data does not offer any meaningful information to attackers. The correlation coefficient analysis shows that the correlation between two neighboring pixels in the ciphered images is almost zero in all three directions (diagonal, vertical and horizontal). This indicates that the steganography scheme effectively removes the correlation among neighboring pixels. A powerful steganography scheme should yield an information entropy close to 8 for an encrypted image; the information entropy obtained with the proposed scheme is 7.9993 bits. The information entropy, correlation coefficient and histogram analyses indicate that the introduced chaos-based scheme is robust against statistical attacks. In addition, for a 512 × 512 pixel cover image, the average of the ten UACI values is 33.5688% and the average of the ten NPCR values is 99.8069%. Taking these two values into account, it can be said that the suggested steganography scheme can resist differential attack. Moreover, the initial condition sensitivity analysis indicates that the proposed algorithm is also robust against exhaustive attack. The suggested algorithm can also withstand known plaintext and chosen plaintext attacks.