1 Introduction

Nadarajah and Haghighi (2011) introduced a new generalization of the exponential distribution as an alternative to the gamma, Weibull, and generalized-exponential lifetime models. Lemonte (2013) named this extension the NH distribution (NHD), as an abbreviation of the authors' names Nadarajah and Haghighi; many properties of the NHD are discussed by Nadarajah and Haghighi (2011). Suppose that the lifetime X of a test unit follows the two-parameter NHD(α, λ). The probability density function (PDF), f(⋅), cumulative distribution function (CDF), F(⋅), survival function (SF), S(⋅), and hazard rate function (HRF), H(⋅), at mission time t, are given, respectively, by

$$ f(x;\alpha,\lambda)=\alpha\lambda(1+\lambda{x})^{\alpha-1}\exp({1-(1+\lambda{x})^{\alpha}}),\quad x>0,\ \alpha,\lambda>0, $$
(1.1)
$$ F(x;\alpha,\lambda)=1-\exp({1-(1+\lambda{x})^{\alpha}}),\quad x>0,\ \alpha,\lambda>0, $$
(1.2)
$$ S(t;\alpha,\lambda)=\exp({1-(1+\lambda{t})^{\alpha}}),\quad t>0,\ \alpha,\lambda>0, $$
(1.3)

and

$$ H(t;\alpha,\lambda)=\alpha\lambda(1+\lambda{t})^{\alpha-1},\quad t>0,\ \alpha,\lambda>0, $$
(1.4)

where α and λ are the shape and scale parameters, respectively. Putting α = 1 in Eq. 1.1 yields the exponential distribution as a special case.
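Numerically, Eqs. 1.1-1.4 translate directly into code. A minimal Python sketch (the function names are ours, not from the paper):

```python
import math

def nh_pdf(x, alpha, lam):
    # PDF, Eq. 1.1
    return alpha * lam * (1 + lam * x) ** (alpha - 1) * math.exp(1 - (1 + lam * x) ** alpha)

def nh_cdf(x, alpha, lam):
    # CDF, Eq. 1.2
    return 1 - math.exp(1 - (1 + lam * x) ** alpha)

def nh_sf(t, alpha, lam):
    # Survival function, Eq. 1.3
    return math.exp(1 - (1 + lam * t) ** alpha)

def nh_hrf(t, alpha, lam):
    # Hazard rate function, Eq. 1.4; note H(t) = f(t)/S(t)
    return alpha * lam * (1 + lam * t) ** (alpha - 1)
```

With α = 1 the PDF reduces to λexp(−λx), recovering the exponential special case noted above.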

Recently, many studies on estimating the unknown parameters of the NHD under different life-testing experiments have been carried out. Singh et al. (2015a) obtained the maximum likelihood estimators (MLEs) and Bayes estimators (BEs) of the NHD under the Type-II progressive censoring scheme (PCS); Singh et al. (2015b) discussed the MLEs and BEs of the unknown parameters and reliability characteristics of the NHD based on complete sampling; and Dey et al. (2017) described different methods of estimation for the unknown parameters of the NHD.

Progressive first-failure censored sampling (PFFCS) was proposed by Wu and Kuş (2009), and this censoring scheme has become quite popular in reliability and lifetime testing studies. A drawback of first-failure censoring is that it does not allow groups to be removed from the test at points other than the final termination point. For this reason, Wu and Kuş (2009) suggested the PFFCS, which allows some surviving groups to be removed during the life test. They defined the PFFCS as a combination of the concepts of first-failure censoring and the Type-II PCS; hence, the PFFCS is an extension of the Type-II PCS. It can be described as follows: suppose that n independent groups with k items within each group are put on a life-testing experiment at time zero, m is a pre-fixed number of failures, and the PCS \({\mathbf {R}=(R_{1},R_{2},\dots ,R_{m})}\) is pre-fixed. At the time of the first observed failure (say X(1)), R1 groups, together with the group in which the first failure occurred, are randomly removed from the test, leaving n − R1 − 1 live groups. Next, at the time of the second observed failure (say X(2)), R2 groups and the group in which the second failure occurred are randomly removed, leaving n − R1 − R2 − 2 live groups, and so on. This procedure continues until, at the time of the mth observed failure (say X(m)), the remaining Rm, (m ≤ n), live groups and the group in which the mth failure occurred are removed. Then \( X_{(1)},X_{(2)},\dots ,X_{(m)} \) are the independent lifetimes of the PFFCS order statistics with pre-fixed progressive censoring scheme R. If the failure times of the n × k items originally in the life test come from a continuous population with PDF f(x(i);Θ) and CDF F(x(i);Θ), where Θ is the parameter vector, then the likelihood function for \( X_{(i)},\ i=1,2,\dots ,m \), is given by

$$ L({\Theta}|data) = Ck^{m}\prod\limits_{i=1}^{m}{f(x_{(i)};{\Theta})}[1-{F(x_{(i)};{\Theta})}]^{k(R_{i}+1)-1}, $$
(1.5)

where \({C}={n({n}-R_{1}-1)({n}-R_{1}-R_{2}-2)\dots ({n}-{\sum }_{i=1}^{m-1}(R_{i}+1))}\) and \({n = m+{\sum }_{i=1}^{m}R_{i}}\).

From Eq. 1.5, some sampling schemes follow as special cases: the Type-II PCS by putting k = 1; the joint PDF of first-failure-censored order statistics by putting \({\mathbf {R}=(0,0,\dots ,0)}\); Type-II censoring by putting \({\mathbf {R}=(0,0,\dots ,n-m)}\) and k = 1; and, by putting \({\mathbf {R}=(0,0,\dots ,0)}\) and k = 1 so that n = m, the complete sample.
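The sampling mechanism above is easy to simulate. The sketch below (function name ours) first draws a progressive Type-II censored uniform sample via the standard algorithm of Balakrishnan and Sandhu (1995), cited later in the simulation section, and then inverts the first-failure CDF F*(x) = 1 − [S(x; α, λ)]^k, the distribution of the minimum of a group of k NH lifetimes:

```python
import math
import random

def sim_pffc_nh(n, m, k, R, alpha, lam, seed=1):
    """Simulate a progressive first-failure censored NHD(alpha, lam) sample."""
    assert sum(R) == n - m, "removal pattern must satisfy sum(R) = n - m"
    rng = random.Random(seed)
    # Balakrishnan-Sandhu step: progressive Type-II censored U(0,1) sample.
    W = [rng.random() for _ in range(m)]
    V = [W[i - 1] ** (1.0 / (i + sum(R[m - i:]))) for i in range(1, m + 1)]
    U = [1.0 - math.prod(V[m - i:]) for i in range(1, m + 1)]
    # First-failure step: invert F*(x) = 1 - exp(k*(1 - (1 + lam*x)**alpha)),
    # the CDF of the minimum of a group of k NH lifetimes.
    return [((1.0 - math.log(1.0 - u) / k) ** (1.0 / alpha) - 1.0) / lam
            for u in U]
```

Setting k = 1 and R = (0, …, 0) with n = m recovers an ordered complete sample, matching the special cases listed above.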

To the best of our knowledge, statistical inference for the unknown parameters and reliability characteristics of the NHD has not yet been studied under progressively first-failure censored data. In this paper, our main purpose is to use the maximum likelihood and Bayesian approaches to derive both point and interval estimates of the unknown parameters, as well as of lifetime parameters such as the SF and HRF of the NHD, when the lifetime data are collected under the PFFCS. As expected, the MLEs of the unknown parameters cannot be obtained in closed form; they have to be obtained by solving two non-linear equations simultaneously. Independent conjugate gamma priors for the unknown parameters are considered to develop the BEs under the squared error loss (SEL) function. Also, approximate confidence intervals (ACIs) and highest posterior density (HPD) credible intervals of the unknown parameters α and λ, and of any function of them, are constructed. Since the BEs cannot be obtained in explicit form, we propose to use Lindley's approximation and a Markov chain Monte Carlo (MCMC) method, namely the Metropolis-Hastings (M-H) algorithm, to generate MCMC samples from the posterior distribution, compute the BEs, and construct the associated HPD credible intervals. A Monte Carlo simulation study is performed over different group sizes k, numbers of groups n, effective sample sizes m, and censoring schemes R to compare the performance of the various estimates in terms of their mean square error (MSE) and relative absolute bias (RAB). Furthermore, a real data set is analysed under different PCSs to illustrate our proposed estimators, and, to choose the optimum censoring scheme from a class of possible schemes, four commonly-used criteria are considered. The rest of the paper is organized as follows: In Section 2, we derive the MLEs and associated two-sided ACIs.
In Section 3, we develop the Bayesian inference based on SEL function. Monte Carlo simulation results are presented in Section 4. In Section 5, optimal censoring plans are discussed. Section 6 deals with a real-life data set for illustration purposes. Finally, concluding remarks are provided in Section 7.

2 Maximum Likelihood Estimation

Let \( X_{(i)},\ i=1,2,\dots ,m \), (1 ≤ m ≤ n), be a progressive first-failure censored sample obtained from the NHD(α, λ) with pre-fixed PCS R. Then, substituting Eqs. 1.1 and 1.2 into Eq. 1.5, we get

$$ L(\alpha,\lambda|data) = Ck^{m}{(\alpha\lambda)}^{m} \exp{\left( k\sum\limits_{i=1}^{m}{\omega(x_{(i)},R_{i};\alpha,\lambda)}\right)} \prod\limits_{i=1}^{m}{(\xi(x_{(i)};\lambda))^{\alpha-1}}, $$
(2.1)

where \(\xi (x_{(i)};\lambda ){}={}{(1{}+{}\lambda {x_{(i)}})}\) and \({\omega (x_{(i)},R_{i};\alpha ,\lambda ){}=(R_{i}+1)(1{}-{}(\xi (x_{(i)};\lambda ))^{\alpha })}\).

By dropping terms that do not involve α and λ, the corresponding log-likelihood function, \( {\ell (.)}\propto {\log {L(.)}} \), of Eq. 2.1 can be written, up to proportionality, as

$$ \ell(\alpha,\lambda|data) \propto m\log(\alpha\lambda)+{k\sum\limits_{i=1}^{m}{\omega(x_{(i)},R_{i};\alpha,\lambda)}} +(\alpha-1)\sum\limits_{i=1}^{m}\log{(\xi(x_{(i)};\lambda))}. $$
(2.2)

Differentiating (2.2) partially with respect to α and λ, we get two likelihood equations whose simultaneous solution yields the MLEs \( \hat {\alpha } \) and \( \hat {\lambda } \) of α and λ, respectively:

$$ \frac{\partial\ell}{\partial \alpha} = \frac{m}{\alpha}-k\sum\limits_{i=1}^{m}{(R_{i}+1)(\xi(x_{(i)};\lambda))^{\alpha}\log(\xi(x_{(i)};\lambda))}+\sum\limits_{i=1}^{m}{\log(\xi(x_{(i)};\lambda))}, $$
(2.3)

and

$$ \frac{\partial\ell}{\partial \lambda} = \frac{m}{\lambda}-{\alpha}k\sum\limits_{i=1}^{m}{x_{(i)}(R_{i}+1)(\xi(x_{(i)};\lambda))^{\alpha-1}}+(\alpha-1)\sum\limits_{i=1}^{m}{x_{(i)}(\xi(x_{(i)};\lambda))^{-1}}. $$
(2.4)

Clearly, Eqs. 2.3 and 2.4 constitute a system of two nonlinear equations in the unknown parameters α and λ, for which a closed-form solution is quite difficult to obtain. We therefore apply a suitable iterative procedure, such as the Newton-Raphson method, to obtain the desired MLEs of the two unknown parameters for given values of (k, n, m, R, x).
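As a numerical illustration, the log-likelihood (2.2) can also be maximized directly. The paper's own computations use Newton-Raphson (via the R 'maxLik' package mentioned in the simulation section); the Python sketch below, with hypothetical function names, instead uses a derivative-free Nelder-Mead search purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def nh_negloglik(theta, x, R, k):
    # Negative of the log-likelihood in Eq. 2.2 (up to an additive constant).
    alpha, lam = theta
    if alpha <= 0 or lam <= 0:
        return np.inf
    xi = 1.0 + lam * x                       # xi(x_(i); lam)
    omega = (R + 1.0) * (1.0 - xi ** alpha)  # omega(x_(i), R_i; alpha, lam)
    return -(x.size * np.log(alpha * lam) + k * omega.sum()
             + (alpha - 1.0) * np.log(xi).sum())

def nh_mle(x, R, k, start=(1.0, 1.0)):
    # MLEs of (alpha, lambda) for a progressive first-failure censored sample.
    x = np.asarray(x, dtype=float)
    R = np.asarray(R, dtype=float)
    res = minimize(nh_negloglik, np.asarray(start), args=(x, R, k),
                   method="Nelder-Mead")
    return res.x
```

Any optimizer that decreases `nh_negloglik` solves the same problem as Eqs. 2.3 and 2.4, since the score equations are the stationarity conditions of (2.2).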

Also, using the invariance property of the MLEs \( \hat {\alpha } \) and \( \hat {\lambda } \), the MLEs \( \hat {S}(t) \) and \( \hat {H}(t) \) of S(t) and H(t) in Eqs. 1.3 and 1.4, respectively, for a given mission time t, can be derived by replacing α and λ with their MLEs \( \hat {\alpha } \) and \( \hat {\lambda } \). The asymptotic variance-covariance (V-C) matrix of the MLEs \( \hat {\Theta }=(\hat {\alpha },\hat {\lambda })^{\mathbf {T}} \) can be obtained by inverting the Fisher information matrix, I0(Θ), whose elements are given by

$$ I_{ij}({\Theta})=E\left[-\frac{\partial^{2}\ell({\Theta}|data)}{\partial{\Theta}_{i}\,\partial{\Theta}_{j}}\right],\quad i,j=1,2. $$
(2.5)

Unfortunately, the exact expressions for the expectations in (2.5) are difficult to obtain. By dropping the expectation operator E and replacing Θ by \( \hat {\Theta } \), see Cohen (1965), we get the approximate asymptotic V-C matrix of the MLEs as

$$ {\mathrm{I}_{0}^{-1}(\hat{\Theta})} \cong \left [ \begin{array}{ c c } -\frac{\partial^{2}\ell(\alpha,\lambda|data)}{\partial \alpha^{2}} \qquad -\frac{\partial^{2}\ell(\alpha,\lambda|data)} {{\partial \alpha}{\partial \lambda}}\\ -\frac{\partial^{2}\ell(\alpha,\lambda|data)} {{\partial \lambda}{\partial \alpha}}\qquad -\frac{\partial^{2}\ell(\alpha,\lambda|data)} {\partial \lambda^{2}} \end{array} \right ]_{(\alpha=\hat{\alpha},\lambda=\hat{\lambda})}^{-1} =\left [ \begin{array}{ c c } \hat{\sigma}^{2}_{\hat{\alpha}}\qquad \hat{\sigma}_{\hat{\alpha},\hat{\lambda}}\\ \hat{\sigma}_{\hat{\lambda},\hat{\alpha}}\qquad \hat{\sigma}^{2}_{\hat{\lambda}} \end{array} \right ]. $$
(2.6)

From (2.6), the elements of the observed Fisher information matrix are

$$ \begin{array}{@{}rcl@{}} \frac{\partial^{2}\ell}{\partial \alpha^{2}} &=& -\frac{m}{\alpha^{2}}-k\sum\limits_{i=1}^{m}{(R_{i}+1)(\xi(x_{(i)};\lambda))^{\alpha} (\log(\xi(x_{(i)};\lambda)))^{2}},\\ \frac{\partial^{2}\ell}{\partial \lambda^{2}} &=& -\frac{m}{\lambda^{2}}-(\alpha-1)\sum\limits_{i=1}^{m}{x_{(i)}^{2}(\xi(x_{(i)};\lambda))^{\alpha-2}}[\alpha{k}(R_{i}+1)+(\xi(x_{(i)};\lambda))^{-\alpha}],\\ \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} \frac{\partial^{2}\ell} {{\partial \alpha}{\partial \lambda}} &=& \frac{\partial^{2}\ell} {{\partial \lambda}{\partial \alpha}} = -k\sum\limits_{i=1}^{m}{x_{(i)}(R_{i}+1)(\xi(x_{(i)};\lambda))^{\alpha-1}} [\alpha\log(\xi(x_{(i)};\lambda))+1]\\ &&+\sum\limits_{i=1}^{m}{x_{(i)}(\xi(x_{(i)};\lambda))^{-1}}. \end{array} $$

The asymptotic normality of the MLEs, \({\hat {\Theta }}\sim {N({\Theta },\mathrm {I}_{0}^{-1}(\hat {\Theta }))} \), can be used to construct 100(1 − γ)% two-sided ACIs for the unknown parameters α and λ, respectively, as

$$ \begin{array}{@{}rcl@{}} {{\hat{\alpha}}\pm{z_{\gamma/2}}\sqrt{\hat{\sigma}^{2}_{\hat{\alpha}}}} \quad \text{and} \quad {{\hat{\lambda}}\pm{z_{\gamma/2}}\sqrt{\hat{\sigma}^{2}_{\hat{\lambda}}}} \quad \end{array} $$

where zγ/2 is the upper (γ/2)-th percentile of the standard normal distribution. To construct the ACIs of S(t) and H(t), we first need their variances. The delta method, a general approach for computing ACIs for any function of the MLEs \( \hat {\alpha } \) and \( \hat {\lambda } \), see Greene (2012), is therefore used to obtain approximate estimates of the variances of \( {\hat {S}(t)} \) and \( \hat {H}(t) \). According to this method, the variances of \( {\hat {S}(t)} \) and \( \hat {H}(t) \) can be approximated, respectively, by

$$ \begin{array}{@{}rcl@{}} {\hat{\sigma}^{2}_{\hat{S}(t)}}={[\nabla{\hat{S}(t)}]}^{\mathbf{T}} {\mathrm{I}_{0}^{-1}(\hat{\Theta})}{[\nabla{\hat{S}(t)}]} \quad\text{and}\quad {\hat{\sigma}^{2}_{\hat{H}(t)}}={[\nabla{\hat{H}(t)}]}^{\mathbf{T}} {\mathrm{I}_{0}^{-1}(\hat{\Theta})}{[\nabla{\hat{H}(t)}]}, \end{array} $$

where \( \nabla {\hat {S}(t)} \) and \( \nabla {\hat {H}(t)} \) are, respectively, the gradient (vector of first partial derivatives) of S(t) and H(t) with respect to α and λ obtained at \( {\alpha =\hat {\alpha }} \) and \( {\lambda =\hat {\lambda }} \), i.e.,

$$ \begin{array}{@{}rcl@{}} {[\nabla{\hat{S}(t)}]}^{\mathbf{T}} =[\partial{\hat{S}(t)}/\partial{\alpha}, \partial{\hat{S}(t)}/\partial{\lambda}]_{({\alpha=\hat{\alpha}},{\lambda=\hat{\lambda}})} \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} {[\nabla{\hat{H}(t)}]}^{\mathbf{T}} =[\partial{\hat{H}(t)}/\partial{\alpha}, \partial{\hat{H}(t)}/\partial{\lambda}]_{({\alpha=\hat{\alpha}},{\lambda=\hat{\lambda}})}. \end{array} $$

Hence, the 100(1 − γ)% two-sided ACIs of S(t) and H(t), are given, respectively, by

$$ \begin{array}{@{}rcl@{}} {\left( {\hat{S}(t)}\pm{z_{\gamma/2}}\sqrt{\hat{\sigma}^{2}_{\hat{S}(t)}}\right)} \quad \text{and} \quad {\left( {\hat{H}(t)}\pm{z_{\gamma/2}}\sqrt{\hat{\sigma}^{2}_{\hat{H}(t)}}\right)}, \end{array} $$

where zγ/2 is the upper (γ/2)-th percentile of the standard normal distribution.
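The delta-method machinery above is easy to mechanize. The sketch below (helper names ours) builds the observed information by finite differences, mirroring the approximation in (2.6), and returns a 100(1 − γ)% interval for any scalar function of the parameters, such as S(t) or H(t); the paper evaluates the analytic second derivatives instead, but the resulting V-C approximation is the same:

```python
import numpy as np

def observed_info(negloglik, theta_hat, h=1e-5):
    # Hessian of the negative log-likelihood at theta_hat by central
    # differences; its inverse plays the role of I_0^{-1} in (2.6).
    t = np.asarray(theta_hat, dtype=float)
    p = t.size
    H = np.empty((p, p))
    E = np.eye(p) * h
    for i in range(p):
        for j in range(p):
            H[i, j] = (negloglik(t + E[i] + E[j]) - negloglik(t + E[i] - E[j])
                       - negloglik(t - E[i] + E[j]) + negloglik(t - E[i] - E[j])) / (4.0 * h * h)
    return H

def delta_aci(g, theta_hat, vcov, z=1.959964, h=1e-6):
    # Delta-method ACI for a scalar function g, e.g. g = S(t) or H(t):
    # variance = grad(g)^T  V-C  grad(g), interval = g_hat +/- z * sqrt(var).
    t = np.asarray(theta_hat, dtype=float)
    E = np.eye(t.size) * h
    grad = np.array([(g(t + E[i]) - g(t - E[i])) / (2.0 * h)
                     for i in range(t.size)])
    half = z * np.sqrt(float(grad @ vcov @ grad))
    return g(t) - half, g(t) + half
```

For γ = 0.05, z = 1.959964 reproduces the 95% intervals used throughout the paper.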

3 Bayesian Inference

In this section, the BEs and associated HPD credible intervals of the unknown model parameters α and λ, as well as S(t) and H(t), are developed under the SEL function. The SEL function is a very well-known symmetric loss function, defined as

$$ l_{s}{({\Theta},\tilde{\Theta})}=(\tilde{\Theta}-{\Theta})^{2}, $$
(3.1)

where \( \tilde {\Theta } \) is an estimate of Θ. Under Eq. 3.1, the BE is the posterior mean of Θ; however, any other loss function can easily be incorporated.

In order to obtain the BEs, we assume that the unknown parameters α and λ are independent random variables. Following Singh et al. (2015a, b) and Dey et al. (2017), we assume that α and λ have conjugate gamma priors, \( \alpha \sim {Gamma(a_{1},b_{1})} \) and \( \lambda \sim {Gamma(a_{2},b_{2})} \), respectively. The hyper-parameters ai,bi, i = 1,2, are chosen to reflect prior knowledge about the two unknown parameters α and λ. The gamma conjugate priors are chosen here because they are flexible enough to cover a large variety of the experimenter's prior beliefs. Consequently, the joint prior density of α and λ can be written, up to proportionality, as

$$ \pi{(\alpha,\lambda)}\propto {\alpha}^{a_{1}-1}{\lambda}^{a_{2}-1}e^{-(b_{1}\alpha+b_{2}\lambda)},\quad\alpha,\lambda>0, \ a_{i},b_{i}>0,\ i=1,2. $$
(3.2)

In the case of the non-informative priors ai = bi = 0, i = 1,2, the joint prior density (3.2) reduces to π(α, λ) ∝ (αλ)−1. By Bayes' theorem, the joint posterior PDF of α and λ, up to proportionality, is given by

$$ \pi{(\alpha,\lambda|\underline{x})}\propto L(\alpha,\lambda|\underline{x})\pi{(\alpha,\lambda)}. $$
(3.3)

Combining (3.2) with (2.1), the joint posterior distribution (3.3) can be written as

$$ \begin{array}{ll} \pi{(\alpha,\lambda|\underline{x})} &\!\propto {\alpha}^{m+a_{1}-1}{\lambda}^{m+a_{2}-1} \exp{\left( - \left( b_{1}\alpha+b_{2}\lambda - k\sum\limits_{i=1}^{m}{\omega(x_{(i)},R_{i};\alpha,\lambda)} \right) \right)}\\ &\times\prod\limits_{i=1}^{m}{(\xi(x_{(i)};\lambda))^{\alpha-1}}. \end{array} $$
(3.4)

The normalizing constant, κ, of Eq. 3.4 is given by

$$ \begin{array}{@{}rcl@{}} \kappa\! & = &\! {\int}_{0}^{\infty}{\int}_{0}^{\infty} {\alpha}^{m+a_{1}-1}{\lambda}^{m+a_{2}-1} \exp\! {\left( \! - \!\left( \! b_{1}\alpha\!+b_{2}\lambda\! -\! k\sum\limits_{i=1}^{m}{\omega(x_{(i)},R_{i};\alpha,\lambda)\!} \right) \right)}\\ && \times\prod\limits_{i=1}^{m}{(\xi(x_{(i)};\lambda))^{\alpha-1}} {d\alpha}{d\lambda}. \end{array} $$

Hence, the BE of any function of α and λ, say φ(α, λ), under the SEL function is the posterior expectation of φ(α, λ), given by

$$ {\tilde\varphi(\alpha,\lambda)}= E(\varphi(\alpha,\lambda)|\underline{x}) =\frac{1}{\kappa} {\int}_{0}^{\infty}{\int}_{0}^{\infty} \varphi(\alpha,\lambda)\pi{(\alpha,\lambda|\underline{x})} {d\alpha}{d\lambda}. $$
(3.5)

Unfortunately, based on Eq. 3.5, the BEs take the form of a ratio of two-dimensional integrals for which a closed-form solution is not possible. Hence, the BEs and corresponding HPD credible intervals must be computed numerically. For this reason, we propose two approaches to approximate (3.5), namely Lindley's approximation and the M-H algorithm.

3.1 Lindley’s Approximation

Lindley (1980) presented a procedure for evaluating a posterior expectation when the posterior takes the form of a ratio involving an integral in the denominator that cannot be reduced to closed form. We use Lindley's method to approximate all of the BEs. The posterior expectation of φ(𝜃) is given by

$$ \begin{array}{@{}rcl@{}} E(\varphi(\theta)|\underline{x}) =\frac{{\int}_{\theta}\varphi(\theta)e^{Q(\theta)}d\theta} {{\int}_{\theta}e^{Q(\theta)}d\theta}, \end{array} $$

where Q(𝜃) = ℓ(𝜃) + ρ(𝜃) is a function of 𝜃 only, ℓ(𝜃) is the log-likelihood function and \( \rho (\theta )=\log \pi (\theta ) \) is the log-prior density function. For the two-parameter case φ(𝜃1,𝜃2), with a sufficiently large sample size n, the approximate BE, \( \tilde \varphi _{L}(\theta _{1},\theta _{2}) \), of any function of 𝜃1 and 𝜃2 can be written as

$$ \begin{array}{ll} {\tilde\varphi_{L}(\theta_{1},\theta_{2})}= {\varphi(\hat\theta_{1},\hat\theta_{2})}+ &0.5(A+\ell_{30}B_{12}+\ell_{03}B_{21}+\ell_{21}C_{12}+\ell_{12}C_{21})\\ &+\rho_{1}A_{12}+\rho_{2}A_{21} \end{array} $$
(3.6)

where \( A={\sum }_{i=1}^{2}{\sum }_{j=1}^{2}u_{ij}\sigma _{ij} \), Aij = uiσii + ujσji, Bij = (uiσii + ujσij)σii, \( C_{ij}=3u_{i}\sigma _{ii}\sigma _{ij} +u_{j}(\sigma _{ii}\sigma _{jj}+2\sigma _{ij}^{2}) \), ui = ∂φ(𝜃1,𝜃2)/∂𝜃i, uij = ∂2φ(𝜃1,𝜃2)/∂𝜃i∂𝜃j, \( \ell _{ij}=\partial ^{i+j}\ell (\theta _{1},\theta _{2})/{\partial \theta _{1}^{i}}{\partial \theta _{2}^{j}} \), i, j = 0,1,2,3, i + j = 3, \( \rho =\log \pi (\theta _{1},\theta _{2}) \), ρi = ∂ρ/∂𝜃i, and σij is the (i, j)-th element of the asymptotic V-C matrix \( \mathrm {I}_{0}^{-1}(\hat \theta _{1},\hat \theta _{2}) \). All terms in (3.6) are evaluated at the MLEs \( \hat \theta _{1} \) and \( \hat \theta _{2} \) of the unknown parameters 𝜃1 and 𝜃2, respectively.

Taking φ(α, λ) to be a function of the unknown parameters α and λ, such as S(t) and H(t), the approximate BEs \( {\tilde \alpha _{L}} \) and \( {\tilde \lambda _{L}} \) of α and λ under the SEL function are given, respectively, by

$$ \begin{array}{@{}rcl@{}} \begin{array}{ll} \tilde\alpha_{L} &=\hat{\alpha}+\hat\rho_{1}\hat\sigma_{11}+\hat\rho_{2}\hat\sigma_{12}+ 0.5(\hat\sigma_{11}(\hat\ell_{30}\hat\sigma_{11}+2\hat\ell_{21}\hat\sigma_{12}+\hat\ell_{12}\hat\sigma_{22})\\ &+\hat\sigma_{21}(\hat\ell_{21}\hat\sigma_{11}+2\hat\ell_{12}\hat\sigma_{21}+\hat\ell_{03}\hat\sigma_{22}) ), \end{array} \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} \begin{array}{ll} \tilde\lambda_{L} &=\hat{\lambda}+\hat\rho_{1}\hat\sigma_{21}+\hat\rho_{2}\hat\sigma_{22}+ 0.5(\hat\sigma_{12}(\hat\ell_{30}\hat\sigma_{11}+2\hat\ell_{21}\hat\sigma_{12}+\hat\ell_{12}\hat\sigma_{22})\\ &+\hat\sigma_{22}(\hat\ell_{21}\hat\sigma_{11}+2\hat\ell_{12}\hat\sigma_{21}+\hat\ell_{03}\hat\sigma_{22}) ). \end{array} \end{array} $$

Now, to compute the BEs based on Lindley’s approximation, we must have

$$ \begin{array}{@{}rcl@{}} \ell_{30}&=&\frac{2m}{\alpha^{3}} -k\sum\limits_{i=1}^{m}{(R_{i}+1)(\xi(x_{(i)};\lambda))^{\alpha}}(\log(\xi(x_{(i)};\lambda)))^{3},\\ \ell_{03}&=&\frac{2m}{\lambda^{3}}-(\alpha-1) \sum\limits_{i=1}^{m}{x_{(i)}^{3}(R_{i}+1)(\xi(x_{(i)};\lambda))^{\alpha-3}}\\&& \times[\alpha(\alpha-2)k(R_{i}+1)-2(\xi(x_{(i)};\lambda))^{-\alpha}],\\ \ell_{21}&=& -k\sum\limits_{i=1}^{m}{x_{(i)}(R_{i} + 1)(\xi(x_{(i)};\lambda))^{\alpha-1}} \log(\xi(x_{(i)};\lambda))[\alpha\log(\xi(x_{(i)};\lambda))+2],\\ \ell_{12}&=& -k\sum\limits_{i=1}^{m}{x_{(i)}^{2}(R_{i}+1)(\xi(x_{(i)};\lambda))^{\alpha-2}} [\alpha(\alpha - 1)\log(\xi(x_{(i)};\lambda))\!+(2\alpha - 1)],\\ \rho_{1}&=&\frac{a_{1}-1}{\alpha}-b_{1} \quad \text{and} \quad \rho_{2}=\frac{a_{2}-1}{\lambda}-b_{2}. \end{array} $$

Using Eq. 3.6, the approximate BE \( {\tilde {S}_{L}(t;\alpha ,\lambda )} \) of S(t;α, λ) is given by

$$ \begin{array}{@{}rcl@{}} {\tilde{S}_{L}(t;\alpha,\lambda)}&=&{{S}(t;\hat\alpha,\hat\lambda)} +\hat\rho_{1}(\hat{u}_{1}\hat\sigma_{11}+\hat{u}_{2}\hat\sigma_{21}) +\hat\rho_{2}(\hat{u}_{2}\hat\sigma_{22}+\hat{u}_{1}\hat\sigma_{12})\\ &&+0.5((\hat{u}_{11}\hat\sigma_{11}+2\hat{u}_{12}\hat\sigma_{12}+\hat{u}_{22}\hat\sigma_{22}) +\hat\ell_{30}\hat\sigma_{11}(\hat{u}_{1}\hat\sigma_{11}+\hat{u}_{2}\hat\sigma_{12})\\ &&+\hat\ell_{03}\hat\sigma_{22}(\hat{u}_{2}\hat\sigma_{22} + \hat{u}_{1}\hat\sigma_{21}) + \hat\ell_{21}(3\hat{u}_{1}\hat\sigma_{11}\hat\sigma_{12} + \hat{u}_{2}(\hat\sigma_{11}\hat\sigma_{22} + 2\hat\sigma_{12}^{2})) ) \end{array} $$

where

$$ \begin{array}{@{}rcl@{}} u_{1}&=&\frac{\partial{S(t)}}{\partial\alpha}= -(1+\lambda{t})^{\alpha}\log(1+\lambda{t})\exp({1-(1+\lambda{t})^{\alpha}}),\\ u_{2}&=&\frac{\partial{S(t)}}{\partial\lambda}= -\alpha{t}(1+\lambda{t})^{\alpha-1}\exp({1-(1+\lambda{t})^{\alpha}}),\\ u_{11}&=&\frac{\partial^{2}{S(t)}}{\partial\alpha^{2}}= (1+\lambda{t})^{\alpha}(\log(1\!+\lambda{t}))^{2}((1+\lambda{t})^{2\alpha} - 1) \exp({1 - (1+\lambda{t})^{\alpha}}),\\ u_{22}&=&\frac{\partial^{2}{S(t)}}{\partial\lambda^{2}}= \alpha{t}^{2}(1+\lambda{t})^{\alpha-1} (\alpha(1+\lambda{t})^{\alpha-1}-(\alpha-1)\\&&\times(1+\lambda{t})^{-1}) \exp({1-(1+\lambda{t})^{\alpha}}), \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} u_{12}&=&u_{21}=\frac{\partial^{2}{S(t)}}{\partial\alpha\partial\lambda}=\frac{\partial^{2}{S(t)}}{\partial\lambda\partial\alpha}\\&=&{t}(1+\lambda{t})^{\alpha-1} (\alpha((1+\lambda{t})^{\alpha}-1)\log(1+\lambda{t})-1)\\ &&\times\exp({1-(1+\lambda{t})^{\alpha}}). \end{array} $$

Similarly, the approximate BE \( {\tilde {H}_{L}(t;\alpha ,\lambda )} \) of H(t;α, λ) is given by

$$ \begin{array}{@{}rcl@{}} {\tilde{H}_{L}(t;\alpha,\lambda)}&=&{{H}(t;\hat\alpha,\hat\lambda)} +\hat\rho_{1}(\hat{u}_{1}\hat\sigma_{11}+\hat{u}_{2}\hat\sigma_{21}) +\hat\rho_{2}(\hat{u}_{2}\hat\sigma_{22}+\hat{u}_{1}\hat\sigma_{12})\\ &&+0.5((\hat{u}_{11}\hat\sigma_{11}+2\hat{u}_{12}\hat\sigma_{12}+\hat{u}_{22}\hat\sigma_{22}) +\hat\ell_{30}\hat\sigma_{11}(\hat{u}_{1}\hat\sigma_{11}+\hat{u}_{2}\hat\sigma_{12})\\ &&+\hat\ell_{03}\hat\sigma_{22}(\hat{u}_{2}\hat\sigma_{22} + \hat{u}_{1}\hat\sigma_{21}) + \hat\ell_{21}(3\hat{u}_{1}\hat\sigma_{11}\hat\sigma_{12} + \hat{u}_{2}(\hat\sigma_{11}\hat\sigma_{22} + 2\hat\sigma_{12}^{2}))) \end{array} $$

where

$$ \begin{array}{@{}rcl@{}} u_{1}&=&\frac{\partial{H(t)}}{\partial\alpha}= \lambda(1+\lambda{t})^{\alpha-1}(1+\alpha\log(1+\lambda{t})),\\ u_{2}&=&\frac{\partial{H(t)}}{\partial\lambda}= \alpha(1+\lambda{t})^{\alpha-1}(1+t\lambda(\alpha-1)(1+\lambda{t})^{-1}),\\ u_{11}&=&\frac{\partial^{2}{H(t)}}{\partial\alpha^{2}}= \lambda(1+\lambda{t})^{\alpha-1}\log(1+\lambda{t})(\alpha\log(1+\lambda{t})+2),\\ u_{22}&=&\frac{\partial^{2}{H(t)}}{\partial\lambda^{2}}= t\alpha(\alpha-1)(1+\lambda{t})^{\alpha-2}(t\lambda(\alpha-2)(1+\lambda{t})^{-1}+2), \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} u_{12}&=&u_{21}=\frac{\partial^{2}{H(t)}}{\partial\alpha\partial\lambda}=\frac{\partial^{2}{H(t)}}{\partial\lambda\partial\alpha}= (1+\lambda{t})^{\alpha-1}[ \alpha\log(1+\lambda{t})\\&&(t\lambda(\alpha-1)(1+\lambda{t})^{-1}+1)\\ &&+t\lambda(2\alpha-1)(1+\lambda{t})^{-1}+1]. \end{array} $$

Although the BEs of the unknown parameters α and λ, as well as S(t) and H(t), can be obtained easily using Lindley's approximation method, this method does not yield the HPD credible intervals. For this purpose, the M-H algorithm can be used to generate MCMC samples from the posterior distribution, from which the BEs can be computed and the associated HPD credible intervals constructed.

3.2 M-H Algorithm

The M-H sampler algorithm is used to implement the MCMC methodology and generate samples from Eq. 3.4. One may refer to Metropolis et al. (1953) and Hastings (1970) for various applications of this algorithm. The M-H algorithm simulates samples from a probability distribution by making use of the full joint density function and an (independent) proposal distribution. To implement the M-H algorithm, the full conditional posterior distributions of α and λ obtained from Eq. 3.4 are given, respectively, by

$$ \begin{array}{ll} \pi_{1}^{*}{(\alpha|\lambda,\underline{x})} &\propto {\alpha}^{m+a_{1}-1} \exp{\left( -\left( b_{1}\alpha-k\sum\limits_{i=1}^{m}{\omega(x_{(i)},R_{i};\alpha,\lambda)} \right) \right)}\prod\limits_{i=1}^{m}{(\xi(x_{(i)};\lambda))^{\alpha-1}}, \end{array} $$
(3.7)

and

$$ \begin{array}{ll} \pi_{2}^{*}{(\lambda|\alpha,\underline{x})} &\propto {\lambda}^{m+a_{2}-1} \exp{\left( -\left( b_{2}\lambda-k\sum\limits_{i=1}^{m}{\omega(x_{(i)},R_{i};\alpha,\lambda)} \right) \right)}\prod\limits_{i=1}^{m}{(\xi(x_{(i)};\lambda))^{\alpha-1}}. \end{array} $$
(3.8)

The conditional posterior distributions (3.7) and (3.8) of the unknown parameters α and λ, respectively, cannot be reduced analytically to well-known distributions. Therefore, the M-H algorithm with a normal proposal distribution is used to generate random samples from these distributions and, in turn, to obtain the BEs and corresponding HPD credible intervals. The steps of the M-H algorithm are as follows:

Step 1: Start with an initial guess α(0) and λ(0).

Step 2: Set J = 1.

Step 3: Generate α(J) and λ(J) from Eqs. 3.7 and 3.8 with normal proposal distributions \( N{(\alpha ^{(J-1)},\sigma _{\alpha }^{2})} \) and \( N{(\lambda ^{(J-1)},\sigma _{\lambda }^{2})} \), respectively, as follows:

(a) Generate α∗ from \( N{(\alpha ^{(J-1)},\sigma _{\alpha }^{2})} \) and λ∗ from \( N{(\lambda ^{(J-1)},\sigma _{\lambda }^{2})} \).

(b) Evaluate the acceptance probabilities by \( \varphi _{\alpha }=\min \limits \left \{1, \frac {\pi _{1}^{*}{(\alpha ^{*}|\lambda ^{(J-1)},\underline {x})}} {\pi _{1}^{*}{(\alpha ^{(J-1)}|\lambda ^{(J-1)},\underline {x})}} \right \} \) and \( \varphi _{\lambda }=\min \limits \left \{1, \frac {\pi _{2}^{*}{(\lambda ^{*}|\alpha ^{(J)},\underline {x})}} {\pi _{2}^{*}{(\lambda ^{(J-1)}|\alpha ^{(J)},\underline {x})}} \right \} \).

(c) Generate u1 and u2 from the uniform U(0,1) distribution.

(d) If u1 ≤ φα, accept the proposal and set α(J) = α∗; otherwise set α(J) = α(J− 1).

(e) If u2 ≤ φλ, accept the proposal and set λ(J) = λ∗; otherwise set λ(J) = λ(J− 1).

Step 4: For a given time t, compute the SF and HRF values \( S^{(J)}(t)=\exp ({1-(1+\lambda ^{(J)}{t})^{\alpha ^{(J)}}}) \) and \( H^{(J)}(t)=\alpha ^{(J)}\lambda ^{(J)}(1+\lambda ^{(J)}{t})^{\alpha ^{(J)}-1} \), respectively.

Step 5: Set J = J + 1.

Step 6: Repeat Steps 3-5 M times to obtain

$$ \begin{array}{@{}rcl@{}} \varphi^{(i)}=(\alpha^{(i)},\lambda^{(i)},S^{(i)}(t),H^{(i)}(t)),\ i=1,2,\dots,M. \end{array} $$

Step 7: To compute the HPD credible intervals of φ = (α, λ, S(t),H(t)), order the MCMC samples \( \varphi ^{(i)},\ i=1,2,\dots ,M \), as \( (\varphi _{(1)},\varphi _{(2)},\dots ,\varphi _{(M)}) \). Using the algorithm proposed by Chen and Shao (1999), the 100(1 − γ)% HPD credible interval for φ is given by \( (\varphi _{(J^{*})}, \varphi _{(J^{*}+[(1-\gamma )M])}) \), where J∗ is chosen such that

$$ \begin{array}{@{}rcl@{}} \varphi_{(J^{*}+[(1-\gamma)M])}-\varphi_{(J^{*})}=\min_{1\leq{i}\leq\gamma{M}}(\varphi_{(i+[(1-\gamma)M])}-\varphi_{(i)}),\quad J^{*}=1,2,\dots,\gamma{M}. \end{array} $$

Here [x] denotes the largest integer less than or equal to x. Hence, the HPD credible interval of φ is that interval which has the shortest length.

The first M0 simulated draws of the algorithm may be biased by the initial values, and are therefore usually discarded at the beginning of the analysis (the burn-in period) in order to guarantee convergence and remove the effect of the choice of initial values. The remaining samples φ(i), \( i=M_{0}+1,\dots ,M \), provided M − M0 is sufficiently large, can then be used to develop the Bayesian inferences. Thus, the MCMC BE of any function φ(α, λ) of α and λ under the SEL function, as in Eq. 3.1, is given by \( {\tilde {\varphi }}_{M-H}={\sum }_{i=M_{0}+1}^{M}{{\varphi }}^{(i)}/(M-M_{0}) \), where M0 is the burn-in period.
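Steps 1-7, together with the Chen and Shao (1999) HPD construction, can be sketched compactly. The paper's experiments use R; the Python sketch below (all names ours) works with the log of the posterior kernel (3.4) and symmetric normal proposals, so the acceptance probabilities reduce to the ratios in Step 3(b):

```python
import math
import random

def log_post(alpha, lam, x, R, k, a1=0.1, b1=1.0, a2=0.1, b2=1.0):
    # Log of the joint posterior kernel (3.4); -inf outside the support.
    if alpha <= 0.0 or lam <= 0.0:
        return -math.inf
    m = len(x)
    s_omega = sum((Ri + 1.0) * (1.0 - (1.0 + lam * xi) ** alpha)
                  for xi, Ri in zip(x, R))
    s_log = sum(math.log(1.0 + lam * xi) for xi in x)
    return ((m + a1 - 1.0) * math.log(alpha) + (m + a2 - 1.0) * math.log(lam)
            - b1 * alpha - b2 * lam + k * s_omega + (alpha - 1.0) * s_log)

def mh_sampler(x, R, k, M=2000, burn=500, s_a=0.1, s_l=0.1, seed=2):
    # Steps 1-6: component-wise M-H with normal proposals.
    rng = random.Random(seed)
    alpha, lam = 1.0, 1.0                               # Step 1: initial values
    chain = []
    for _ in range(M):
        a_star = rng.gauss(alpha, s_a)                  # Step 3(a)
        diff = log_post(a_star, lam, x, R, k) - log_post(alpha, lam, x, R, k)
        if rng.random() < math.exp(min(0.0, diff)):     # Steps 3(b)-(d)
            alpha = a_star
        l_star = rng.gauss(lam, s_l)
        diff = log_post(alpha, l_star, x, R, k) - log_post(alpha, lam, x, R, k)
        if rng.random() < math.exp(min(0.0, diff)):     # Step 3(e)
            lam = l_star
        chain.append((alpha, lam))
    return chain[burn:]                                 # drop the burn-in draws

def hpd(samples, gamma=0.05):
    # Step 7 (Chen and Shao, 1999): shortest interval covering (1 - gamma).
    s = sorted(samples)
    w = int((1.0 - gamma) * len(s))
    j = min(range(len(s) - w), key=lambda i: s[i + w] - s[i])
    return s[j], s[j + w]
```

The hyper-parameter defaults (a1 = a2 = 0.1, b1 = b2 = 1) match the informative prior used in the simulation study below; the data and proposal standard deviations are placeholders to be tuned per application.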

4 Monte Carlo Simulation Study

In this section, a Monte Carlo simulation study is carried out in order to compare the proposed estimates of α, λ, S(t) and H(t). A large number, N = 1000, of progressively first-failure censored samples are generated from the NHD for true values of the parameters α and λ and different combinations of n (number of groups), m (effective number of observed failures) and k (number of items within each group), using the algorithm described in Balakrishnan and Sandhu (1995). In each case, the MLEs and BEs of the unknown parameters and reliability characteristics are computed; Lindley's approximation and the MCMC method based on the SEL function are used for the Bayesian computations. Comparison between the different estimators is made with respect to their MSE and RAB values, and the performances of the two-sided 95% ACI/HPD credible intervals are compared through their average lengths (ALs). The simulation study was performed for different values of (k, n, m), namely n = 20 (small), 50 (medium) and 80 (large), for each group size k = 2 and 5. The test is terminated when the number of failed subjects reaches a pre-fixed value m, with failure proportions m/n = 30, 60 and 90%. For each test, we compute the average estimates with their MSEs and RABs, given, respectively, by \( \bar {\hat \varphi }=\frac {1}{N}{\sum }_{i=1}^{N}{\hat \varphi _{i}} \), \( \text {MSE}{(\hat \varphi )}=\frac {1}{N}{\sum }_{i=1}^{N}{(\hat \varphi _{i}-\varphi )^{2}} \) and \( \text {RAB}{(\hat \varphi )}=\frac {1}{N}{\sum }_{i=1}^{N}{\frac {|\hat \varphi _{i}-\varphi |}{\varphi }} \), where \( \hat \varphi \) is the MLE or BE of the parametric function φ.

Under the non-informative priors ai = bi = 0, i = 1,2, the joint posterior distribution of the two unknown parameters is proportional to the likelihood function. We therefore used informative priors for α and λ, with a1 = a2 = 0.1 and b1 = b2 = 1, when (α, λ) = (0.1,0.1); the values of the hyper-parameters are chosen so that the prior mean of each parameter equals the corresponding true value. Further, we have also obtained the MLEs and BEs of the SF and HRF, whose true values at the specified time t = 5 are S(5) = 0.959464 and H(5) = 0.0069425.

Using the M-H sampler algorithm proposed in Section 3, the different BEs are developed from 12,000 MCMC samples after discarding the first 2,000 values as burn-in. For interval estimation, we compute the ALs of the 95% ACI/HPD credible intervals of α, λ, S(t) and H(t). We consider the following censoring schemes (CSs):

$$ \begin{array}{@{}rcl@{}} &&\text{CS-I}: R_{1}=n-m,\quad R_{i}=0\quad \text{for}\quad i \neq 1,\\ &&\text{CS-II}: \left\{ \begin{array}{lcccl} R_{m/2}=n-m,\quad R_{i}=0\quad \text{for}\quad i \neq {m/2},\quad \text{if}\quad m\quad \text{is even}, \\ R_{(m+1)/2}=n-m,\quad R_{i}=0\quad \text{for}\quad i \neq {(m+1)/2},\quad \text{if}\quad m\quad \text{is odd}, \end{array}\right.\\ &&\text{CS-III}: R_{m}=n-m,\quad R_{i}=0\quad \text{for}\quad i \neq m. \end{array} $$
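For bookkeeping in a simulation, the three schemes can be generated mechanically; a small sketch (the helper name is ours):

```python
def scheme(n, m, cs):
    # Removal vector R for the censoring schemes above; sum(R) = n - m.
    R = [0] * m
    idx = {"I": 0,                # CS-I: all removals at the first failure
           "II": (m - 1) // 2,    # CS-II: R_{m/2} (m even) or R_{(m+1)/2} (m odd)
           "III": m - 1}[cs]      # CS-III: all removals at the last failure
    R[idx] = n - m
    return R
```

Note that for both even and odd m, the 1-based positions m/2 and (m + 1)/2 reduce to the same 0-based index (m − 1)//2.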

Extensive computations were performed using the \(\mathcal {R}\) statistical programming language with two useful packages: the ‘coda’ package proposed by Plummer et al. (2006), and the ‘maxLik’ package proposed by Henningsen and Toomet (2011), which uses the Newton-Raphson method of maximization. The average MLEs and BEs of α, λ, S(t) and H(t) are presented in Tables 1, 2, 3, 4, 5 and 6. In addition, the ALs of α, λ, S(t) and H(t) are listed in Tables 7 and 8. From Tables 1, 2, 3, 4, 5 and 6, it can be seen that the MLEs and BEs of the unknown parameters and the reliability characteristics of the NHD perform very well in terms of MSEs and RABs. As n and m increase, the MSEs and RABs of all estimates decrease, as expected. Moreover, as the group size k increases, the MSEs and RABs associated with the shape parameter α increase while those associated with the scale parameter λ decrease. It is also observed that as the failure proportion m/n increases, the point estimates improve further. Also, the BEs using the gamma informative prior are better than the MLEs in terms of MSEs and RABs, as they incorporate prior information. Furthermore, the MCMC method using the M-H algorithm outperforms Lindley's approximation method in respect of both MSEs and RABs. For interval estimation, the HPD credible intervals are better than the ACIs in respect of ALs. In each case, as the failure proportion m/n increases, the interval estimates also improve in respect of ALs. Also, as k increases, the ALs of the ACIs increase while the ALs of the HPD credible intervals narrow down. Therefore, we recommend Bayesian point and interval estimation of the unknown parameters and reliability characteristics using the M-H algorithm. Furthermore, in most cases, comparing CS-I and CS-III, it is clear that the MSEs and RABs of the MLEs and BEs for the unknown parameters and reliability characteristics are greater for CS-III than for CS-I.
This may not be surprising, because the expected duration of the experiment under CS-I is greater than that under CS-III. Thus the data obtained under CS-I would be expected to provide more information about the unknown parameters and reliability characteristics than the data obtained under CS-III.
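The sampling step behind these simulations can be sketched as follows. By Wu and Kuş (2009), a progressively first-failure censored sample with group size k is distributed as a progressive Type-II censored sample from the group-minimum distribution 1 − (1 − F)<sup>k</sup>, so the standard uniform-transformation algorithm for progressive Type-II samples applies. This is a hedged Python sketch (the paper's actual computations used R; function names are ours):

```python
import math
import random

# A sketch of the sampling step, not the paper's R code. By Wu and Kus (2009),
# a PFFC sample with group size k is a progressive Type-II censored sample
# from the group-minimum distribution 1 - (1 - F)^k; we apply the standard
# uniform-transformation algorithm for progressive Type-II order statistics.

def nh_quantile(u, alpha, lam):
    """Inverse of the NHD CDF F(x) = 1 - exp(1 - (1 + lam*x)**alpha)."""
    return ((1.0 - math.log(1.0 - u)) ** (1.0 / alpha) - 1.0) / lam

def pffc_sample(alpha, lam, n, k, R, seed=None):
    """Simulate X_(1) < ... < X_(m) under the scheme R = (R_1, ..., R_m)."""
    m = len(R)
    assert m + sum(R) == n, "the scheme must satisfy sum(R) = n - m"
    rng = random.Random(seed)
    W = [rng.random() for _ in range(m)]
    # V_i = W_i^(1/(i + R_m + ... + R_{m-i+1})), U_i = 1 - V_m * ... * V_{m-i+1}
    V = [W[i - 1] ** (1.0 / (i + sum(R[m - i:]))) for i in range(1, m + 1)]
    U = [1.0 - math.prod(V[m - i:]) for i in range(1, m + 1)]
    # invert the group-minimum CDF 1 - (1 - F)^k
    return [nh_quantile(1.0 - (1.0 - u) ** (1.0 / k), alpha, lam) for u in U]
```

Each call returns an ordered sample of m failure times; MLEs and BEs would then be computed from this sample in every Monte Carlo replication.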

Table 1 The average estimates of α with their MSEs and RABs, respectively (in parentheses), for different values of k, n and m
Table 2 The average estimates of λ with their MSEs and RABs, respectively (in parentheses), for different values of k, n and m
Table 3 The average estimates of S(5) with their MSEs and RABs, respectively (in parentheses), when k = 2 and different values of n and m
Table 4 The average estimates of S(5) with their MSEs and RABs, respectively (in parentheses), when k = 5 and different values of n and m
Table 5 The average estimates of H(5) with their MSEs and RABs, respectively (in parentheses), when k = 2 and different values of n and m
Table 6 The average estimates of H(5) with their MSEs and RABs, respectively (in parentheses), when k = 5 and different values of n and m
Table 7 Average lengths of 95% ACIs for the parameters and reliability characteristics, for different values of k, n and m
Table 8 Average lengths of 95% HPD credible intervals for the parameters and reliability characteristics, for different values of k, n and m

5 Optimal Censoring Plans

In the context of a life-testing experiment, the choice of an optimum censoring scheme from the class of all possible schemes is an important concern for the experimenter. Recently, choosing the optimal censoring scheme in different problems has received considerable attention in the statistical literature; see, for example, Pradhan and Kundu (2009, 2013), Sultan et al. (2014), Dube et al. (2016) and Sen et al. (2018). Among these criteria, variance optimality is used in the one-parameter case, while trace and determinant optimality are used in the multi-parameter case; all aim to minimize the variance, trace or determinant of the V-C matrix of the estimators under consideration. For practical purposes, it is desirable to choose the optimum PFFCS from a class of possible censoring schemes. Thus, for given values of k, n, m and R, where \( {\sum }_{i=1}^{m}{R_{i}}=n-m \), we consider four commonly-used criteria to compare different censoring schemes; these are reported in Table 9.

Table 9 Some useful optimum criteria

According to Criterion-I and Criterion-II, our goal is to minimize the determinant and the trace, respectively, of the V-C matrix (2.6) of the MLEs \( \hat {\alpha } \) and \( \hat {\lambda } \). In the case of one-parameter distributions, the comparison between different criteria can be made easily. But if both parameters are unknown, then the comparison of two Fisher information matrices is not a trivial task; see Pradhan and Kundu (2013). Unfortunately, in the presence of both shape and scale parameters, Criterion-I and Criterion-II are not scale invariant (Gupta and Kundu 2006). However, in the case of multi-parameter distributions, there are other criteria which are scale invariant. Criterion-III depends on the choice of p and aims to minimize the variance of the logarithm of the MLE of the pth quantile, \( \ln (\hat {T}_{p}) \), where 0 < p < 1. The logarithm of Tp of the NHD is given by

$$ \ln({T}_{p})=\ln\left( \frac{1}{\lambda} [(1-\ln(1-p))^{1/\alpha}-1] \right),\quad 0<p<1. $$
(5.1)
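Eq. 5.1 simply inverts the CDF in Eq. 1.2. A quick numerical round-trip check (parameter values are illustrative, not from the paper):

```python
import math

# Illustrative round-trip check of Eq. 5.1 against the CDF in Eq. 1.2;
# the parameter values are arbitrary.

def nh_cdf(x, alpha, lam):
    return 1.0 - math.exp(1.0 - (1.0 + lam * x) ** alpha)

def nh_quantile(p, alpha, lam):
    # T_p solving F(T_p) = p; its logarithm is Eq. 5.1
    return ((1.0 - math.log(1.0 - p)) ** (1.0 / alpha) - 1.0) / lam

alpha, lam = 0.8, 1.2
for p in (0.1, 0.5, 0.99):
    assert abs(nh_cdf(nh_quantile(p, alpha, lam), alpha, lam) - p) < 1e-9
```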

From Eq. 5.1, using the delta method, the variance of \( (\ln (\hat {T}_{p})) \) can be approximated by

$$ \begin{array}{@{}rcl@{}} \text{Var}(\ln(\hat{T}_{p}))={[\nabla{\ln(\hat{T}_{p})}]}^{\mathbf{T}} {\mathrm{I}_{0}^{-1}(\hat{\Theta})}{[\nabla{\ln(\hat{T}_{p})}]}, \end{array} $$

where \({[\nabla {\ln (\hat {T}_{p})}]}^{\mathbf {T}} =[\partial {\ln ({T}_{p})}/\partial {\alpha }, \partial {\ln ({T}_{p})}/\partial {\lambda }]_{({\alpha =\hat {\alpha }},{\lambda =\hat {\lambda }})} \) is the gradient of \( \ln ({T}_{p}) \) with respect to the unknown parameters α and λ.
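The gradient has a simple closed form: with c = 1 − ln(1 − p), we get ∂ln(T<sub>p</sub>)/∂α = −c<sup>1/α</sup> ln(c)/(α²(c<sup>1/α</sup> − 1)) and ∂ln(T<sub>p</sub>)/∂λ = −1/λ. A sketch of the delta-method computation follows; the matrix `I0inv` stands in for the estimated inverse Fisher information, and its values are illustrative placeholders only:

```python
import math

# Delta-method sketch for Var(ln(T_p)). With c = 1 - ln(1 - p), the gradient is
#   d/d_alpha  = -c^(1/alpha) * ln(c) / (alpha^2 * (c^(1/alpha) - 1)),
#   d/d_lambda = -1/lambda.
# I0inv stands in for the estimated inverse Fisher information matrix;
# its entries below are illustrative placeholders.

def grad_log_Tp(p, alpha, lam):
    c = 1.0 - math.log(1.0 - p)
    ca = c ** (1.0 / alpha)
    return (-ca * math.log(c) / (alpha ** 2 * (ca - 1.0)), -1.0 / lam)

def var_log_Tp(p, alpha, lam, I0inv):
    g = grad_log_Tp(p, alpha, lam)
    # quadratic form g^T I0inv g
    return sum(g[i] * I0inv[i][j] * g[j] for i in range(2) for j in range(2))
```

Criterion-III is then `var_log_Tp` at a chosen p, and Criterion-IV integrates it in p against the weight w(p).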

From Table 9, the weight function w(p) is a non-negative function satisfying \( {{\int \limits }_{0}^{1}}{w(p)}dp=1 \), and \( \ln (\hat {T}_{p}) \) is the same as in Criterion-III. Without loss of generality, we take w(p) = 1, 0 < p < 1.

6 Real-life Data Analysis

In this section, a real data set is analyzed to illustrate how the proposed methodology can be applied to a real-life phenomenon. The data set considered is taken from Linhart and Zucchini (1986) and represents the failure times of the air conditioning system of an airplane. The ordered data are as follows: 1, 3, 5, 7, 11, 11, 11, 12, 14, 14, 14, 16, 16, 20, 21, 23, 42, 47, 52, 62, 71, 71, 87, 90, 95, 120, 120, 225, 246 and 261. Recently, this real data set was analyzed by Singh et al. (2015a, b). Before progressing further, we first fit the NHD to the complete data set and compare its fit with three lifetime distributions, namely the GD, WD and GED, with PDFs, respectively, as

$$ \begin{array}{@{}rcl@{}} f(x;\alpha,\lambda)&=&\frac{\lambda^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda{x}},\quad x>0,\ \alpha,\lambda>0,\\ f(x;\alpha,\lambda)&=&\alpha\lambda{x}^{\alpha-1}e^{-\lambda{x}^{\alpha}},\quad x\geq0,\ \alpha,\lambda>0, \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} f(x;\alpha,\lambda)=\alpha\lambda(1-{e}^{-\lambda{x}})^{\alpha-1}{e}^{-\lambda{x}},\quad x>0,\ \alpha,\lambda>0, \end{array} $$

where α and λ are the shape and scale parameters, respectively.

One question that arises is whether the data fit the GD, WD, GED and NHD or not. To check the validity of the proposed model, the Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) goodness-of-fit test statistics with the associated p-values are considered. Also considered are the negative log-likelihood function \( -\hat {\ell } \) evaluated at the MLEs, the Akaike information criterion (AIC), \( \text {AIC}=-2\hat {\ell }+2S \), the corrected Akaike information criterion (AICc), AICc = AIC + 2S(S + 1)/(n − S − 1), the consistent Akaike information criterion (CAIC), \( \text {CAIC}=-2\hat {\ell }+2nS/(n-S-1) \), the Bayesian information criterion (BIC), \( \text {BIC}=-2\hat {\ell }+S\log (n) \), and the Hannan-Quinn information criterion (HQIC), \( \text {HQIC}=-2\hat {\ell }+2S\log (\log (n)) \), where n and S are the sample size and the number of model parameters, respectively. In general, the best distribution corresponds to the lowest values of \( -\hat {\ell } \), AIC, AICc, CAIC, BIC, HQIC, and the K-S and A-D statistics, and the highest p-values. The values of the MLEs of the model parameters and the corresponding goodness-of-fit measures are reported in Table 10. The K-S and A-D goodness-of-fit test statistics with their p-values are listed in Table 11. We also use a graphical method for assessing goodness of fit: quantile-quantile (Q-Q) plots of the GD, WD, GED and NHD, shown in Fig. 1. A Q-Q plot depicts the points \( \{F^{-1}((i-0.5)/n;\hat {\theta }),x_{(i)}\},\ i=1,2,\dots ,n, \) where \( \hat {\theta } \) is the MLE of 𝜃.
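The likelihood-based criteria are straightforward to compute once \( -\hat{\ell} \) is available. A sketch (the AICc line uses the common small-sample correction, since the printed formula is ambiguous, and should be treated as an assumption):

```python
import math

# A sketch of the likelihood-based criteria. The AICc line uses the common
# small-sample correction, since the printed formula appears garbled, and
# should be treated as an assumption rather than the paper's exact form.

def info_criteria(loglik, n, S):
    """loglik: maximized log-likelihood; n: sample size; S: number of parameters."""
    aic = -2.0 * loglik + 2.0 * S
    return {
        "AIC": aic,
        "AICc": aic + 2.0 * S * (S + 1.0) / (n - S - 1.0),
        "BIC": -2.0 * loglik + S * math.log(n),
        "HQIC": -2.0 * loglik + 2.0 * S * math.log(math.log(n)),
    }
```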

Table 10 Summary fit for the real data set
Table 11 Goodness-of-fit test statistics
Figure 1
figure 1

Q-Q plots of GD, WD, GED and NHD reliability models

From Tables 10 and 11, it can be seen that the NHD is the best choice among the competing GD, WD and GED for fitting this lifetime real data set, since it has the smallest goodness-of-fit statistic values and the highest p-values. Furthermore, the Q-Q plots support these findings. For further illustration of the fit, we also provide two plots computed at the estimated parameters: the empirical cumulative distribution function together with the fitted CDFs of the GD, WD, GED and NHD, and the histogram of the real data with the fitted PDFs of the GD, WD, GED and NHD, as in Fig. 2. To illustrate the inferential methods developed in the preceding sections, we generate a first-failure censored sample after randomly grouping this data set into 15 groups with k = 2 items within each group, and report it in Table 12.
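The grouping step can be mimicked as follows; note that a fresh random split will not, in general, reproduce the particular grouping of Table 12 (the seed and variable names are ours):

```python
import random

# Illustrative re-creation of the grouping step: the 30 observations are
# randomly split into n = 15 groups of k = 2, and the minimum of each group
# forms the first-failure censored sample. A fresh random split will not,
# in general, reproduce the particular grouping of Table 12.
data = [1, 3, 5, 7, 11, 11, 11, 12, 14, 14, 14, 16, 16, 20, 21,
        23, 42, 47, 52, 62, 71, 71, 87, 90, 95, 120, 120, 225, 246, 261]

k, n = 2, 15
rng = random.Random(7)            # seed is arbitrary
shuffled = data[:]
rng.shuffle(shuffled)
groups = [shuffled[k * i: k * (i + 1)] for i in range(n)]
# the first failure observed in each group is its minimum
first_failures = sorted(min(g) for g in groups)
```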

Figure 2
figure 2

Fitted CDFs and PDFs of the GD, WD, GED and NHD for the real data set

Table 12 Random grouping for the real data set

The first-failure censored sample, in order, is: 1, 3, 5, 7, 11, 12, 14, 14, 14, 16, 16, 23, 42, 90 and 120. Using this first-failure censored data, three different progressively first-failure censored samples, under three different schemes with k = 2, n = 15 and m = 8, are generated and reported in Table 13. For brevity, the censoring scheme R = (2,0,0,0,2) is denoted by \( \mathbf{R}=(2,0^{3},2) \). Using Table 13, the MLEs and BEs of the unknown parameters α and λ, as well as the reliability characteristics S(t) and H(t) at the mission time t = 10, are computed and listed in Table 14. The BEs are developed using Lindley's approximation and MCMC methods under a non-informative prior. We generate 11000 MCMC samples and discard the first 1000 iterations (burn-in period) from the generated sequence, as suggested in Section 3. The initial values of the unknown parameters α, λ, S(t) and H(t) for running the MCMC sampler were taken to be their maximum likelihood estimates. Moreover, the two-sided 95% ACIs and HPD credible intervals are listed in Table 15. Next, we illustrate the concept of optimal censoring based on the four criteria considered in Table 9. Using the three data sets in Table 13, Criterion-I and Criterion-II are obtained by computing the determinant and the trace, respectively, of the observed V-C matrix. Criterion-III and Criterion-IV are computed based on two different quantiles, namely p = 0.50 and p = 0.99. The calculated values of the four criteria are reported in Table 16. From Table 16, we observe that scheme R1 is optimal relative to schemes R2 and R3 for Criteria I, II and IV, while scheme R2 is optimal for Criterion-III.

Table 13 Three different progressive first-failure censored data sets
Table 14 The MLEs and BEs of α, λ, S(t) and H(t), for different censoring schemes
Table 15 Two-sided 95% ACIs/HPD credible intervals of α, λ, S(t) and H(t)
Table 16 Optimal censoring scheme for different criteria, when (k, n, m) = (2,15,8)

7 Concluding Remarks

In this paper, the problem of estimating the unknown parameters and reliability characteristics of the NHD when the data are collected under the PFFCS is discussed. The MLEs and BEs of the unknown parameters and reliability characteristics are obtained. These estimates cannot be obtained in closed form, but can be computed numerically. Using gamma informative priors under the SEL function, the BEs are developed using Lindley's approximation and MCMC methods. Based on the asymptotic normality of the MLEs and the delta method, the 95% ACIs of the parameters and reliability characteristics are constructed. Since Lindley's approximation method cannot be used to construct HPD credible intervals, we made use of the M-H algorithm to obtain point estimates and the associated HPD credible intervals. Moreover, the results of Singh et al. (2015a), by putting k = 1, and the results of Singh et al. (2015b), by putting \( \mathbf {R}=(0,0,\dots ,0) \) and k = 1, may be obtained as special cases of the new results. Since it is important for a reliability practitioner to know which censoring scheme would be optimum for a life-testing experiment, we have provided the optimal censoring scheme based on four criteria. We hope that the results and methodology discussed in this paper will be beneficial to data analysts and reliability practitioners.