1 Introduction

In this paper, we consider a stock-rationing queueing problem of a warehouse with one type of product and two classes of demands, which may be viewed as coming from retailers with two different priority levels. Such a stock-rationing warehouse system has become increasingly important in many large cities under the current COVID-19 environment. For example, Beijing has seven super-large warehouses that supply various daily necessities, such as vegetables, meat, eggs and seafood, to more than 40 million people every day. In these warehouses, each type of daily necessity is supplied by many different companies in China and other countries, so that the successive supply stream of each type of product can be well described as a Poisson process. In addition, the two retailers may be regarded as a large supermarket group and a community retail store group, respectively. Typically, the large supermarket group has a higher supply priority than the community retail store group. When the COVID-19 alert level in Beijing is serious, the stock-rationing management of the warehouses plays a key role in strengthening their fine-grained management, so that every family in Beijing can have a comprehensive life guarantee.

From the perspective of practical applications, stock-rationing queueing problems with multiple demand classes arise in many different areas, for example, assemble-to-order systems, make-to-stock queues and multiechelon inventory systems by Ha (1997a); manufacturing by Zhao et al. (2005); airlines by Wang et al. (2021); rental businesses by Altug and Ceryan (2021) and Jain et al. (2015); health care by Papastavrou et al. (2014) and Baron et al. (2019); and so forth. All the studies above show that stock-rationing queues with multiple demand classes are not only necessary and important in many practical applications, but also of independent theoretical interest.

In stock-rationing queueing systems, stock-rationing policies assign different supply priorities to multiple classes of demands. In the early literature, the so-called critical rationing level was conceived intuitively, and its existence was proved by Veinott (1965) and Topkis (1968). Once the critical rationing level is given and the on-hand inventory falls below it, a low-priority demand may be rejected, back-ordered or discarded so that the remaining on-hand inventory is reserved for future high-priority demands. Thus designing and optimizing the critical rationing levels becomes a basic management approach to inventory rationing across multiple demand classes. So far, analysis of the critical rationing levels has remained interesting but challenging in the study of stock-rationing queues with multiple demand classes.

Some studies have applied MDPs to discuss inventory rationing and stock-rationing queues across multiple demand classes by means of the submodular (or supermodular) technique; important examples include Ha (1997a), Ha (1997b), Ha (2000), Gayon et al. (2009), Benjaafar and ElHafsi (2006) and Nadar et al. (2014). The key step is to identify the structural properties of the optimal rationing policy by using a set of structured value functions that is preserved under an optimal operator. Based on this, the optimal rationing policy for inventory rationing across multiple demand classes can be described and expressed through these structural properties. In many more general cases, however, it is difficult to establish the structural properties of the optimal rationing policy, so stronger model assumptions have to be added to guarantee their existence. To improve the applicability of MDPs, we propose a new algebraic method that finds a complete algebraic solution to the optimal rationing policy by means of sensitivity-based optimization.

Sensitivity-based optimization may be regarded as a new research branch of MDPs, which grows out of infinitesimal perturbation analysis of discrete event dynamic systems; see, e.g., Cao (2007). A key ingredient of sensitivity-based optimization is to set up and use the so-called performance difference equation, which is based on the perturbation realization factor as well as the performance potential related to the Poisson equation. To the best of our knowledge, this paper is the first to apply sensitivity-based optimization to the study of stock-rationing queues with multiple demand classes.

Based on the above analysis, we summarize the main contributions of this paper as follows:

  1. (1)

    A complete algebraic solution This paper develops a complete algebraic solution to the optimal dynamic rationing policy of the stock-rationing queue by means of sensitivity-based optimization, and shows that the optimal dynamic rationing policy must be of transformational threshold type, which leads to three refined sufficient conditions under each of which the optimal dynamic rationing policy is of threshold type. In addition, it is worth noting that our transformational threshold type results are sharper than the bang-bang control given in Ma et al. (2019), Ma et al. (2021) and Xia et al. (2021). Therefore, our algebraic method provides not only a necessary complement of policy spatial structural integrity but also a new way of optimality proof compared with the frequently used submodular (or supermodular) technique of MDPs. Also, the complete algebraic solution to the optimal dynamic rationing policy can provide more effective support for numerical computation of the optimal policy and the optimal profit of this system.

  2. (2)

    A unified computational framework To the best of our knowledge, this paper is the first to apply sensitivity-based optimization to analyze stock-rationing queues with multiple demand classes. It is necessary and useful to describe the three key steps: (a) setting up a policy-based Markov process; (b) constructing a policy-based Poisson equation, whose general solution can be used to characterize the monotonicity and optimality of the long-run average profit of this system; (c) finding the optimal dynamic rationing policy in the three different areas of the penalty cost. In addition, this computational framework can fully support the numerical solution of stock-rationing queues with multiple demand classes, whereas the submodular (or supermodular) technique of MDPs has great difficulty dealing with more general stock-rationing queues.

  3. (3)

    Two different methods can sufficiently support each other Our algebraic method establishes a complete algebraic solution to the optimal dynamic rationing policy, so it provides not only a necessary complement of policy spatial structural integrity but also a new way of optimality proof compared with the frequently used submodular (or supermodular) technique of MDPs. On the other hand, since our algebraic method and the submodular (or supermodular) technique are both important parts of MDP theory (the former uses the Poisson equations, while the latter applies the optimality equation), it is clear that the two methods can support each other in the study of inventory rationing and stock-rationing queues with multiple demand classes.

The remainder of this paper is organized as follows. Section 2 provides a literature review. Section 3 gives a model description for the stock-rationing queue with two demand classes. Section 4 establishes an optimization problem to find the optimal dynamic rationing policy, in which we set up a policy-based birth-death process and define a more general reward function. Section 5 establishes a policy-based Poisson equation and provides its general solution with two free constants. Section 6 provides an explicit expression for the perturbation realization factor, and discusses the solution of the linear equation in the penalty cost. Section 7 discusses the monotonicity and optimality of the long-run average profit of this system, and finds the optimal dynamic rationing policy in three different areas of the penalty cost. Section 8 analyzes the stock-rationing queue under a threshold-type (static) rationing policy. Section 9 uses numerical experiments to demonstrate our theoretical results on the optimal dynamic rationing policy. Finally, some concluding remarks are given in Sect. 10.

2 Literature review

The inventory rationing across multiple demand classes was first analyzed by Veinott (1965) in the context of inventory control theory. From then on, some authors have discussed the inventory rationing problems. Readers may refer to a book by Möllering (2007); survey papers by Li et al. (2019a); and a research classification by Teunter and Haneveld (2008), Möllering and Thonemann (2008), Van Foreest and Wijngaard (2014) and Alfieri et al. (2017).

In the inventory rationing system, a critical rationing level was imagined from early research and practical experience. Veinott (1965) first proposed such a critical rationing level, while Topkis (1968) proved that the critical rationing level really exists and that it is optimal. How to mathematically prove that a rationing inventory system has such a critical rationing level is a fundamental problem. Ha (1997a) made a breakthrough by applying the MDPs to analyze the inventory rationing policy for a stock-rationing queue with exponential production times, Poisson demand arrivals, lost sales and multiple demand classes.

Since the seminal work of Ha (1997a), it has been interesting to extend and generalize the way to apply the MDPs to deal with the stock-rationing queues and the rationing inventory systems. Important examples include the Erlang production times by Ha (2000) and Gayon et al. (2009); the backorders with two demand classes by Ioannidis et al. (2021); the capacity allocation by Shen and Yu (2019); omni-channel retailing by Goedhart et al. (2022); the batch ordering by Huang and Iravani (2008), the batch production by Pang et al. (2014); the utilization of information by Gayon et al. (2009); an assemble-to-order production system by Elhafsi et al. (2015); ElHafsi et al. (2018) and Nadar et al. (2014); supply chain by van Wijk et al. (2019); dynamic price by Ding et al. (2016), Schulte and Pibernik (2017); and so forth.

In the inventory rationing literature, there are two kinds of rationing policies: the static rationing policy and the dynamic rationing policy. Note that the dynamic rationing policy allows a threshold rationing level to change over time, depending on the number and ages of outstanding orders. In general, the static rationing policy may miss opportunities to further improve system performance, while the dynamic rationing policy, by exploiting continuously updated information, can improve system performance dynamically. Deshpande et al. (2003) indicated that the optimal dynamic rationing policy may significantly reduce the inventory cost compared with the static rationing policy.

If there exist multiple replenishment opportunities, then the ordering policies fall into two types: continuous review and periodic review. Therefore, our literature analysis of inventory rationing focuses on four different classes obtained by combining the rationing policy (static vs. dynamic) with the inventory review (continuous vs. periodic): static-continuous, static-periodic, dynamic-continuous and dynamic-periodic.

2.1 The static rationing policy (periodic vs. continuous)

The periodic review Veinott (1965) was the first to introduce inventory rationing across different demand classes and to propose a critical rationing level (i.e., the static rationing policy) in a periodic review inventory system with backorders. Subsequent research further investigated the periodic review inventory system with multiple demand classes, for example, the \((s,S)\) policy by Cohen et al. (1988) and Tempelmeier (2006); the \((S-1,S)\) policy by Ha (1997a, 1997b); the lost sales by Dekker et al. (2002); the backorders by Möllering and Thonemann (2008); and the anticipated critical levels by Wang et al. (2013).

The continuous review Nahmias and Demmy (1981) were the first to propose and develop a constant critical level \(\left( Q,r,{{\textbf{C}}} \right) \) policy in a continuous review inventory model with multiple demand classes, where Q is the fixed batch size, r is the reorder point and \({{\textbf{C}}}=\left( C_{1},C_{2},\ldots ,C_{n-1}\right) \) is a set of critical rationing levels for n demand classes. Since then, several authors have discussed the constant critical level \(\left( Q,r,{{\textbf{C}}}\right) \) policy in continuous review inventory systems. Readers may refer to recent publications for details, among which are Melchiors et al. (2000), Dekker et al. (1998), Deshpande et al. (2003), Isotupa (2006), Arslan et al. (2007), Möllering and Thonemann (2008, 2010) and Escalona et al. (2015, 2017). In addition, the \(\left( S-1,S,{{\textbf{C}}}\right) \) inventory system was discussed by Dekker et al. (2002), Kranenburg and van Houtum (2007) and so on.

2.2 The dynamic rationing policy (continuous vs. periodic)

The continuous review Topkis (1968) was the first to analyze the dynamic rationing policy and to show that the optimal rationing policy is dynamic. Melchiors (2003) considered a dynamic rationing policy in an \((s,Q)\) inventory system under the key assumption that there is at most one outstanding order. Teunter and Haneveld (2008) developed a continuous-time approach to determine the dynamic rationing policy for two Poisson demand classes, analyzed the marginal cost to determine the optimal remaining time for each rationing level, and expressed the optimal threshold policy through a schematic diagram or a lookup table. Fadıloğlu and Bulut (2010) proposed a dynamic rationing policy, Rationing with Exponential Replenishment Flow (RERF), for continuous review inventory systems with either backorders or lost sales. Wang et al. (2013) developed a dynamic threshold mechanism to allocate backorders when multiple outstanding orders for different demand classes exist in the \((Q,R)\) inventory system.

The periodic review: For the dynamic rationing policy in a periodic review inventory system, readers may refer to, for example, two demand classes by Sobel and Zhang (2001), Frank et al. (2003) and Tan et al. (2009); dynamic critical levels and lost sales by Haynsworth and Price (1989); multiple demand classes by Hung and Hsiao (2013); two backorder classes by Chew et al. (2013); general demand processes by Hung et al. (2012); mixed backorders and lost sales by Wang and Tang (2014); uncertain demand and production rates by Turgay et al. (2015); and incremental upgrading demands by You (2003).

3 Model description

In this section, we describe a stock-rationing queue with two demand classes, in which a single class of products is supplied to stock at a warehouse, and the two classes of demands come from two retailers with different priorities. In addition, we provide the system structure, operational mode and mathematical notation.

A stock-rationing queue The warehouse has a maximal capacity N to stock a single class of products, and pays a holding cost \(C_{1}\) per product per unit time. There are two classes of demands ordering the products, in which the demands of Class 1 have a higher priority than those of Class 2, so that the demands of Class 1 can be satisfied at any non-zero inventory level, while the demands of Class 2 may be either satisfied or refused based on the inventory level of the products. Figure 1 depicts a simple physical system to help understand the stock-rationing queue.

Fig. 1
figure 1

A stock-rationing queue with two demand classes

The supply process The supply stream of the products to the warehouse is a Poisson process with arrival rate \(\lambda \), where the warehouse pays the price \(C_{3}\) per product to the external product supplier. If the warehouse is full of products, then any newly arriving product has to be lost. In this case, the warehouse incurs an opportunity cost \(C_{4}\) per rejected product.

The service processes The service times provided by the warehouse to satisfy the demands of Classes 1 and 2 are i.i.d. and exponential with service rates \(\mu _{1}\) and \(\mu _{2}\), respectively. The service discipline for both classes of demands is First Come First Served (FCFS). The warehouse obtains the service price R when one product is sold to Retailer 1 or 2. Note that each demand of Class 1 or 2 is satisfied by one product each time.

The stock-rationing rule For the two classes of demands, each demand of Class 1 can always be satisfied at any non-zero inventory level; while for satisfying the demands of Class 2, we need to consider three different cases as follows:

Case one: The inventory level is zero. In this case, there is no product in the warehouse, so any newly arriving demand has to be rejected immediately. This leads to the lost sales cost \(C_{2,1}\) (resp. \(C_{2,2}\)) per unit time for any lost demand of Class 1 (resp. 2). We assume that \(C_{2,1}>C_{2,2}\), which guarantees a higher service priority for the demands of Class 1 compared with the lower priority of the demands of Class 2.

Case two: The inventory level is low. In this case, the number of products in the warehouse is not more than a key threshold K, where the threshold K is chosen subjectively based on practical experience. Note that the demands of Class 1 have a higher priority to receive the products than the demands of Class 2. Thus the warehouse will not provide any product to satisfy the demands of Class 2 under an equal service condition if the number of products in the warehouse is not more than K. If this service priority is violated (i.e., the demands of Class 2 are satisfied from a low stock), the warehouse must pay a penalty cost P per product supplied to the demands of Class 2 at a low stock. Note that the penalty cost P measures the different priority levels between the two classes of demands in providing the products.

Case three: The inventory level is high. In this case, the number of products in the warehouse is more than the threshold K. Thus the demands of Classes 1 and 2 can be simultaneously satisfied due to enough products in the warehouse.

Independence We assume that all the random variables defined above are independent of each other.

In what follows, we use Table 1 to further summarize some of the above notations.

Table 1 Some costs and prices in the stock-rationing queue

Remark 1

The penalty cost P is a necessary variable (it sets up the conditions for control classification) for dynamically controlling and optimizing whether the products are supplied to the demands of Class 2 at a low, but non-empty, stock. By contrast, the lost sales costs, satisfying \(C_{2,1}>C_{2,2}\), provide only a static and insufficient penalty for supplying the products to the demands of Class 2 at a low stock.

4 Optimization model formulation

In this section, we establish an optimization problem to find the optimal dynamic rationing policy in the stock-rationing queue. To do this, we set up a policy-based birth-death process, and define a more general reward function with respect to both states and policies of the policy-based birth-death process.

To study the stock-rationing queue with two demand classes, we first need to define both ‘states’ and ‘policies’ to express stochastic dynamics of the stock-rationing queue.

Let I(t) be the number of products in the warehouse at time t, regarded as the state of this system at time t. Obviously, all possible values of State I(t) form a state space as follows:

$$\begin{aligned} \varvec{\Omega }=\{0,1,2,\ldots ,N\}. \end{aligned}$$

Also, State \(i\in \) \(\varvec{\Omega }\) is regarded as an inventory level of this system.

Given the states, the policies are defined in a slightly more complicated way. Let \(d_{i}\) be a policy related to State \(i\in \) \(\varvec{\Omega }\); it expresses whether or not the warehouse prefers to supply products to the demands of Class 2 when the inventory level is not more than the threshold K for \(0<K\le N\). Thus, we have

$$\begin{aligned} d_{i}=\left\{ \begin{array}{ll} 0, &{} i=0,\\ 0,1, &{} i=1,2,\ldots ,K,\\ 1, &{} i=K+1,K+2,\ldots ,N, \end{array} \right. \end{aligned}$$
(1)

where \(d_{i}=0\) and \(d_{i}=1\) indicate that the warehouse rejects and satisfies the demands of Class 2, respectively. Obviously, the policy \(d_{i}\) not only depends on State \(i\in \) \(\varvec{\Omega }\), but is also controlled by the threshold K. Of course, for the special case \(K=N\), we have \(d_{i} \in \left\{ 0,1\right\} \) for \(1\le i\le N\).

Corresponding to each state in \(\varvec{\Omega }\), we define a time-homogeneous policy of the stock-rationing queue as

$$\begin{aligned} {{\textbf{d}}}=(d_{0};d_{1},d_{2},\ldots ,d_{K};d_{K+1},d_{K+2},\ldots ,d_{N}). \end{aligned}$$

It follows from (1) that

$$\begin{aligned} {{\textbf{d}}}=(0;d_{1},d_{2},\ldots ,d_{K};1,1,\ldots ,1). \end{aligned}$$
(2)

Thus Policy \({{\textbf{d}}}\) depends on \(d_{i}\in \left\{ 0,1\right\} \), which is related to State i for \(1\le i\le K\). Let all the possible policies of the stock-rationing queue, given in (2), form a policy space as follows:

$$\begin{aligned} {\mathcal {D}}=\left\{ {{\textbf{d}}}:{{\textbf{d}}}=(0;d_{1},d_{2},\ldots ,d_{K};1,1,\ldots ,1),d_{i}\in \left\{ 0,1\right\} ,1\le i\le K\right\} . \end{aligned}$$
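Since a policy in \({\mathcal {D}}\) is determined entirely by its free entries \(d_{1},\ldots ,d_{K}\), the policy space contains \(2^{K}\) policies and can be enumerated directly. The following sketch makes this concrete; the values \(N=5\) and \(K=2\) are hypothetical and chosen only for illustration:

```python
from itertools import product

def policy_space(N, K):
    """Enumerate all policies d = (0; d_1,...,d_K; 1,...,1) of Eq. (2).

    d_0 = 0 is forced (empty stock), d_i is free in {0, 1} for 1 <= i <= K,
    and d_i = 1 is forced for K < i <= N (high stock always serves Class 2).
    """
    for free in product((0, 1), repeat=K):
        yield (0,) + free + (1,) * (N - K)

# Hypothetical example with N = 5, K = 2: |D| = 2^K = 4 policies.
policies = list(policy_space(5, 2))
print(len(policies))   # → 4
print(policies[0])     # → (0, 0, 0, 1, 1, 1)
```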

Remark 2

In general, the threshold K is chosen subjectively based on the real experience of the warehouse manager. If \(K=N\), then the policy is expressed as

$$\begin{aligned} {{\textbf{d}}}=(0;d_{1},d_{2},\ldots ,d_{N}). \end{aligned}$$

Thus our K-based policy \({{\textbf{d}}}=(0;d_{1},d_{2},\ldots ,d_{K};1,1,\ldots ,1)\) is more general than Policy \({{\textbf{d}}}=(0;d_{1},d_{2},\ldots ,d_{N})\).

Let \(I^{({{\textbf{d}}})}(t)\) be the state of the stock-rationing queue at time t under any given policy \({{\textbf{d}}}\in {\mathcal {D}}\). Then \(\left\{ I^{({{\textbf{d}}})}(t):t\ge 0\right\} \) is a continuous-time policy-based Markov process on the state space \(\varvec{\Omega }\), whose state transition relations are depicted in Fig. 2.

Fig. 2
figure 2

State transition relations of the policy-based Markov process

It is easy to see from Fig. 2 that \(\left\{ I^{({{\textbf{d}}})}(t):t\ge 0\right\} \) is a policy-based birth-death process. Based on this, the infinitesimal generator is given by

$$\begin{aligned} {{\textbf{B}}}^{({{\textbf{d}}})}\!=\!\!\left( \! \begin{array}{cccccc} -\lambda &{} \lambda &{} &{} &{} &{} \\ v\left( d_{1}\right) &{} -\left[ \lambda \!+\!v\left( d_{1}\right) \right] &{} \lambda &{} &{} &{} \\ \text { }\ddots &{} \text { }\ddots \text { } &{} \text { }\ddots \text { } &{} &{} &{} \\ &{} v\left( d_{K}\right) &{} -\left[ \lambda \!+\!v\left( d_{K}\right) \right] &{} \lambda &{} &{} \\ &{} &{} v\left( 1\right) &{} -\left[ \lambda \!+\!v\left( 1\right) \right] &{} \lambda &{} \\ &{} &{} \text { }\ddots &{} \text { }\ddots &{} \text { }\ddots &{} \\ &{} &{} &{} v\left( 1\right) &{} -\left[ \lambda \!+\!v\left( 1\right) \right] &{} \lambda \\ &{} &{} &{} &{} v\left( 1\right) &{} -v\left( 1\right) \end{array} \!\right) \! \!,\! \end{aligned}$$
(3)

where \(v\left( d_{i}\right) =\mu _{1}+d_{i}\mu _{2}\) for \(i=1,2,\ldots ,K\), and \(v\left( 1\right) =\mu _{1}+\mu _{2}.\) It is clear that \(v\left( d_{i}\right) >0\) for \(i=1,2,\ldots ,K\). Thus the policy-based birth-death process \({{\textbf{B}}}^{({{\textbf{d}}})}\) must be irreducible, aperiodic and positive recurrent for any given policy \({{\textbf{d}}}\in {\mathcal {D}}\). In this case, we write the stationary probability vector of the policy-based birth-death process \(\left\{ I^{({{\textbf{d}}})}(t):t\ge 0\right\} \) as

$$\begin{aligned} {\pi }^{({{\textbf{d}}})}=\left( \pi ^{({{\textbf{d}}})}(0);\pi ^{({{\textbf{d}}} )}(1),\ldots ,\pi ^{({{\textbf{d}}})}(K);\pi ^{({{\textbf{d}}})}(K+1),\ldots ,\pi ^{({{\textbf{d}}})}(N)\right) . \end{aligned}$$
(4)

Obviously, the stationary probability vector \({\pi }^{({{\textbf{d}}})}\) is the unique solution to the system of linear equations: \({\pi }^{({{\textbf{d}}})}{{\textbf{B}}}^{({{\textbf{d}}})}={{\textbf{0}}}\) and \({\pi }^{({{\textbf{d}}})}{{\textbf{e}}}=1\), where \({{\textbf{e}}}\) is a column vector of ones with a suitable dimension. We write

$$\begin{aligned} \xi _{0}&=1,\text { } i=0,\nonumber \\ \xi _{i}^{({{\textbf{d}}})}&=\left\{ \begin{array}{ll} \frac{\lambda ^{i}}{\prod \limits _{j=1}^{i}v\left( d_{j}\right) }, &{} i=1,2,\ldots ,K,\\ \frac{\lambda ^{i}}{\left( \mu _{1}+\mu _{2}\right) ^{i-K}\prod \limits _{j=1} ^{K}v\left( d_{j}\right) }, &{} i=K+1,K+2,\ldots ,N, \end{array} \right. \end{aligned}$$
(5)

and

$$\begin{aligned} h^{({{\textbf{d}}})}=1+\sum \limits _{i=1}^{N}\xi _{i}^{({{\textbf{d}}})}. \end{aligned}$$

It follows from Subsection 1.1.4 of Chapter 1 in Li (2010) that

$$\begin{aligned} \pi ^{({{\textbf{d}}})}\left( i\right) =\left\{ \begin{array}{ll} \frac{1}{h^{({{\textbf{d}}})}}, &{} i=0\\ \frac{1}{h^{({{\textbf{d}}})}}\xi _{i}^{({{\textbf{d}}})}, &{} i=1,2,\ldots ,N. \end{array} \right. \end{aligned}$$
(6)
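The closed-form stationary distribution (5)-(6) is straightforward to evaluate numerically. The sketch below implements the product form; the parameter values in the example call are hypothetical and serve only to check that the probabilities are positive and sum to one:

```python
def stationary_dist(lam, mu1, mu2, N, K, d):
    """Stationary probabilities pi^(d)(i) of Eq. (6), where d = (d_0,...,d_N)
    is a policy of Eq. (2) and v(d_i) = mu1 + d_i * mu2."""
    xi = [1.0]                                        # xi_0 = 1
    for i in range(1, N + 1):
        rate = mu1 + d[i] * mu2 if i <= K else mu1 + mu2
        xi.append(xi[-1] * lam / rate)                # product form of Eq. (5)
    h = sum(xi)                                       # normalizing constant h^(d)
    return [x / h for x in xi]

# Hypothetical example: lambda = 1, mu1 = 0.6, mu2 = 0.5, N = 5, K = 2.
pi = stationary_dist(1.0, 0.6, 0.5, N=5, K=2, d=(0, 1, 0, 1, 1, 1))
```

A quick sanity check is the birth-death balance relation \(\pi (1)/\pi (0)=\lambda /v(d_{1})\), which the product form reproduces by construction.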

Using the policy-based birth-death process \({{\textbf{B}}}^{({{\textbf{d}}})}\), we now define a more general reward function in the stock-rationing queue. As seen from Table 1, the reward function with respect to both states and policies is defined as a profit rate (i.e., the total system revenue minus the total system cost per unit time). By observing the impact of Policy \({{\textbf{d}}}\) on the profit rate, the reward function at State i under Policy \({{\textbf{d}}}\) is given by

$$\begin{aligned} f^{({{\textbf{d}}})}\left( i\right)&=R\left( \mu _{1}1_{\left\{ i>0\right\} }+\mu _{2}d_{i}\right) -C_{1}i-C_{2,1}\mu _{1}1_{\left\{ i=0\right\} }-C_{2,2}\mu _{2}\left( 1-d_{i}\right) \nonumber \\&\quad \text { } \text { }-C_{3}\lambda 1_{\left\{ i<N\right\} } -C_{4}\lambda 1_{\left\{ i=N\right\} }-P\mu _{2}d_{i}1_{\left\{ 1\le i\le K\right\} }, \end{aligned}$$
(7)

where \(1_{\left\{ \cdot \right\} }\) denotes the indicator function, whose value is one when the event occurs and zero otherwise. By means of the indicator function, satisfying and rejecting the demands of Class 1 are expressed by \(1_{\{i>0\}}\) and \(1_{\{i=0\}}\), respectively; the external products enter or are lost by the warehouse according to \(1_{\{i<N\}}\) and \(1_{\{i=N\}}\), respectively; and the penalty cost paid by the warehouse is captured by \(1_{\left\{ 1\le i\le K\right\} }\), since the warehouse supplies the products to the demands of Class 2 at a low stock.

For the convenience of readers, it is necessary and useful to explain the reward function (7) from four different cases as follows:

Case (a): For \(i=0\),

$$\begin{aligned} f\left( 0\right) =-C_{2,1}\mu _{1}-C_{2,2}\mu _{2}-C_{3}\lambda . \end{aligned}$$
(8)

In Case (a), there is no product in the warehouse, thus it has to reject any demand of Classes 1 and 2.

Case (b): For \(1\le i\le K\),

$$\begin{aligned} f^{({{\textbf{d}}})}\left( i\right) =R\left( \mu _{1}+\mu _{2}d_{i}\right) -C_{1}i-C_{2,2}\mu _{2}\left( 1-d_{i}\right) -C_{3}\lambda -P\mu _{2}d_{i}. \end{aligned}$$
(9)

In Case (b), since the inventory level is low for \(1\le i\le K\), the penalty cost is paid by the warehouse when it supplies the products to the demands of Class 2.

Unlike Cases (a) and (b), the inventory level is high for \(K+1\le i\le N\) in Cases (c) and (d), so the warehouse can satisfy the demands of Classes 1 and 2 simultaneously.

Case (c): For \(K+1\le i\le N-1\),

$$\begin{aligned} f\left( i\right) =R\left( \mu _{1}+\mu _{2}\right) -C_{1}i-C_{3}\lambda . \end{aligned}$$
(10)

Case (d): For \(i=N\),

$$\begin{aligned} f\left( N\right) =R\left( \mu _{1}+\mu _{2}\right) -C_{1}N-C_{4}\lambda . \end{aligned}$$
(11)

Note that \(C_{3}\) is the price per product paid by the warehouse to the external product supplier; while \(C_{4}\) is the opportunity cost per product rejected into the warehouse.

Based on the above analysis, we define an \(\left( N+1\right) \)-dimensional column vector composed of the elements \(f\left( 0\right) ,\) \(f^{({{\textbf{d}}} )}\left( i\right) \) for \(1\le i\le K\), and \(f\left( j\right) \) for \(K+1\le j\le N\) as follows:

$$\begin{aligned} {{\textbf{f}}}^{({{\textbf{d}}})}=\left( f\left( 0\right) ;f^{({{\textbf{d}}})}\left( 1\right) ,f^{({{\textbf{d}}})}\left( 2\right) ,\ldots ,f^{({{\textbf{d}}})}\left( K\right) ;f\left( K+1\right) ,f\left( K+2\right) ,\ldots ,f\left( N\right) \right) ^{T}. \end{aligned}$$
(12)
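For numerical work, the reward vector \({{\textbf{f}}}^{({{\textbf{d}}})}\) of (12) can be assembled directly from the four cases (8)-(11). The following sketch does so; all cost and rate values in the example call are hypothetical:

```python
def reward_vector(R, C1, C21, C22, C3, C4, P, lam, mu1, mu2, N, K, d):
    """Reward vector f^(d) of Eq. (12), built from the cases (8)-(11)."""
    f = [-C21 * mu1 - C22 * mu2 - C3 * lam]        # Case (a): i = 0, Eq. (8)
    for i in range(1, N + 1):
        if i <= K:                                 # Case (b): low stock, Eq. (9)
            f.append(R * (mu1 + mu2 * d[i]) - C1 * i
                     - C22 * mu2 * (1 - d[i]) - C3 * lam - P * mu2 * d[i])
        elif i < N:                                # Case (c): high stock, Eq. (10)
            f.append(R * (mu1 + mu2) - C1 * i - C3 * lam)
        else:                                      # Case (d): full stock, Eq. (11)
            f.append(R * (mu1 + mu2) - C1 * N - C4 * lam)
    return f

# Hypothetical parameters for illustration only.
f = reward_vector(R=10.0, C1=0.2, C21=3.0, C22=1.0, C3=2.0, C4=4.0,
                  P=6.0, lam=1.0, mu1=0.6, mu2=0.5, N=5, K=2,
                  d=(0, 1, 0, 1, 1, 1))
```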

Now, we consider the long-run average profit of the stock-rationing queue (or the continuous-time policy-based birth-death process \(\left\{ I^{({{\textbf{d}}} )}(t):t\ge 0\right\} \)) under any given policy \({{\textbf{d}}}\). Let

$$\begin{aligned} \eta ^{{{\textbf{d}}}}=\lim _{T\rightarrow \infty }E\left\{ \frac{1}{T}\int _{0} ^{T}f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) dt\right\} . \end{aligned}$$

Then

$$\begin{aligned} \eta ^{{{\textbf{d}}}}={\pi }^{({{\textbf{d}}})}{{\textbf{f}}}^{({{\textbf{d}}})}, \end{aligned}$$
(13)

where \({\pi }^{({{\textbf{d}}})}\) and \({{\textbf{f}}}^{({{\textbf{d}}})}\) are given by (4) and (12), respectively.

To further examine the long-run average profit \(\eta ^{{{\textbf{d}}}}\), we now show how \(\eta ^{{{\textbf{d}}}}\) depends on the penalty cost P, and in particular that \(\eta ^{{{\textbf{d}}}}\) is linear in P. To do this, we write, for \(i=0\),

$$\begin{aligned} A_{0}=0,\text { }B_{0}=-C_{2,1}\mu _{1}-C_{2,2}\mu _{2}-C_{3}\lambda ; \end{aligned}$$

for \(i=1,2,\ldots ,K,\)

$$\begin{aligned} A_{i}^{\left( {{\textbf{d}}}\right) }=\mu _{2}d_{i},\text { }B_{i}^{\left( {{\textbf{d}}}\right) }=R\left( \mu _{1}+\mu _{2}d_{i}\right) -C_{1}i-C_{2,2} \mu _{2}\left( 1-d_{i}\right) -C_{3}\lambda ; \end{aligned}$$

for \(i=K+1,K+2,\ldots ,N-1,\)

$$\begin{aligned} A_{i}=0,\ B_{i}=R\left( \mu _{1}+\mu _{2}\right) -C_{1}i-C_{3}\lambda ; \end{aligned}$$

for \(i=N,\)

$$\begin{aligned} A_{N}=0,\ B_{N}=R\left( \mu _{1}+\mu _{2}\right) -C_{1}N-C_{4}\lambda . \end{aligned}$$

Then it follows from (8) to (11) that for \(i=0\),

$$\begin{aligned} f\left( 0\right) =B_{0}; \end{aligned}$$
(14)

for \(i=1,2,\ldots ,K,\)

$$\begin{aligned} f^{\left( {{\textbf{d}}}\right) }\left( i\right) =B_{i}^{\left( {{\textbf{d}}}\right) }-PA_{i}^{\left( {{\textbf{d}}}\right) }; \end{aligned}$$
(15)

for \(i=K+1,K+2,\ldots ,N,\)

$$\begin{aligned} f\left( i\right) =B_{i}. \end{aligned}$$
(16)

It follows from (6) and (14) to (16) that

$$\begin{aligned} \eta ^{{{\textbf{d}}}}&={\pi }^{({{\textbf{d}}})}{{\textbf{f}}}^{({{\textbf{d}}} )}\nonumber \\&=\pi ^{\left( {{\textbf{d}}}\right) }\left( 0\right) f\left( 0\right) +\sum _{i=1}^{K}\pi ^{\left( {{\textbf{d}}}\right) }\left( i\right) f^{\left( {{\textbf{d}}}\right) }\left( i\right) +\sum _{i=K+1}^{N}\pi ^{\left( {{\textbf{d}}}\right) }\left( i\right) f\left( i\right) \nonumber \\&={D}^{\left( {{\textbf{d}}}\right) }-P{F}^{\left( {{\textbf{d}}}\right) }, \end{aligned}$$
(17)

where

$$\begin{aligned} {D}^{\left( {{\textbf{d}}}\right) }=\pi ^{\left( {{\textbf{d}}}\right) }\left( 0\right) B_{0}+\sum _{i=1}^{K}\pi ^{\left( {{\textbf{d}}}\right) }\left( i\right) B_{i}^{\left( {{\textbf{d}}}\right) }+\sum _{i=K+1}^{N}\pi ^{\left( {{\textbf{d}}}\right) }\left( i\right) B_{i} \end{aligned}$$

and

$$\begin{aligned} {F}^{\left( {{\textbf{d}}}\right) }=\sum _{i=1}^{K}\pi ^{\left( {{\textbf{d}}} \right) }\left( i\right) A_{i}^{\left( {{\textbf{d}}}\right) }. \end{aligned}$$

Hence the long-run average profit \(\eta ^{{{\textbf{d}}}}\) is linear in the penalty cost P.
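The decomposition \(\eta ^{{{\textbf{d}}}}={D}^{\left( {{\textbf{d}}}\right) }-P{F}^{\left( {{\textbf{d}}}\right) }\) can be checked numerically. The sketch below is illustrative only: the parameter values, the policy, and the service-rate form \(v(d_i)=\mu_1+\mu_2 d_i\) (suggested by the coefficients \(B_i^{({{\textbf{d}}})}\) above) are assumptions, not data from the paper.

```python
import numpy as np

# Illustrative parameters (assumptions, not data from the paper).
N, K = 8, 4
lam, mu1, mu2 = 3.0, 1.5, 1.0
R, C1, C21, C22, C3, C4 = 10.0, 0.5, 2.0, 1.5, 1.0, 4.0
d = [0, 1, 0, 1]                      # an arbitrary policy (d_1,...,d_K)

def generator():
    """Policy-based birth-death generator; assumes v(d_i) = mu1 + mu2*d_i."""
    B = np.zeros((N + 1, N + 1))
    B[0, 1] = lam
    for i in range(1, N):
        B[i, i - 1] = mu1 + mu2 * d[i - 1] if i <= K else mu1 + mu2
        B[i, i + 1] = lam
    B[N, N - 1] = mu1 + mu2
    np.fill_diagonal(B, -B.sum(axis=1))
    return B

def stationary(B):
    """Solve pi B = 0 with pi e = 1."""
    A = np.vstack([B.T, np.ones(N + 1)])
    rhs = np.zeros(N + 2); rhs[-1] = 1.0
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

def reward(P):
    """Reward vector f^(d) assembled from (14)-(16)."""
    f = np.empty(N + 1)
    f[0] = -C21 * mu1 - C22 * mu2 - C3 * lam
    for i in range(1, K + 1):
        A_i = mu2 * d[i - 1]
        B_i = (R * (mu1 + mu2 * d[i - 1]) - C1 * i
               - C22 * mu2 * (1 - d[i - 1]) - C3 * lam)
        f[i] = B_i - P * A_i
    for i in range(K + 1, N):
        f[i] = R * (mu1 + mu2) - C1 * i - C3 * lam
    f[N] = R * (mu1 + mu2) - C1 * N - C4 * lam
    return f

pi = stationary(generator())
eta = lambda P: pi @ reward(P)                            # eta^d = pi f
F = sum(pi[i] * mu2 * d[i - 1] for i in range(1, K + 1))  # F^(d) in (17)
print(eta(0.0) - eta(1.0), F)   # equal: eta^d is linear in P with slope -F
```

Since \(\pi ^{({{\textbf{d}}})}\) does not depend on \(P\), the slope of \(\eta ^{{{\textbf{d}}}}\) in \(P\) is exactly \(-{F}^{\left( {{\textbf{d}}}\right) }\).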

We observe that when the inventory level is low, supplying the products to the demands of Class 2 increases the total system revenue and the total system cost simultaneously, so there is a tradeoff between the two. This motivates us to look for a dynamic rationing policy under which the warehouse attains its maximal profit. Therefore, our objective is to find an optimal dynamic rationing policy \({{\textbf{d}}}^{*}\) such that the long-run average profit \(\eta ^{{{\textbf{d}}}}\) attains its maximum at \({{\textbf{d}}}={{\textbf{d}}}^{*}\), that is,

$$\begin{aligned} {{\textbf{d}}}^{*}=\mathop {\arg \max } _{{{\textbf{d}}}\in {\mathcal {D}}}\left\{ \eta ^{{{\textbf{d}}}}\right\} . \end{aligned}$$
(18)

In fact, it is challenging both to analyze the structural properties of the optimal rationing policy \({{\textbf{d}}}^{*}\) and to provide effective algorithms for computing it.

In the remainder of this paper, we apply sensitivity-based optimization to study the optimal policy problem (18), in which the Poisson equation plays a key role in connecting MDPs with sensitivity-based optimization.

5 A policy-based Poisson equation

In this section, we set up a policy-based Poisson equation for the stock-rationing queue, which is derived by means of the law of total probability and an analysis of some stopping times of the policy-based birth-death process \(\left\{ I^{\left( {{\textbf{d}}}\right) }\left( t\right) ,t\ge 0\right\} \). It is worth noting that the policy-based Poisson equation provides a useful link between sensitivity-based optimization and the MDPs; see, e.g., Puterman (2014) and Cao (2007).

For any given policy \({{\textbf{d}}}\in {\mathcal {D}}\), following Chapter 2 of Cao (2007), we define the performance potential of the continuous-time policy-based birth-death process \(\left\{ I^{\left( {{\textbf{d}}}\right) }\left( t\right) ,t\ge 0\right\} \) as

$$\begin{aligned} g^{\left( {{\textbf{d}}}\right) }\left( i\right) =E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =i\right\} , \end{aligned}$$
(19)

where \(\eta ^{{{\textbf{d}}}}\) is defined in (13). It is seen from Cao (2007) that for Policy \({{\textbf{d}}}\in {\mathcal {D}}\), \(g^{\left( {{\textbf{d}}}\right) }\left( i\right) \) quantifies the contribution of the initial State i to the long-run average profit of the stock-rationing queue. Here, \(g^{\left( {{\textbf{d}}}\right) }\left( i\right) \) is also called the relative value function or the bias in the traditional MDP theory, see, e.g. Puterman (2014). We further define a column vector \({{\textbf{g}}} ^{\left( {{\textbf{d}}}\right) }\) as

$$\begin{aligned} {{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }=\left( g^{\left( {{\textbf{d}}}\right) }\left( 0\right) ;g^{\left( {{\textbf{d}}}\right) }\left( 1\right) ,\ldots ,g^{\left( {{\textbf{d}}}\right) }\left( K\right) ;g^{\left( {{\textbf{d}}}\right) }\left( K+1\right) ,\ldots ,g^{\left( {{\textbf{d}}}\right) }\left( N\right) \right) ^{T}. \end{aligned}$$
(20)

To compute the vector \({{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }\), we define the first departure time of the policy-based birth-death process \(\left\{ I^{({{\textbf{d}}})}(t):t\ge 0\right\} \) beginning from State i as

$$\begin{aligned} \tau =\inf \left\{ t\ge 0:I^{\left( {{\textbf{d}}}\right) }\left( t\right) \ne i\right\} , \end{aligned}$$

where \(I^{\left( {{\textbf{d}}}\right) }\left( 0\right) =i\). Clearly, \(\tau \) is a stopping time of the policy-based birth-death process \(\left\{ I^{({{\textbf{d}}})}(t):t\ge 0\right\} \). If \(i=0\), then it is seen from (3) that State 0 is a boundary state of the policy-based birth-death process \({{\textbf{B}}}^{({{\textbf{d}}})}\), hence \(I^{\left( {{\textbf{d}}}\right) }\left( \tau \right) =1\). Similarly, for each State \(i\in \varvec{\Omega }\), a basic relation is established as follows:

$$\begin{aligned} I^{\left( {{\textbf{d}}}\right) }\left( \tau \right) =\left\{ \begin{array}{ll} 1, &{} i=0,\\ i-1\ \text {or }i+1, &{} i=1,2,\ldots ,N-1,\\ N-1, &{} i=N. \end{array} \right. \end{aligned}$$
(21)

To compute the column vector \({{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }\), we derive a policy-based Poisson equation in terms of both the stopping time \(\tau \) and the basic relation (21). We set up the Poisson equation in the following four parts:

Part (a): For \(i=0\), we have

$$\begin{aligned} g^{\left( {{\textbf{d}}}\right) }\left( 0\right)&=E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =0\right\} \\&=E\left\{ \tau \left| I^{\left( {{\textbf{d}}}\right) }\left( 0\right) =0\right. \right\} \left[ f\left( 0\right) -\eta ^{{{\textbf{d}}}}\right] \\&\qquad +E\left\{ \left. \int _{\tau }^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( \tau \right) \right\} \ \\&=\frac{1}{\lambda }\left[ f\left( 0\right) -\eta ^{{{\textbf{d}}}}\right] +E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =1\right\} \\&=\frac{1}{\lambda }\left[ f\left( 0\right) -\eta ^{{{\textbf{d}}}}\right] +g^{\left( {{\textbf{d}}}\right) }\left( 1\right) , \end{aligned}$$

where, for the policy-based birth-death process \(\left\{ I^{({{\textbf{d}}} )}(t):t\ge 0\right\} \), it is easy to see from Fig. 2 that, since \(I^{\left( {{\textbf{d}}}\right) }\left( t\right) =0\) for \(0\le t<\tau ,\)

$$\begin{aligned} \int _{0}^{\tau }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt&=\tau \left[ f\left( 0\right) -\eta ^{{{\textbf{d}}}}\right] ,\\ E\left\{ \tau \left| I^{\left( {{\textbf{d}}}\right) }\left( 0\right) =0\right. \right\}&=\frac{1}{\lambda }. \end{aligned}$$

We obtain

$$\begin{aligned} -\lambda g^{\left( {{\textbf{d}}}\right) }\left( 0\right) +\lambda g^{\left( {{\textbf{d}}}\right) }\left( 1\right) =\eta ^{{{\textbf{d}}}}-f\left( 0\right) . \end{aligned}$$
(22)

Part (b): For \(i=1,2,\ldots ,K\), it is easy to see from Fig. 2 that

$$\begin{aligned} g^{\left( {{\textbf{d}}}\right) }\left( i\right)&=E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =i\right\} \\&=E\left\{ \tau \left| I^{\left( {{\textbf{d}}}\right) }\left( 0\right) =i\right. \right\} \left[ f^{({{\textbf{d}}})}\left( i\right) -\eta ^{{{\textbf{d}}}}\right] \\&\qquad +E\left\{ \left. \int _{\tau }^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( \tau \right) \right\} \\&=\frac{1}{v\left( d_{i}\right) +\lambda }\left[ f^{({{\textbf{d}}})}\left( i\right) -\eta ^{{{\textbf{d}}}}\right] \\&\text { }\quad \quad +\frac{\lambda }{v\left( d_{i}\right) +\lambda }E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}} )}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =i+1\right\} \\ \text { }&\text { }\quad \quad +\frac{v\left( d_{i}\right) }{v\left( d_{i}\right) +\lambda }E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =i-1\right\} \\&=\frac{1}{v\left( d_{i}\right) +\lambda }\left[ f^{({{\textbf{d}}})}\left( i\right) -\eta ^{{{\textbf{d}}}}\right] +\frac{\lambda }{v\left( d_{i}\right) +\lambda }g^{\left( {{\textbf{d}}}\right) }\left( i+1\right) \\&\qquad +\frac{v\left( d_{i}\right) }{v\left( d_{i}\right) +\lambda }g^{\left( {{\textbf{d}}}\right) }\left( i-1\right) , \end{aligned}$$

where

$$\begin{aligned} E\left\{ \tau \left| I^{\left( {{\textbf{d}}}\right) }\left( 0\right) =i\right. \right\} =\frac{1}{v\left( d_{i}\right) +\lambda }. \end{aligned}$$

We obtain

$$\begin{aligned} v\left( d_{i}\right) g^{\left( {{\textbf{d}}}\right) }\left( i-1\right) -\left[ v\left( d_{i}\right) +\lambda \right] g^{\left( {{\textbf{d}}}\right) }\left( i\right) +\lambda g^{\left( {{\textbf{d}}}\right) }\left( i+1\right) =\eta ^{{{\textbf{d}}}}-f^{({{\textbf{d}}})}\left( i\right) . \end{aligned}$$
(23)

Part (c): For \(i=K+1,K+2,\ldots ,N-1\), by using Fig. 2 we have

$$\begin{aligned} g^{\left( {{\textbf{d}}}\right) }\left( i\right)&=E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =i\right\} \\&=E\left\{ \tau \left| I^{\left( {{\textbf{d}}}\right) }\left( 0\right) =i\right. \right\} \left[ f\left( i\right) -\eta ^{{{\textbf{d}}}}\right] \\&\qquad +E\left\{ \left. \int _{\tau }^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( \tau \right) \right\} \\&=\frac{1}{\mu _{1}+\mu _{2}+\lambda }\left[ f\left( i\right) -\eta ^{{{\textbf{d}}}}\right] \\ \text { }&\text { }\quad \quad +\frac{\lambda }{\mu _{1}+\mu _{2}+\lambda }E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =i+1\right\} \\&\text { }\quad \quad +\frac{\mu _{1}+\mu _{2}}{\mu _{1}+\mu _{2}+\lambda }E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}} )}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =i-1\right\} \\&=\frac{1}{\mu _{1}+\mu _{2}+\lambda }\left[ f\left( i\right) -\eta ^{{{\textbf{d}}}}\right] +\frac{\lambda }{\mu _{1}+\mu _{2}+\lambda }g^{\left( {{\textbf{d}}}\right) }\left( i+1\right) \\&\qquad +\frac{\mu _{1}+\mu _{2}}{\mu _{1} +\mu _{2}+\lambda }g^{\left( {{\textbf{d}}}\right) }\left( i-1\right) , \end{aligned}$$

where

$$\begin{aligned} E\left\{ \tau \left| I^{\left( {{\textbf{d}}}\right) }\left( 0\right) =i\right. \right\} =\frac{1}{\mu _{1}+\mu _{2}+\lambda }. \end{aligned}$$

We obtain

$$\begin{aligned} \left( \mu _{1}+\mu _{2}\right) g^{\left( {{\textbf{d}}}\right) }\left( i-1\right) -\left( \mu _{1}+\mu _{2}+\lambda \right) g^{\left( {{\textbf{d}}} \right) }\left( i\right) +\lambda g^{\left( {{\textbf{d}}}\right) }\left( i+1\right) =\eta ^{{{\textbf{d}}}}-f\left( i\right) . \end{aligned}$$
(24)

Part (d): For \(i=N\), by using Fig. 2 we have

$$\begin{aligned}&g^{\left( {{\textbf{d}}}\right) }\left( N\right) =E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =N\right\} \\&\quad =E\left\{ \tau \left| I^{\left( {{\textbf{d}}}\right) }\left( 0\right) =N\right. \right\} \left[ f\left( N\right) -\eta ^{{{\textbf{d}}}}\right] \\&\qquad +E\left\{ \left. \int _{\tau }^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( \tau \right) \right\} \\&\quad =\frac{1}{\mu _{1}+\mu _{2}}\left[ f\left( N\right) -\eta ^{{{\textbf{d}}} }\right] \\&\qquad +E\left\{ \left. \int _{0}^{+\infty }\left[ f^{({{\textbf{d}}})}\left( I^{({{\textbf{d}}})}(t)\right) -\eta ^{{{\textbf{d}}}}\right] dt\right| I^{({{\textbf{d}}})}\left( 0\right) =N-1\right\} \\&\quad =\frac{1}{\mu _{1}+\mu _{2}}\left[ f\left( N\right) -\eta ^{{{\textbf{d}}} }\right] +g^{\left( {{\textbf{d}}}\right) }\left( N-1\right) , \end{aligned}$$

where

$$\begin{aligned} E\left\{ \tau \left| I^{\left( {{\textbf{d}}}\right) }\left( 0\right) =N\right. \right\} =\frac{1}{\mu _{1}+\mu _{2}}. \end{aligned}$$

We obtain

$$\begin{aligned} \left( \mu _{1}+\mu _{2}\right) g^{\left( {{\textbf{d}}}\right) }\left( N-1\right) -\left( \mu _{1}+\mu _{2}\right) g^{\left( {{\textbf{d}}}\right) }\left( N\right) =\eta ^{{{\textbf{d}}}}-f\left( N\right) . \end{aligned}$$
(25)

Thus it follows from (22), (23), (24) and (25) that

$$\begin{aligned} -{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }{{\textbf{g}}}^{\left( {{\textbf{d}}} \right) }={{\textbf{f}}}^{({{\textbf{d}}})}-\eta ^{{{\textbf{d}}}}{{\textbf{e}}}, \end{aligned}$$
(26)

where \({{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\), \({{\textbf{f}}}^{({{\textbf{d}}})}\) and \(\eta ^{{{\textbf{d}}}}\) are given in (3), (12) and (13), respectively.
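As a quick numerical sanity check of (26), one can build a small policy-based generator with arbitrary rates and verify that \(-{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }{{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }={{\textbf{f}}}^{({{\textbf{d}}})}-\eta ^{{{\textbf{d}}}}{{\textbf{e}}}\) is consistent even though \({{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\) is singular. All numbers below are placeholders, not model data.

```python
import numpy as np

# Placeholder rates (assumptions): N = 4; the down-rates stand for
# v(d_1), v(d_2), v(1), v(1); lam is the common birth rate.
lam = 2.0
down = [1.0, 1.5, 2.5, 2.5]
N = len(down)
B = np.zeros((N + 1, N + 1))
B[0, 1] = lam
for i in range(1, N):
    B[i, i - 1], B[i, i + 1] = down[i - 1], lam
B[N, N - 1] = down[N - 1]
np.fill_diagonal(B, -B.sum(axis=1))       # rows sum to zero: B e = 0

f = np.array([1.0, -2.0, 0.5, 3.0, -1.0])  # arbitrary reward vector

# Stationary distribution (pi B = 0, pi e = 1) and long-run average reward.
A = np.vstack([B.T, np.ones(N + 1)])
rhs = np.zeros(N + 2); rhs[-1] = 1.0
pi = np.linalg.lstsq(A, rhs, rcond=None)[0]
eta = pi @ f

# B is singular (rank N), yet -B g = f - eta*e is consistent because
# pi (f - eta*e) = 0; a least-squares solve recovers an exact solution.
g = np.linalg.lstsq(-B, f - eta, rcond=None)[0]
residual = -B @ g - (f - eta)
print(np.abs(residual).max())
```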

In what follows, we provide an effective method for solving the policy-based Poisson equation and show that it has infinitely many solutions, which differ by additive terms involving two free constants. This leads to a general solution of the policy-based Poisson equation with two free constants.

To solve the system of linear equations (26), note that \({{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\) is an infinitesimal generator of size \(N+1\), so that its rows sum to zero; hence \(\det \left( {{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\right) =0\) while rank\(\left( {{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\right) =N\). Therefore, the system of linear equations (26) has infinitely many solutions, differing by an additive constant.

Let \({\mathcal {B}}\) be the matrix obtained by deleting the first row and the first column of the matrix \({{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\). Then,

$$\begin{aligned} {\mathcal {B}}=\left( \begin{array}{cccccc} -\left[ \lambda +\nu \left( d_{1}\right) \right] &{} \lambda &{} &{} &{} &{} \\ \nu \left( d_{2}\right) &{} -\left[ \lambda +\nu \left( d_{2}\right) \right] &{} \lambda &{} &{} &{} \\ &{} \ddots \text { } &{} \ddots \text { } &{} \ddots \text { } &{} &{} \\ &{} \nu \left( d_{K}\right) &{} -\left[ \lambda +\nu \left( d_{K}\right) \right] &{} \lambda &{} &{} \\ &{} &{} \nu \left( 1\right) &{} -\left[ \lambda +\nu \left( 1\right) \right] &{} \lambda &{} \\ &{} &{} \text { }\ddots &{} &{} \ddots \text { }\ddots &{} \text { }\\ &{} &{} &{} \nu \left( 1\right) &{} -\left[ \lambda +\nu \left( 1\right) \right] &{} \lambda \\ &{} &{} &{} &{} \nu \left( 1\right) &{} -\nu \left( 1\right) \end{array} \right) . \end{aligned}$$

Obviously, rank\(\left( {\mathcal {B}}\right) =N.\) Since the matrix \({\mathcal {B}}\) is of size N, it is invertible; moreover, \(\left( -{\mathcal {B}}\right) ^{-1}>0\).

Let \({{\textbf{H}}}^{\left( {{\textbf{d}}}\right) }\) and \({\varphi }^{({{\textbf{d}}})}\) be the two column vectors of size N obtained by deleting the first elements of the column vectors \({{\textbf{f}}}^{\left( {{\textbf{d}}}\right) }-\eta ^{{{\textbf{d}}}}{{\textbf{e}}}\) and \({{\textbf{g}}} ^{({{\textbf{d}}})}\) of size \(N+1\), respectively. Then,

$$\begin{aligned} {{\textbf{H}}}^{\left( {{\textbf{d}}}\right) }=\left( \begin{array}{c} H_{1}^{\left( {{\textbf{d}}}\right) }\\ H_{2}^{\left( {{\textbf{d}}}\right) }\\ \vdots \\ H_{K}^{\left( {{\textbf{d}}}\right) }\\ H_{K+1}^{\left( {{\textbf{d}}}\right) }\\ \vdots \\ H_{N}^{\left( {{\textbf{d}}}\right) } \end{array} \right) =\left( \begin{array}{c} f^{\left( {{\textbf{d}}}\right) }\left( 1\right) -\eta ^{{{\textbf{d}}}}\\ f^{\left( {{\textbf{d}}}\right) }\left( 2\right) -\eta ^{{{\textbf{d}}}}\\ \vdots \\ f^{\left( {{\textbf{d}}}\right) }\left( K\right) -\eta ^{{{\textbf{d}}}}\\ f\left( K+1\right) -\eta ^{{{\textbf{d}}}}\\ \vdots \\ f\left( N\right) -\eta ^{{{\textbf{d}}}} \end{array} \right) =\left( \begin{array}{c} \left[ B_{1}^{\left( {{\textbf{d}}}\right) }-{D}^{\left( {{\textbf{d}}}\right) }\right] -P\left[ A_{1}^{\left( {{\textbf{d}}}\right) }-{F}^{\left( {{\textbf{d}}}\right) }\right] \\ \left[ B_{2}^{\left( {{\textbf{d}}}\right) }-{D}^{\left( {{\textbf{d}}}\right) }\right] -P\left[ A_{2}^{\left( {{\textbf{d}}}\right) }-{F}^{\left( {{\textbf{d}}}\right) }\right] \\ \vdots \\ \left[ B_{K}^{\left( {{\textbf{d}}}\right) }-{D}^{\left( {{\textbf{d}}}\right) }\right] -P\left[ A_{K}^{\left( {{\textbf{d}}}\right) }-{F}^{\left( {{\textbf{d}}}\right) }\right] \\ \left[ B_{K+1}-{D}^{\left( {{\textbf{d}}}\right) }\right] -P\left[ A_{K+1}-{F}^{\left( {{\textbf{d}}}\right) }\right] \\ \vdots \\ \left[ B_{N}-{D}^{\left( {{\textbf{d}}}\right) }\right] -P\left[ A_{N} -{F}^{\left( {{\textbf{d}}}\right) }\right] \end{array} \right) \end{aligned}$$

and

$$\begin{aligned} {\varphi }^{({{\textbf{d}}})}=\left( g^{\left( {{\textbf{d}}}\right) }\left( 1\right) ,g^{\left( {{\textbf{d}}}\right) }\left( 2\right) ,\ldots ,g^{\left( {{\textbf{d}}}\right) }\left( K\right) ;g^{\left( {{\textbf{d}}}\right) }\left( K+1\right) ,g^{\left( {{\textbf{d}}}\right) }\left( K+2\right) ,\ldots ,g^{\left( {{\textbf{d}}}\right) }\left( N\right) \right) ^{T}. \end{aligned}$$

Therefore, it follows from (26) that

$$\begin{aligned} -{\mathcal {B}}{\varphi }^{({{\textbf{d}}})}={{\textbf{H}}}^{\left( {{\textbf{d}}} \right) }+\nu \left( d_{1}\right) {{\textbf{e}}}_{1} g^{\left( {{\textbf{d}}}\right) }\left( 0 \right) , \end{aligned}$$
(27)

where \({{\textbf{e}}}_{1}\) is the column vector whose first element is one and all other elements are zero. Since the matrix \(-{\mathcal {B}}\) is invertible and \(\left( -{\mathcal {B}}\right) ^{-1}>0\), the system of linear equations (27) has a unique solution

$$\begin{aligned} {\varphi }^{({{\textbf{d}}})}=\left( -{\mathcal {B}}\right) ^{-1} {{\textbf{H}}}^{\left( {{\textbf{d}}}\right) }+\nu \left( d_{1}\right) \left( -{\mathcal {B}}\right) ^{-1}{{\textbf{e}}}_{1}\cdot \Im , \end{aligned}$$
(28)

where \(g^{\left( {{\textbf{d}}}\right) }\left( 0\right) =\Im \) is an arbitrary given constant. Let us adopt the convention

$$\begin{aligned} \left( \begin{array}{c} a\\ {{\textbf{b}}} \end{array} \right) =\left( a,{{\textbf{b}}}\right) ^{T}, \end{aligned}$$

where \({{\textbf{b}}}\) may be a column vector. Then we have

$$\begin{aligned} {{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }&=\left( g^{\left( {{\textbf{d}}}\right) }\left( 0\right) ,{\varphi }^{({{\textbf{d}}})}\right) ^{T}\nonumber \\&=\left( \Im ,\left( -{\mathcal {B}}\right) ^{-1}{{\textbf{H}}}^{\left( {{\textbf{d}}}\right) }+\nu \left( d_{1}\right) \left( -{\mathcal {B}}\right) ^{-1}{{\textbf{e}}}_{1}\cdot \Im \right) ^{T}\nonumber \\&=\left( 0,\left( -{\mathcal {B}}\right) ^{-1}{{\textbf{H}}}^{\left( {{\textbf{d}}}\right) }\right) ^{T}+\left( 1,\nu \left( d_{1}\right) \left( -{\mathcal {B}}\right) ^{-1}{{\textbf{e}}}_{1}\right) ^{T}\Im . \end{aligned}$$
(29)

Note that \({{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }{{\textbf{e}}}=0,\) thus a general solution to the policy-based Poisson equation is further given by

$$\begin{aligned} {{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }=\left( 0,\left( -{\mathcal {B}} \right) ^{-1}{{\textbf{H}}}^{\left( {{\textbf{d}}}\right) }\right) ^{T}+\left( 1,\nu \left( d_{1}\right) \left( -{\mathcal {B}}\right) ^{-1}{{\textbf{e}}} _{1}\right) ^{T}\Im +\xi {{\textbf{e}}}, \end{aligned}$$
(30)

where \(\Im \) and \(\xi \) are two free constants.
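The construction of the general solution (30) can be sketched numerically: delete the first row and column of \({{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\) to obtain \({\mathcal {B}}\), form the special solution and the \(\Im \)-direction, and check that every member of the two-parameter family solves the Poisson equation. The rates and the reward vector below are arbitrary placeholders.

```python
import numpy as np

# Placeholder birth-death generator (assumed rates): N = 4.
lam = 2.0
down = [1.0, 1.5, 2.5, 2.5]                  # v(d_1), v(d_2), v(1), v(1)
N = len(down)
B = np.zeros((N + 1, N + 1))
B[0, 1] = lam
for i in range(1, N):
    B[i, i - 1], B[i, i + 1] = down[i - 1], lam
B[N, N - 1] = down[N - 1]
np.fill_diagonal(B, -B.sum(axis=1))

f = np.array([1.0, -2.0, 0.5, 3.0, -1.0])    # arbitrary reward vector
A = np.vstack([B.T, np.ones(N + 1)]); rhs = np.zeros(N + 2); rhs[-1] = 1.0
pi = np.linalg.lstsq(A, rhs, rcond=None)[0]
eta = pi @ f

calB = B[1:, 1:]                             # delete first row and column
H = (f - eta)[1:]
minus_inv = np.linalg.inv(-calB)
assert (minus_inv > 0).all()                 # (-calB)^{-1} > 0 elementwise

g_sp = np.concatenate(([0.0], minus_inv @ H))                 # special solution
e1 = np.eye(N)[0]
dir_Im = np.concatenate(([1.0], down[0] * (minus_inv @ e1)))  # Im-direction (29)

# Every member of the family (30) solves -B g = f - eta*e.
for Im, xi in [(0.0, 0.0), (2.0, -1.0), (-3.5, 4.0)]:
    g = g_sp + Im * dir_Im + xi * np.ones(N + 1)
    assert np.abs(-B @ g - (f - eta)).max() < 1e-8
print("all members of the family (30) solve the Poisson equation")
```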

Based on the above analysis, the following theorem summarizes the general solution of the policy-based Poisson equation.

Theorem 1

For the Poisson equation \(-{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }{{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }={{\textbf{f}}} ^{({{\textbf{d}}})}-\eta ^{{{\textbf{d}}}}{{\textbf{e}}}\), there exists a key special solution \({{\textbf{g}}}_{\text {Sp}}^{{{\textbf{d}}}}=\left( 0,\left( -{\mathcal {B}} \right) ^{-1}{{\textbf{H}}}^{\left( {{\textbf{d}}}\right) }\right) ^{T}\), and the general solution involves two free constants \(\Im \) and \(\xi \):

$$\begin{aligned} {{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }={{\textbf{g}}}_{\text {Sp}}^{{{\textbf{d}}} }+\left( 1,\nu \left( d_{1}\right) \left( -{\mathcal {B}}\right) ^{-1}{{\textbf{e}}}_{1}\right) ^{T}\Im +\xi {{\textbf{e}}}, \end{aligned}$$

where \(\xi \) is a potential displacement constant, and \(\Im \) is a solution-free constant.

Remark 3

(1)

    To the best of our knowledge, this is the first work to provide the general solution of the Poisson equation in the MDPs by means of two different free constants.

(2)

    Note that \({\pi }^{({{\textbf{d}}})}{{\textbf{g}}}^{\left( {{\textbf{d}}} \right) }=\eta ^{{{\textbf{d}}}}\) and that the matrix \(-{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }+\textbf{e}{\pi }^{({{\textbf{d}}})}\) is invertible; thus the Poisson equation \(-{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }{{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }={{\textbf{f}}}^{({{\textbf{d}}})}-\eta ^{{{\textbf{d}}}}{{\textbf{e}}}\) can be rewritten as

    $$\begin{aligned} \left( -{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }+\textbf{e}{\pi }^{({{\textbf{d}}} )}\right) {{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }={{\textbf{f}}}^{({{\textbf{d}}} )}. \end{aligned}$$

    This gives a solution of the Poisson equation as follows:

    $$\begin{aligned} {{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }=\left( -{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }+\textbf{e}{\pi }^{({{\textbf{d}}})}\right) ^{-1}{{\textbf{f}}} ^{({{\textbf{d}}})}+\xi {{\textbf{e}}}, \end{aligned}$$

    which, compared with that in Theorem 1, is a special solution of the Poisson equation.

(3)

    To further understand the solution of the Poisson equation, readers may refer to, for example, Cao (2007) and Ma et al. (2019) for more details.
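The solution in Remark 3(2) also admits a direct numerical check with placeholder rates (all numbers below are assumptions): the matrix \(-{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }+\textbf{e}{\pi }^{({{\textbf{d}}})}\) is invertible, and \(\left( -{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }+\textbf{e}{\pi }^{({{\textbf{d}}})}\right) ^{-1}{{\textbf{f}}}^{({{\textbf{d}}})}\) solves the Poisson equation with \({\pi }^{({{\textbf{d}}})}{{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }=\eta ^{{{\textbf{d}}}}\).

```python
import numpy as np

# Placeholder birth-death generator and reward vector (assumptions).
lam = 2.0
down = [1.0, 1.5, 2.5, 2.5]
N = len(down)
B = np.zeros((N + 1, N + 1))
B[0, 1] = lam
for i in range(1, N):
    B[i, i - 1], B[i, i + 1] = down[i - 1], lam
B[N, N - 1] = down[N - 1]
np.fill_diagonal(B, -B.sum(axis=1))
f = np.array([1.0, -2.0, 0.5, 3.0, -1.0])

# Stationary distribution and long-run average reward.
A = np.vstack([B.T, np.ones(N + 1)]); rhs = np.zeros(N + 2); rhs[-1] = 1.0
pi = np.linalg.lstsq(A, rhs, rcond=None)[0]
eta = pi @ f

M = -B + np.outer(np.ones(N + 1), pi)        # -B + e pi, invertible
g = np.linalg.solve(M, f)
assert np.abs(-B @ g - (f - eta)).max() < 1e-8   # g solves (26)
assert abs(pi @ g - eta) < 1e-8                  # and pi g = eta
print("Remark 3(2) verified numerically")
```

The two assertions follow from left-multiplying \(\left( -{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }+\textbf{e}{\pi }^{({{\textbf{d}}})}\right) {{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }={{\textbf{f}}}^{({{\textbf{d}}})}\) by \({\pi }^{({{\textbf{d}}})}\), which gives \({\pi }^{({{\textbf{d}}})}{{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }=\eta ^{{{\textbf{d}}}}\).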

6 Impact of the penalty cost

In this section, we provide an explicit expression for the perturbation realization factor of the policy-based birth-death process. Based on this, we set up a linear equation in the penalty cost, which is closely related to the performance difference equation. Furthermore, we discuss some useful properties of the policies in the set \({\mathcal {D}}\) by means of the solution of this linear equation in the penalty cost.

6.1 The perturbation realization factor

We define a perturbation realization factor as

$$\begin{aligned} G^{\left( {{\textbf{d}}}\right) }\left( i\right) \overset{\text {def}}{=}g^{\left( {{\textbf{d}}}\right) }\left( i-1\right) -g^{\left( {{\textbf{d}}}\right) }\left( i\right) ,i=1,2,\ldots ,N. \end{aligned}$$
(31)

It is easy to see from Cao (2007) that \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) \) quantifies the difference between the two adjacent performance potentials \(g^{\left( {{\textbf{d}}}\right) }\left( i\right) \) and \(g^{\left( {{\textbf{d}}}\right) }\left( i-1\right) \), and measures the effect on the long-run average profit of the stock-rationing queue when the system state changes from \(i-1\) to i. By using the policy-based Poisson equation (26), we can derive a new system of linear equations that directly expresses the perturbation realization factor \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) \) for \(i=1,2,\ldots ,N\).

Although (30) already yields an expression for \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) \), the special structure of the policy-based Poisson equation (26) suggests a recursive method that leads to an explicit expression for \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) \), as follows.

For \(i=1\), it follows from (22) that

$$\begin{aligned} -\lambda \left[ g^{\left( {{\textbf{d}}}\right) }\left( 0\right) -g^{\left( {{\textbf{d}}}\right) }\left( 1\right) \right] =-\lambda G^{\left( {{\textbf{d}}}\right) }\left( 1\right) , \end{aligned}$$

we have

$$\begin{aligned} \lambda G^{\left( {{\textbf{d}}}\right) }\left( 1\right) =f\left( 0\right) -\eta ^{{{\textbf{d}}}}. \end{aligned}$$
(32)

For \(i=1,2,\ldots ,K\), it follows from (23) that

$$\begin{aligned}&\text { }v\left( d_{i}\right) \left[ g^{\left( {{\textbf{d}}}\right) }\left( i-1\right) -g^{\left( {{\textbf{d}}}\right) }\left( i\right) \right] -\lambda \left[ g^{\left( {{\textbf{d}}}\right) }\left( i\right) -g^{\left( {{\textbf{d}}}\right) }\left( i+1\right) \right] \\&=v\left( d_{i}\right) G^{\left( {{\textbf{d}}}\right) }\left( i\right) -\lambda G^{\left( {{\textbf{d}}}\right) }\left( i+1\right) , \end{aligned}$$

this gives

$$\begin{aligned} \lambda G^{\left( {{\textbf{d}}}\right) }\left( i+1\right) =v\left( d_{i}\right) G^{\left( {{\textbf{d}}}\right) }\left( i\right) +f^{({{\textbf{d}}} )}\left( i\right) -\eta ^{{{\textbf{d}}}}. \end{aligned}$$
(33)

For \(i=K+1,K+2,\ldots ,N-1\), it follows from (24) that

$$\begin{aligned}&\text { }\left( \mu _{1}+\mu _{2}\right) \left[ g^{\left( {{\textbf{d}}}\right) }\left( i-1\right) -g^{\left( {{\textbf{d}}}\right) }\left( i\right) \right] -\lambda \left[ g^{\left( {{\textbf{d}}}\right) }\left( i\right) -g^{\left( {{\textbf{d}}}\right) }\left( i+1\right) \right] \\&=\left( \mu _{1}+\mu _{2}\right) G^{\left( {{\textbf{d}}}\right) }\left( i\right) -\lambda G^{\left( {{\textbf{d}}}\right) }\left( i+1\right) , \end{aligned}$$

we obtain

$$\begin{aligned} \lambda G^{\left( {{\textbf{d}}}\right) }\left( i+1\right) =\left( \mu _{1}+\mu _{2}\right) G^{\left( {{\textbf{d}}}\right) }\left( i\right) +f\left( i\right) -\eta ^{{{\textbf{d}}}}. \end{aligned}$$
(34)

For \(i=N\), it follows from (25) that

$$\begin{aligned} \left( \mu _{1}+\mu _{2}\right) G^{\left( {{\textbf{d}}}\right) }\left( N\right) =\eta ^{{{\textbf{d}}}}-f\left( N\right) . \end{aligned}$$
(35)

By using (32), (33), (34) and (35), we obtain a new system of linear equations satisfied by \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) \) as follows:

$$\begin{aligned} \left\{ \begin{array}{ll} \lambda G^{\left( {{\textbf{d}}}\right) }\left( 1\right) =f\left( 0\right) -\eta ^{{{\textbf{d}}}}, &{} i=1,\\ \lambda G^{\left( {{\textbf{d}}}\right) }\left( i+1\right) =v\left( d_{i}\right) G^{\left( {{\textbf{d}}}\right) }\left( i\right) +f^{({{\textbf{d}}} )}\left( i\right) -\eta ^{{{\textbf{d}}}}, &{} i=1,2,\ldots ,K,\\ \lambda G^{\left( {{\textbf{d}}}\right) }\left( i+1\right) =\left( \mu _{1}+\mu _{2}\right) G^{\left( {{\textbf{d}}}\right) }\left( i\right) +f\left( i\right) -\eta ^{{{\textbf{d}}}}, &{} i=K+1,K+2,\ldots ,N-1,\\ \left( \mu _{1}+\mu _{2}\right) G^{\left( {{\textbf{d}}}\right) }\left( N\right) =\eta ^{{{\textbf{d}}}}-f\left( N\right) , &{} i=N. \end{array} \right. \end{aligned}$$
(36)

The following theorem provides an explicit expression for the perturbation realization factor \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) \) for \(1\le i\le N\).

Theorem 2

For any given policy \({{\textbf{d}}}\), the perturbation realization factor \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) \) is given by

(a)

    for \(1\le i\le K\),

    $$\begin{aligned} G^{\left( {{\textbf{d}}}\right) }\left( i\right) =\lambda ^{-i}\left[ f\left( 0\right) -\eta ^{{{\textbf{d}}}}\right] \prod \limits _{k=1}^{i-1}v\left( d_{k}\right) +\sum \limits _{r=1}^{i-1}\lambda ^{r-i}\left[ f^{\left( {{\textbf{d}}}\right) }\left( r\right) -\eta ^{{{\textbf{d}}}}\right] \prod \limits _{k=r+1}^{i-1}v\left( d_{k}\right) ;\text { } \end{aligned}$$
    (37)
(b)

    for \(K+1\le i\le N\),

    $$\begin{aligned} G^{\left( {{\textbf{d}}}\right) }\left( i\right)&=\lambda ^{-i}\left[ f\left( 0\right) -\eta ^{{{\textbf{d}}}}\right] \left[ v\left( 1\right) \right] ^{i-K-1}\prod \limits _{k=1}^{K}v\left( d_{k}\right) \\&\text { }\quad \quad +\left[ v\left( 1\right) \right] ^{i-K-1}\sum \limits _{r=1}^{K}\lambda ^{r-i}\left[ f^{\left( {{\textbf{d}}}\right) }\left( r\right) -\eta ^{{{\textbf{d}}}}\right] \prod \limits _{k=r+1}^{K}v\left( d_{k}\right) \\&\qquad +\sum \limits _{r=K+1}^{i-1} \lambda ^{r-i}\left[ f\left( r\right) -\eta ^{{{\textbf{d}}}}\right] \left[ v\left( 1\right) \right] ^{i-r-1}. \end{aligned}$$

Proof

We only prove (a), since the proof of (b) is similar.

It follows from (36) that

$$\begin{aligned} G^{\left( {{\textbf{d}}}\right) }\left( 1\right) =\frac{f\left( 0\right) -\eta ^{{{\textbf{d}}}}}{\lambda }. \end{aligned}$$

Similarly, we obtain

$$\begin{aligned} G^{\left( {{\textbf{d}}}\right) }\left( i+1\right) =\frac{v\left( d_{i}\right) }{\lambda }G^{\left( {{\textbf{d}}}\right) }\left( i\right) +\frac{f^{\left( {{\textbf{d}}}\right) }\left( i\right) -\eta ^{{{\textbf{d}}}} }{\lambda },\text { }i=1,2,\ldots ,K. \end{aligned}$$

By using (1.2.4) in Chapter 1 of Elaydi (1996), we can obtain the explicit expression of the perturbation realization factor as follows:

$$\begin{aligned} G^{\left( {{\textbf{d}}}\right) }\left( i\right) =\lambda ^{-i}\left[ f\left( 0\right) -\eta ^{{{\textbf{d}}}}\right] \prod \limits _{k=1}^{i-1}v\left( d_{k}\right) +\sum \limits _{r=1}^{i-1}\lambda ^{r-i}\left[ f^{\left( {{\textbf{d}}}\right) }\left( r\right) -\eta ^{{{\textbf{d}}}}\right] \prod \limits _{k=r+1}^{i-1}v\left( d_{k}\right) \end{aligned}$$

for \(i=1,2,\ldots ,K\). This completes the proof. \(\square \)
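The agreement between the recursion (36) and the closed form (37) for \(i=1,2,\ldots ,K\) can be confirmed numerically with arbitrary placeholder numbers (the values of \(\lambda \), \(v(d_k)\), the \(f\)-values, and \(\eta ^{{{\textbf{d}}}}\) below carry no model meaning):

```python
import numpy as np

# Arbitrary placeholder data (assumptions, not model quantities).
lam, eta = 2.0, 0.7
K = 5
v = [1.3, 2.1, 0.8, 1.7, 2.4]             # v(d_1),...,v(d_K)
fvals = [0.5, -1.0, 2.0, 0.3, -0.4, 1.1]  # f(0), f(1),...,f(K)

# Forward recursion from (36): G(1), then G(i+1) for i = 1,...,K-1.
G_rec = [None, (fvals[0] - eta) / lam]
for i in range(1, K):
    G_rec.append((v[i - 1] * G_rec[i] + fvals[i] - eta) / lam)

# Closed form (37); empty products evaluate to 1.
def G_closed(i):
    total = lam ** (-i) * (fvals[0] - eta) * np.prod(v[:i - 1])
    for r in range(1, i):
        total += lam ** (r - i) * (fvals[r] - eta) * np.prod(v[r:i - 1])
    return total

for i in range(1, K + 1):
    assert abs(G_rec[i] - G_closed(i)) < 1e-9
print("closed form (37) matches the recursion (36) for i = 1,...,K")
```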

6.2 The performance difference equation

For any given policy \({{\textbf{d}}}\in {\mathcal {D}}\), the long-run average profit of the stock-rationing queue is given by

$$\begin{aligned} \eta ^{{{\textbf{d}}}}={\pi }^{\left( {{\textbf{d}}}\right) }{{\textbf{f}}} ^{\left( {{\textbf{d}}}\right) }, \end{aligned}$$

and the policy-based Poisson equation is given by

$$\begin{aligned} {{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }{{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }=\eta ^{{{\textbf{d}}}}{{\textbf{e}}}-{{\textbf{f}}}^{\left( {{\textbf{d}}}\right) }. \end{aligned}$$

It is seen from (3) and (12) that Policy \({{\textbf{d}}}\) directly affects not only the elements of the infinitesimal generator \({{\textbf{B}}} ^{\left( {{\textbf{d}}}\right) }\) but also the reward function \({{\textbf{f}}} ^{\left( {{\textbf{d}}}\right) }\). Hence, if Policy \({{\textbf{d}}}\) changes to \({{\textbf{d}}}^{\prime }\), then the infinitesimal generator \({{\textbf{B}}} ^{\left( {{\textbf{d}}}\right) }\) and the reward function \({{\textbf{f}}}^{\left( {{\textbf{d}}}\right) }\) change correspondingly to \({{\textbf{B}}} ^{\left( {{\textbf{d}}}^{\prime }\right) }\) and \({{\textbf{f}}}^{\left( {{\textbf{d}}}^{\prime }\right) }\), respectively.

The following lemma provides a useful equation (called the performance difference equation) for the difference \(\eta ^{{{\textbf{d}}}^{\prime }}-\eta ^{{{\textbf{d}}}}\) between any two policies \({{\textbf{d}}},{{\textbf{d}}}^{\prime } \in {\mathcal {D}}\). We restate the performance difference equation without proof; readers may refer to Cao (2007) or Ma et al. (2019) for more details.

Lemma 1

For any two policies \({{\textbf{d}}},{{\textbf{d}}}^{\prime } \in {\mathcal {D}}\), we have

$$\begin{aligned} \eta ^{{{\textbf{d}}}^{\prime }}-\eta ^{{{\textbf{d}}}}={\pi }^{\left( {{\textbf{d}}}^{\prime }\right) }\left[ \left( {{\textbf{B}}}^{\left( {{\textbf{d}}}^{\prime }\right) }-{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\right) {{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }+\left( {{\textbf{f}}}^{\left( {{\textbf{d}}}^{\prime }\right) }-{{\textbf{f}}}^{\left( {{\textbf{d}}}\right) }\right) \right] . \end{aligned}$$
(38)

By using the performance difference Eq. (38), we can define a partial order on the policy set \({\mathcal {D}}\) as follows. For any two policies \({{\textbf{d}}},{{\textbf{d}}}^{\prime }\in {\mathcal {D}}\), we write \({{\textbf{d}}}^{\prime }\succ {{\textbf{d}}}\) if \(\eta ^{{{\textbf{d}}} ^{\prime }}>\eta ^{{{\textbf{d}}}}\); \({{\textbf{d}}}^{\prime }\thickapprox {{\textbf{d}}}\) if \(\eta ^{{{\textbf{d}}}^{\prime }}=\eta ^{{{\textbf{d}}}}\); and \({{\textbf{d}}}^{\prime }\prec {{\textbf{d}}}\) if \(\eta ^{{{\textbf{d}}}^{\prime }}<\eta ^{{{\textbf{d}}}}\). Similarly, we write \({{\textbf{d}}}^{\prime }\succeq {{\textbf{d}}}\) if \(\eta ^{{{\textbf{d}}} ^{\prime }}\ge \eta ^{{{\textbf{d}}}}\), and \({{\textbf{d}}}^{\prime }\preceq {{\textbf{d}}}\) if \(\eta ^{{{\textbf{d}}}^{\prime }}\le \eta ^{{{\textbf{d}}}}\).

Under this partial order relation, our research target is to find the optimal policy \({{\textbf{d}}}^{*}\in {\mathcal {D}}\) such that \({{\textbf{d}}}^{*}\) \(\succeq {{\textbf{d}}}\) for any policy \({{\textbf{d}}}\in {\mathcal {D}}\), i.e.,

$$\begin{aligned} {{\textbf{d}}}^{*}=\underset{{{\textbf{d}}}\in {\mathcal {D}}}{\arg \max }\left\{ \eta ^{{{\textbf{d}}}}\right\} . \end{aligned}$$

Note that the policy set \({\mathcal {D}}\) and the state set \(\varvec{\Omega }\) are both finite; thus an enumeration method with finitely many comparisons is feasible for finding the optimal policy \({{\textbf{d}}}^{*}\) in the policy set \({\mathcal {D}}\).
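To make the enumeration concrete, the following Python sketch carries it out for a small instance. This is an illustration only: the birth-death dynamics used here (arrival rate \(\lambda \), service rate \(\mu _{1}+d_{i}\mu _{2}\) in states \(1\le i\le K\) and \(\mu _{1}+\mu _{2}\) in states \(K+1\le i\le N\)) are an assumption consistent with (40) and the forms of (5)-(6) and (8)-(11) quoted later in this section, and the values of \(R\), \(P\), \(K\) and \(N\) are illustrative.

```python
from itertools import product

# Illustrative parameters: lam, mu1, mu2 and the costs follow the numerical
# example of Sect. 6.3; R, P, K and N are assumed values for this sketch.
lam, mu1, mu2 = 3.0, 4.0, 2.0
R, P = 10.0, 6.0
C1, C21, C22, C3, C4 = 1.0, 4.0, 1.0, 5.0, 1.0
K, N = 4, 6                # small sizes keep the 2**K enumeration cheap

def stationary(d):
    # Birth-death weights in the spirit of (5)-(6): death rate mu1 + d_i*mu2
    # below K (an assumption consistent with (40)) and mu1 + mu2 above K.
    xi = [1.0]
    for i in range(1, N + 1):
        rate = mu1 + (d[i - 1] * mu2 if i <= K else mu2)
        xi.append(xi[-1] * lam / rate)
    h = sum(xi)
    return [x / h for x in xi]

def f(d, i):
    # Reward function in the form of (8)-(11).
    if i == 0:
        return -C21 * mu1 - C22 * mu2 - C3 * lam
    if i <= K:
        return (R + C22 - P) * mu2 * d[i - 1] + R * mu1 - C1 * i - C22 * mu2 - C3 * lam
    return R * (mu1 + mu2) - C1 * i - (C3 * lam if i < N else C4 * lam)

def eta(d):
    pi = stationary(d)
    return sum(pi[i] * f(d, i) for i in range(N + 1))

# Exhaustive comparison over all 2**K policies (0; d_1,...,d_K; 1,...,1).
d_star = max(product([0, 1], repeat=K), key=eta)
print("d* =", d_star, " eta =", round(eta(d_star), 4))
```

Because the comparison is exhaustive, the printed policy maximizes \(\eta ^{{{\textbf{d}}}}\) by construction for the assumed instance.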

To find the optimal policy \({{\textbf{d}}}^{*}\), we define two policies \({{\textbf{d}}}\) and \({{\textbf{d}}}^{\prime }\) with an interrelated structure at Position i as follows:

$$\begin{aligned} {{\textbf{d}}}&=\left( 0;d_{1},d_{2},\ldots ,d_{i-1},\underline{d_{i}} ,d_{i+1},\ldots ,d_{K};1,1,\ldots ,1\right) ,\\ {{\textbf{d}}}^{\prime }&=\left( 0;d_{1},d_{2},\ldots ,d_{i-1},\underline{d_{i}^{\prime }},d_{i+1},\ldots ,d_{K};1,1,\ldots ,1\right) , \end{aligned}$$

where \(d_{i}^{\prime },d_{i}\in \left\{ 0,1\right\} \) with \(d_{i}^{\prime }\ne d_{i}\). Clearly, if the two policies \({{\textbf{d}}}\) and \({{\textbf{d}}}^{\prime }\) have an interrelated structure at Position i, then the only difference between the two policies \({{\textbf{d}}}\) and \({{\textbf{d}}}^{\prime }\) is at their ith elements: \(d_{i}\) and \(d_{i}^{\prime }\).

Lemma 2

For the two policies \({{\textbf{d}}}\) and \({{\textbf{d}}}^{\prime }\) with an interrelated structure at Position i: \(d_{i}\) and \(d_{i}^{\prime }\), we have

$$\begin{aligned} \eta ^{{{\textbf{d}}}^{\prime }}-\eta ^{{{\textbf{d}}}}=\mu _{2}\pi ^{\left( {{\textbf{d}}}^{\prime }\right) }\left( i\right) \left( d_{i}^{\prime } -d_{i}\right) \left[ {G}^{\left( {{\textbf{d}}}\right) }\left( i\right) +{b}\right] , \end{aligned}$$
(39)

where \(b=R+C_{2,2}-P\).

Proof

For the two policies \({{\textbf{d}}}\) and \({{\textbf{d}}}^{\prime }\) with an interrelated structure at Position i: \(d_{i}\) and \(d_{i}^{\prime }\), we have

$$\begin{aligned} {{\textbf{d}}}&=\left( 0;d_{1},d_{2},\ldots ,d_{i-1},\underline{d_{i}} ,d_{i+1},\ldots ,d_{K};1,1,\ldots ,1\right) ,\\ {{\textbf{d}}}^{\prime }&=\left( 0;d_{1},d_{2},\ldots ,d_{i-1},\underline{d_{i}^{\prime }},d_{i+1},\ldots ,d_{K};1,1,\ldots ,1\right) . \end{aligned}$$

It is easy to check from (3) that

$$\begin{aligned} {{\textbf{B}}}^{\left( {{\textbf{d}}}^{\prime }\right) }-{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }=\left( \begin{array}{ccccccc} 0 &{} &{} &{} &{} &{} &{} \\ 0 &{} \ddots &{} &{} &{} &{} &{} \\ &{} \ddots &{} 0 &{} &{} &{} &{} \\ &{} &{} \left( d_{i}^{\prime }-d_{i}\right) \mu _{2} &{} -\left( d_{i}^{\prime }-d_{i}\right) \mu _{2} &{} &{} &{} \\ &{} &{} &{} 0 &{} 0 &{} &{} \\ &{} &{} &{} &{} \ddots &{} \ddots &{} \\ &{} &{} &{} &{} &{} 0 &{} 0 \end{array} \right) . \end{aligned}$$
(40)

Also, from the reward function (9), we obtain

$$\begin{aligned} f^{\left( {{\textbf{d}}}\right) }\left( i\right) =\left( R+C_{2,2}-P\right) \mu _{2}d_{i}+R\mu _{1}-C_{1}i-C_{2,2}\mu _{2}-C_{3}\lambda \end{aligned}$$

and

$$\begin{aligned} f^{\left( {{\textbf{d}}}^{\prime }\right) }\left( i\right) =\left( R+C_{2,2}-P\right) \mu _{2}d_{i}^{\prime }+R\mu _{1}-C_{1}i-C_{2,2}\mu _{2} -C_{3}\lambda . \end{aligned}$$

This gives

$$\begin{aligned} {{\textbf{f}}}^{\left( {{\textbf{d}}}^{\prime }\right) }-{{\textbf{f}}}^{\left( {{\textbf{d}}}\right) }=\left( 0,0,\ldots ,0,b\mu _{2}\left( d_{i}^{\prime } -d_{i}\right) ,0,\ldots ,0\right) ^{T}. \end{aligned}$$
(41)

Thus, it follows from Lemma 1, (40) and (41) that

$$\begin{aligned} \eta ^{{{\textbf{d}}}^{\prime }}-\eta ^{{{\textbf{d}}}}&={\pi }^{\left( {{\textbf{d}}}^{\prime }\right) }\left[ \left( {{\textbf{B}}}^{\left( {{\textbf{d}}}^{\prime }\right) }-{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\right) {{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }+\left( {{\textbf{f}}}^{\left( {{\textbf{d}}}^{\prime }\right) }-{{\textbf{f}}}^{\left( {{\textbf{d}}}\right) }\right) \right] \nonumber \\&=\mu _{2}\pi ^{\left( {{\textbf{d}}}^{\prime }\right) }\left( i\right) \left( d_{i}^{\prime }-d_{i}\right) \left[ g^{\left( {{\textbf{d}}}\right) }\left( i-1\right) -g^{\left( {{\textbf{d}}}\right) }\left( i\right) +b\right] \nonumber \\&=\mu _{2}\pi ^{\left( {{\textbf{d}}}^{\prime }\right) }\left( i\right) \left( d_{i}^{\prime }-d_{i}\right) \left[ {G}^{\left( {{\textbf{d}}}\right) }\left( i\right) +{b}\right] . \end{aligned}$$
(42)

This completes the proof. \(\square \)

For \(d_{i}^{\prime },d_{i}\in \left\{ 0,1\right\} \) with \(d_{i}^{\prime }\ne d_{i}\), we have

$$\begin{aligned} d_{i}^{\prime }-d_{i}=\left\{ \begin{array}{cc} 1, &{} d_{i}^{\prime }=1,d_{i}=0;\\ -1, &{} d_{i}^{\prime }=0,d_{i}=1. \end{array} \right. \end{aligned}$$

Therefore, it is easy to see from (39) that, to compare \(\eta ^{{{\textbf{d}}}^{\prime }}\) with \(\eta ^{{{\textbf{d}}}}\), it is necessary to further analyze the sign of the function \({G}^{\left( {{\textbf{d}}}\right) }\left( i\right) +{b}\). This will be done in the next subsection.
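The difference equation (39) can also be checked numerically. The sketch below is an illustration under assumed dynamics, not part of the proof: it builds a birth-death generator \({{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\) consistent with (40) (death rate \(\mu _{1}+d_{i}\mu _{2}\) for \(1\le i\le K\) and \(\mu _{1}+\mu _{2}\) above \(K\)), computes the potential \({{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }\) from the Poisson equation \({{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }{{\textbf{g}}}^{\left( {{\textbf{d}}}\right) }=\eta ^{{{\textbf{d}}}}{{\textbf{e}}}-{{\textbf{f}}}^{\left( {{\textbf{d}}}\right) }\) (pinning \(g(0)=0\)), and verifies (39) for one flipped position; all parameter values, including \(R\) and \(P\), are illustrative assumptions.

```python
# All model ingredients here are illustrative assumptions: a birth-death
# generator consistent with (40) (death rate mu1 + d_i*mu2 for 1 <= i <= K,
# mu1 + mu2 above K), the reward function in the form of (9), and the
# parameter values.
lam, mu1, mu2 = 3.0, 4.0, 2.0
R, P = 10.0, 6.0
C1, C21, C22, C3, C4 = 1.0, 4.0, 1.0, 5.0, 1.0
K, N = 4, 6
b = R + C22 - P

def rate(d, i):                       # death rate in state i under policy d
    return mu1 + (d[i - 1] * mu2 if i <= K else mu2)

def f(d, i):                          # reward function, cf. (8)-(11)
    if i == 0:
        return -C21 * mu1 - C22 * mu2 - C3 * lam
    if i <= K:
        return (R + C22 - P) * mu2 * d[i - 1] + R * mu1 - C1 * i - C22 * mu2 - C3 * lam
    return R * (mu1 + mu2) - C1 * i - (C3 * lam if i < N else C4 * lam)

def eta_pi(d):                        # average reward and stationary law
    xi = [1.0]
    for i in range(1, N + 1):
        xi.append(xi[-1] * lam / rate(d, i))
    h = sum(xi)
    pi = [x / h for x in xi]
    return sum(pi[i] * f(d, i) for i in range(N + 1)), pi

def solve(A, r):                      # Gaussian elimination, partial pivoting
    n = len(A)
    M = [row[:] + [ri] for row, ri in zip(A, r)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[p] = M[p], M[c]
        for k in range(c + 1, n):
            t = M[k][c] / M[c][c]
            for j in range(c, n + 1):
                M[k][j] -= t * M[c][j]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][j] * x[j] for j in range(c + 1, n))) / M[c][c]
    return x

def potential(d):                     # solve B g = eta*e - f with g(0) = 0
    et, _ = eta_pi(d)
    B = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        if i < N:
            B[i][i + 1] = lam
        if i > 0:
            B[i][i - 1] = rate(d, i)
        B[i][i] = -sum(B[i])
    A = B[:N] + [[1.0] + [0.0] * N]   # drop one balance equation, pin g(0)
    r = [et - f(d, i) for i in range(N)] + [0.0]
    return solve(A, r)

d = (1, 0, 1, 0)                      # flip Position i = 2 to get d'
i = 2
dp = tuple(1 - x if k == i - 1 else x for k, x in enumerate(d))
g = potential(d)
eta_d, _ = eta_pi(d)
eta_dp, pi_dp = eta_pi(dp)
lhs = eta_dp - eta_d
rhs = mu2 * pi_dp[i] * (dp[i - 1] - d[i - 1]) * (g[i - 1] - g[i] + b)
print(abs(lhs - rhs) < 1e-8)          # the two sides of (39) agree
```

Here \({G}^{\left( {{\textbf{d}}}\right) }\left( i\right) =g^{\left( {{\textbf{d}}}\right) }\left( i-1\right) -g^{\left( {{\textbf{d}}}\right) }\left( i\right) \), as in the middle line of (42); the additive constant in \(g\) is irrelevant because the rows of \({{\textbf{B}}}^{\left( {{\textbf{d}}}^{\prime }\right) }-{{\textbf{B}}}^{\left( {{\textbf{d}}}\right) }\) sum to zero.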

6.3 The sign of \({G}^{\left( {{\textbf{d}}}\right) }\left( i\right) +{b}\)

As seen from (42), the sign analysis of the performance difference \(\eta ^{{{\textbf{d}}}^{\prime }}-\eta ^{{{\textbf{d}}}}\) directly depends on that of \({G}^{\left( {{\textbf{d}}}\right) }\left( i\right) +{b}\). Thus, this subsection provides the sign analysis of \({G}^{\left( {{\textbf{d}}}\right) }\left( i\right) +{b}\) with respect to the penalty cost P.

Suppose that the inventory level is low. If the service priority is violated (i.e., the demands of Class 2 are served at a low stock level), then the warehouse has to pay the penalty cost P for each product supplied to a demand of Class 2. Based on this, we study the influence of the penalty cost P on the sign of \({G}^{\left( {{\textbf{d}}}\right) }\left( i\right) +{b}\).

Substituting (14), (15), (16) and (17) into (37), we obtain that for \(1\le i\le K,\)

$$\begin{aligned}&{G}^{\left( {{\textbf{d}}}\right) }\left( i\right) +{b} \nonumber \\&\quad =R+C_{2,2} +\lambda ^{-i}\left[ B_{0}-{D}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=1}^{i-1}v\left( d_{k}\right) +\sum \limits _{r=1}^{i-1} \lambda ^{r-i}\left[ B_{r}^{\left( {{\textbf{d}}}\right) }-{D}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=r+1}^{i-1}v\left( d_{k}\right) \nonumber \\&\quad \quad -P\left\{ 1+\lambda ^{-i}\left[ A_{0}-{F}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=1}^{i-1}v\left( d_{k}\right) +\sum \limits _{r=1} ^{i-1}\lambda ^{r-i}\left[ A_{r}^{\left( {{\textbf{d}}}\right) }-{F}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=r+1}^{i-1}v\left( d_{k}\right) \right\} , \end{aligned}$$
(43)

which is linear in the penalty cost P.

From \({G}^{\left( {{\textbf{d}}}\right) }\left( i\right) +{b=0}\), we have

$$\begin{aligned}&\text { }P\left\{ 1+\lambda ^{-i}\left[ A_{0}-{F}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=1}^{i-1}v\left( d_{k}\right) +\sum \limits _{r=1}^{i-1}\lambda ^{r-i}\left[ A_{r}^{\left( {{\textbf{d}}}\right) }-{F}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=r+1}^{i-1}v\left( d_{k}\right) \right\} \nonumber \\&=R+C_{2,2}+\lambda ^{-i}\left[ B_{0}-{D}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=1}^{i-1}v\left( d_{k}\right) +\sum \limits _{r=1} ^{i-1}\lambda ^{r-i}\left[ B_{r}^{\left( {{\textbf{d}}}\right) }-{D}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=r+1}^{i-1}v\left( d_{k}\right) , \end{aligned}$$
(44)

thus the unique solution of Equation (44) with respect to the penalty cost P is given by

$$\begin{aligned} {{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }=\frac{R+C_{2,2}+\lambda ^{-i}\left[ B_{0}-{D}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=1}^{i-1}v\left( d_{k}\right) +\sum \limits _{r=1}^{i-1} \lambda ^{r-i}\left[ B_{r}^{\left( {{\textbf{d}}}\right) }-{D}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=r+1}^{i-1}v\left( d_{k}\right) }{1+\lambda ^{-i}\left[ A_{0}-{F}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=1}^{i-1}v\left( d_{k}\right) +\sum \limits _{r=1}^{i-1} \lambda ^{r-i}\left[ A_{r}^{\left( {{\textbf{d}}}\right) }-{F}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=r+1}^{i-1}v\left( d_{k}\right) }. \end{aligned}$$
(45)

It is easy to see from (43) that if \({{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }>0\) and \(0\le P\le {{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }\), then \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) +b\ge 0\); while if \(P\ge {{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) },\) then \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) +b\le 0\). Note that the equality can hold only if \(P={{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }\).

To understand the solutions \({{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }\) for \(1\le i\le K\), we use a numerical example, whose results are reported in Table 2. To do this, we take the system parameters: \(\lambda =3\), \(\mu _{1}=4\), \(\mu _{2}=2\), \(C_{1}=1\), \(C_{2,1}=4\), \(C_{2,2}=1\), \(C_{3}=5\) and \(C_{4}=1\). Further, we consider three different policies:

$$\begin{aligned} {{\textbf{d}}}_{1}&=\left( 0;1,1,1,1,1,1,1,1,1,1;1,1,1,1,1\right) ,\\ {{\textbf{d}}}_{2}&=\left( 0;0,0,0,0,0,0,0,0,0,0;1,1,1,1,1\right) ,\\ {{\textbf{d}}}_{3}&=\left( 0;0,0,0,0,0,1,1,1,1,1;1,1,1,1,1\right) . \end{aligned}$$
Table 2 Numerical analysis of solutions for three different policies

In the stock-rationing queue, we define two critical values related to the penalty cost P as

$$\begin{aligned} P_{H}\left( {{\textbf{d}}}\right) =\max _{{{\textbf{d}}}\in {\mathcal {D}} }\left\{ 0,{{\mathfrak {P}}}_{1}^{\left( {{\textbf{d}}}\right) },{{\mathfrak {P}}} _{2}^{\left( {{\textbf{d}}}\right) },\ldots ,{{\mathfrak {P}}}_{K}^{\left( {{\textbf{d}}}\right) }\right\} , \end{aligned}$$
(46)

and

$$\begin{aligned} P_{L}\left( {{\textbf{d}}}\right) =\min _{{{\textbf{d}}}\in {\mathcal {D}}}\left\{ {{\mathfrak {P}}}_{1}^{\left( {{\textbf{d}}}\right) },{{\mathfrak {P}}}_{2}^{\left( {{\textbf{d}}}\right) },\ldots ,{{\mathfrak {P}}}_{K}^{\left( {{\textbf{d}}}\right) }\right\} . \end{aligned}$$
(47)

From Table 2, we see that it is possible to have \(P_{L}\left( {{\textbf{d}}} \right) <0\) for Policy \({{\textbf{d}}}={{\textbf{d}}}_{2}\) or \({{\textbf{d}}}={{\textbf{d}}}_{3}\).
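Computationally, (46) and (47) are a maximum and a minimum over the values \({{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }\) of (45). The fragment below uses purely hypothetical values of \({{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }\) for one fixed policy (the actual values are those reported in Table 2) to highlight that the maximum in (46) explicitly includes 0, while the minimum in (47) may be negative.

```python
# Hypothetical critical values P_i^(d), i = 1,...,K, for one fixed policy d;
# the actual values follow from (45) and are reported in Table 2.
frak_P = [12.4, 9.8, 7.1, -0.6, 3.5]

P_H = max([0.0] + frak_P)   # cf. (46): the maximum explicitly includes 0
P_L = min(frak_P)           # cf. (47): may be negative, as for d2 and d3

print(P_H, P_L)             # -> 12.4 -0.6
```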

The following proposition uses the two critical values \(P_{H}\left( {{\textbf{d}}}\right) \) and \(P_{L}\left( {{\textbf{d}}}\right) \), together with the penalty cost P, to provide some sufficient conditions under which the function \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) +b\) is either positive, zero or negative.

Proposition 1

(1)

    If \(P\ge P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\in {\mathcal {D}}\), then for each \(i=1,2,\ldots ,K\),

    $$\begin{aligned} G^{\left( {{\textbf{d}}}\right) }\left( i\right) +b\le 0. \end{aligned}$$
(2)

    If \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\in {\mathcal {D}}\), then for each \(i=1,2,\ldots ,K\),

    $$\begin{aligned} G^{\left( {{\textbf{d}}}\right) }\left( i\right) +b\ge 0. \end{aligned}$$

Proof

(1)

    For any given policy \({{\textbf{d}}}\in {\mathcal {D}}\), if \(P\ge P_{H}\left( {{\textbf{d}}}\right) \), then it follows from (46) that for each \(i=1,2,\ldots ,K\),

    $$\begin{aligned} P\ge {{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }, \end{aligned}$$

    which leads to \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) +b\le 0\).

(2)

    For any given policy \({{\textbf{d}}}\in {\mathcal {D}}\), if \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \), then it follows from (47) that for each \(i=1,2,\ldots ,K\),

    $$\begin{aligned} 0\le P\le {{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }, \end{aligned}$$

    which gives \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) +b\ge 0\). This completes the proof.

\(\square \)

However, for the case \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\in {\mathcal {D}}\), it is more complicated to determine the sign of \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) +b\) for each \(i=1,2,\ldots ,K\). For this reason, we defer this discussion to the next section.

For any two policies \(\textbf{d,c}\in {\mathcal {D}}\),

$$\begin{aligned} {{\textbf{d}}}&=\left( 0;d_{1},d_{2},\ldots ,d_{i-1},d_{i},d_{i+1} ,\ldots ,d_{K};1,1,\ldots ,1\right) ,\\ {{\textbf{c}}}&=\left( 0;c_{1},c_{2},\ldots ,c_{i-1},c_{i},c_{i+1} ,\ldots ,c_{K};1,1,\ldots ,1\right) , \end{aligned}$$

we write

$$\begin{aligned} S\left( \textbf{d,c}\right) =\left\{ i:d_{i}\ne c_{i},i=1,2,\ldots ,K-1,K\right\} \end{aligned}$$

and its complementary set

$$\begin{aligned} \overline{S\left( \textbf{d,c}\right) }=\left\{ i:d_{i}=c_{i},i=1,2,\ldots ,K-1,K\right\} . \end{aligned}$$

Then

$$\begin{aligned} S\left( \textbf{d,c}\right) \cup \overline{S\left( \textbf{d,c}\right) }=\left\{ 1,2,\ldots ,K-1,K\right\} . \end{aligned}$$

The following lemma constructs a policy sequence in which any two adjacent policies differ in exactly one element. The proof is easy and is omitted here.

Lemma 3

For any two policies \(\textbf{d,c}\in {\mathcal {D}}\) with \(S\left( \textbf{d,c}\right) =\left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \), there exists a policy sequence \({{\textbf{d}}}^{\left( k\right) }\) for \(k=1,2,3,\ldots ,n-1,n\) such that

$$\begin{aligned} S\left( \textbf{d,d}^{\left( 1\right) }\right)&=\left\{ j_{1}\right\} ,\\ S\left( {{\textbf{d}}}^{\left( 1\right) }\textbf{,d}^{\left( 2\right) }\right)&=\left\{ j_{2}\right\} ,\\&\vdots \\ S\left( {{\textbf{d}}}^{\left( n-1\right) }\textbf{,d}^{\left( n\right) }\right)&=\left\{ j_{n}\right\} , \end{aligned}$$

where \({{\textbf{d}}}^{\left( n\right) }={{\textbf{c}}}\), and \(\left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} =\left\{ j_{1},j_{2},j_{3},\ldots ,j_{n-1},j_{n}\right\} \). Also, for \(k=1,2,3,\ldots ,n-1,n\), we have

$$\begin{aligned} S\left( \textbf{d,d}^{\left( k\right) }\right) =\left\{ j_{1},j_{2},j_{3},\ldots ,j_{k}\right\} . \end{aligned}$$
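The construction of Lemma 3 simply visits the positions of \(S\left( \textbf{d,c}\right) \) and flips one of them at a time. A minimal Python sketch (the function name and the left-to-right visiting order are our own choices; any ordering of \(S\left( \textbf{d,c}\right) \) works):

```python
def flip_sequence(d, c):
    """Return the policies d^(1), ..., d^(n) of Lemma 3: starting from d,
    flip one differing position at a time until c is reached, so any two
    adjacent policies differ in exactly one element and d^(n) = c."""
    seq, cur = [], list(d)
    for i, (di, ci) in enumerate(zip(d, c)):
        if di != ci:              # position i + 1 belongs to S(d, c)
            cur[i] = ci
            seq.append(tuple(cur))
    return seq

d = (0, 1, 1, 0, 0)
c = (1, 1, 0, 0, 1)               # here S(d, c) = {1, 3, 5}
for step in flip_sequence(d, c):
    print(step)
```

In this example, the three printed policies are \({{\textbf{d}}}^{\left( 1\right) }\), \({{\textbf{d}}}^{\left( 2\right) }\) and \({{\textbf{d}}}^{\left( 3\right) }={{\textbf{c}}}\).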

The following theorem provides a class property of the policies in the set \({\mathcal {D}}\) by means of the function \(G^{\left( {{\textbf{c}}}\right) }\left( i\right) +b\) for any policy \({{\textbf{c}}}\in {\mathcal {D}}\) and for each \(i\in S\left( \textbf{d,c}\right) \), where Policy \({{\textbf{d}}}\) is any given reference policy in the set \({\mathcal {D}}\). Note that this class property will play a key role in developing some new structural properties of the optimal dynamic rationing policy.

Theorem 3

(1)

    If \(P\ge P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then for any policy \({{\textbf{c}}}\in {\mathcal {D}}\) and for each \(i\in S\left( \textbf{d,c}\right) \),

    $$\begin{aligned} G^{\left( {{\textbf{c}}}\right) }\left( i\right) +b\le 0. \end{aligned}$$
(2)

    If \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then for any policy \({{\textbf{c}}}\in {\mathcal {D}}\) and for each \(i\in S\left( \textbf{d,c}\right) \),

    $$\begin{aligned} G^{\left( {{\textbf{c}}}\right) }\left( i\right) +b\ge 0. \end{aligned}$$

Proof

We only prove (1), while (2) can be proved similarly.

If \(P\ge P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then it follows from (1) of Proposition 1 that for \(i=1,2,\ldots ,K,\)

$$\begin{aligned} G^{\left( {{\textbf{d}}}\right) }\left( i\right) +b\le 0. \end{aligned}$$

Starting from Policy \({{\textbf{d}}}\), we consider any different policy \({{\textbf{c}}} \in {\mathcal {D}}\). If the two policies \({{\textbf{d}}}\) and \({{\textbf{c}}}\) have \(n\) different elements: \(d_{i_{l}}\ne c_{i_{l}}\) for \(l=1,2,\ldots ,n\), then \(S\left( \textbf{d,c}\right) =\left\{ i_{l}:l=1,2,\ldots ,n\right\} \).

Note that the performance difference Eq. (39) can only be applied to two policies \({{\textbf{d}}}^{\prime }\) and \({{\textbf{d}}}\) with an interrelated structure at Position \(i\): \(d_{i}^{\prime },d_{i}\in \left\{ 0,1\right\} \) with \(d_{i}^{\prime }\ne d_{i}\). Thus, for a policy \({{\textbf{c}}}\in {\mathcal {D}}\) with \(S\left( \textbf{d,c}\right) =\left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \), our following discussion uses the policy sequence \({{\textbf{d}}}^{\left( k\right) }\) for \(k=1,2,3,\ldots ,n-1,n\) given in Lemma 3. To this end, the rest of the proof proceeds by mathematical induction in the following three steps:

Step one: Analyzing the two policies \({{\textbf{d}}}\) and \({{\textbf{d}}} ^{\left( 1\right) }\).

For each \(j_{1}\in \left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \), we take \(S\left( \textbf{d,d}^{\left( 1\right) }\right) =\left\{ j_{1}\right\} \). It follows from the performance difference Eq. (39) that

$$\begin{aligned} \eta ^{{{\textbf{d}}}^{\left( 1\right) }}-\eta ^{{{\textbf{d}}}}=\mu _{2}\pi ^{\left( {{\textbf{d}}}^{\left( 1\right) }\right) }\left( j_{1}\right) \left( d_{j_{1}}^{\left( {{\textbf{d}}}^{\left( 1\right) }\right) }-d_{j_{1}}\right) \left[ {G}^{\left( {{\textbf{d}}}\right) }\left( j_{1}\right) +{b}\right] . \end{aligned}$$
(48)

Similarly, we have

$$\begin{aligned} \eta ^{{{\textbf{d}}}}-\eta ^{{{\textbf{d}}}^{\left( 1\right) }}=\mu _{2}\pi ^{\left( {{\textbf{d}}}\right) }\left( j_{1}\right) \left( d_{j_{1}}-d_{j_{1}}^{\left( {{\textbf{d}}}^{\left( 1\right) }\right) }\right) \left[ {G}^{\left( {{\textbf{d}}}^{\left( 1\right) }\right) }\left( j_{1}\right) +{b}\right] . \end{aligned}$$
(49)

It is easy to see from (48) and (49) that

$$\begin{aligned} {G}^{\left( {{\textbf{d}}}^{\left( 1\right) }\right) }\left( j_{1}\right) +{b=}\frac{\pi ^{\left( {{\textbf{d}}}^{\left( 1\right) }\right) }\left( j_{1}\right) }{\pi ^{\left( {{\textbf{d}}}\right) }\left( j_{1}\right) }\left[ {G}^{\left( {{\textbf{d}}}\right) }\left( j_{1}\right) +{b}\right] \le 0. \end{aligned}$$
(50)

Therefore, for Policy \({{\textbf{d}}}^{\left( 1\right) }\in {\mathcal {D}}\), \({G}^{\left( {{\textbf{d}}}^{\left( 1\right) }\right) }\left( j_{1}\right) +{b}\le 0\) for each \(j_{1}\in \left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \).

Step two: Analyzing the two policies \({{\textbf{d}}}^{\left( 1\right) }\) and \({{\textbf{d}}}^{\left( 2\right) }\).

For each \(j_{2}\in \left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \), we take \(S\left( {{\textbf{d}}}^{\left( 1\right) }\textbf{,d}^{\left( 2\right) }\right) =\left\{ j_{2}\right\} \). It is easy to see from (50) that

$$\begin{aligned} {G}^{\left( {{\textbf{d}}}^{\left( 2\right) }\right) }\left( j_{2}\right) +{b=}\frac{\pi ^{\left( {{\textbf{d}}}^{\left( 2\right) }\right) }\left( j_{2}\right) }{\pi ^{\left( {{\textbf{d}}}^{\left( 1\right) }\right) }\left( j_{2}\right) }\left[ {G}^{\left( {{\textbf{d}}}^{\left( 1\right) }\right) }\left( j_{2}\right) +{b}\right] \le 0. \end{aligned}$$

Therefore, for Policy \({{\textbf{d}}}^{\left( 2\right) }\in {\mathcal {D}}\), \({G}^{\left( {{\textbf{d}}}^{\left( 2\right) }\right) }\left( j_{2}\right) +{b}\le 0\) for each \(j_{2}\in \left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \).

Step three: Assume that for \(l=3,4,\ldots ,k-2,k-1\), we have shown that for Policy \({{\textbf{d}}}^{\left( l\right) }\in {\mathcal {D}}\) with \(S\left( {{\textbf{d}}}^{\left( l-1\right) }\textbf{,d}^{\left( l\right) }\right) =\left\{ j_{l}\right\} \), we have \({G}^{\left( {{\textbf{d}}}^{\left( l\right) }\right) }\left( j_{l}\right) +{b}\le 0\) for each \(j_{l}\in \left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \). Now, we prove the case \(l=k\).

For each \(j_{k}\in \left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \), we take \(S\left( {{\textbf{d}}}^{\left( k-1\right) }\textbf{,d}^{\left( k\right) }\right) =\left\{ j_{k}\right\} \). It is easy to see from (50) that

$$\begin{aligned} {G}^{\left( {{\textbf{d}}}^{\left( k\right) }\right) }\left( j_{k}\right) +{b=}\frac{\pi ^{\left( {{\textbf{d}}}^{\left( k\right) }\right) }\left( j_{k}\right) }{\pi ^{\left( {{\textbf{d}}}^{\left( k-1\right) }\right) }\left( j_{k}\right) }\left[ {G}^{\left( {{\textbf{d}}}^{\left( k-1\right) }\right) }\left( j_{k}\right) +{b}\right] \le 0. \end{aligned}$$

This gives that for Policy \({{\textbf{d}}}^{\left( k\right) }\in {\mathcal {D}}\), \({G}^{\left( {{\textbf{d}}}^{\left( k\right) }\right) }\left( j_{k}\right) +{b}\le 0\) for each \(j_{k}\in \left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \). Thus, this result holds for the case with \(l=k\).

Following the above analysis, we can prove by induction that for Policy \({{\textbf{d}}}^{\left( n\right) }\in {\mathcal {D}}\), \({G}^{\left( {{\textbf{d}}} ^{\left( n\right) }\right) }\left( j_{n}\right) +{b}\le 0\) for each \(j_{n}\in \left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \). Since \(\mathbf {c=d}^{\left( n\right) }\), we obtain that for Policy \({{\textbf{c}}} \in {\mathcal {D}}\), \({G}^{\left( {{\textbf{c}}}\right) }\left( i\right) +{b} \le 0\) for each \(i\in \left\{ i_{1},i_{2},i_{3},\ldots ,i_{n-1},i_{n}\right\} \). This completes the proof. \(\square \)

7 Monotonicity and optimality

In this section, we analyze the optimal dynamic rationing policy in three different areas of the penalty cost: \(P\ge P_{H}\left( {{\textbf{d}}}\right) \); \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0<P\le P_{L}\left( {{\textbf{d}}} \right) \); and \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \), each of which is studied in its own subsection. For the three areas, some new structural properties of the optimal dynamic rationing policy are derived by using our algebraic method. Also, it is easy to see that for the first two areas: \(P\ge P_{H}\left( {{\textbf{d}}}\right) \); and \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0<P\le P_{L}\left( {{\textbf{d}}}\right) \), the optimal dynamic rationing policy is of threshold type; while for the third area: \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \), it may not be of threshold type but must be of transformational threshold type.

As seen from Lemma 2, to compare \(\eta ^{{{\textbf{d}}}^{\prime }}\) with \(\eta ^{{{\textbf{d}}}}\), we need to focus only on Position \(i\) with \(d_{i}^{\prime }\ne d_{i}\) for \(d_{i}^{\prime },d_{i}\in \left\{ 0,1\right\} \). Also, Theorem 3 provides a useful class property of the policies in the set \({\mathcal {D}}\) under the function \(G^{\left( {{\textbf{c}}}\right) }\left( i\right) +b\) for any policies \(\textbf{c,d}\in {\mathcal {D}}\) and for each \(i\in S\left( \textbf{d,c}\right) \). These results are very useful for our analysis in the following subsections.

7.1 The penalty cost \(P\ge P_{H}\left( {{\textbf{d}}}\right) \)

In this subsection, for the area of the penalty cost: \(P\ge P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), we find the optimal dynamic rationing policy of the stock-rationing queue, and further compute the maximal long-run average profit of this system.

The following theorem uses the class property of the policies in the set \({\mathcal {D}}\), given in (1) of Theorem 3, to set up some basic relations between any two policies. Thus, we find the optimal dynamic rationing policy of the stock-rationing queue.

Theorem 4

If \(P\ge {P}_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then the optimal dynamic rationing policy of the stock-rationing queue is given by

$$\begin{aligned} {{\textbf{d}}}^{*}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) . \end{aligned}$$

This shows that if the penalty cost is high, with \(P\ge {P}_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then the warehouse cannot supply any product to the demands of Class 2.

Proof

If \(P\ge {P}_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then we prove that for any policy \({{\textbf{c}}}\in {\mathcal {D}}\),

$$\begin{aligned} {{\textbf{d}}}^{*}\succeq {{\textbf{c}}}. \end{aligned}$$

Based on this, we need to study some useful relations among the three policies: \({{\textbf{d}}}\), \({{\textbf{c}}}\) and \({{\textbf{d}}}^{*}\), where \({{\textbf{d}}}^{*}\) is deterministic with \(d_{i}^{*}=0\) for each \(i=1,2,\ldots ,K-1,K\).

To compare \(\eta ^{{{\textbf{c}}}}\) with \(\eta ^{{{\textbf{d}}}^{*}}\), let \(S\left( {{\textbf{d}}}^{*}\textbf{,c}\right) =\left\{ n_{l}:l=1,2,\ldots ,n\right\} \) for \(1\le n\le K\). Then \(c_{n_{l}}=1\) for \(l=1,2,\ldots ,n\), since \(d_{i}^{*}=0\) for each \(i=1,2,\ldots ,K-1,K\).

For the two policies \({{\textbf{d}}}\) and \({{\textbf{c}}}\), we have \(d_{i},c_{i} \in \left\{ 0,1\right\} \). Further, for the three elements: \(d_{i}\), \(c_{i}\) and \(d_{i}^{*}=0\) for \(i\in S\left( {{\textbf{d}}}^{*}\textbf{,c}\right) \), we need to consider four different cases as follows:

Case one: \(d_{i}=c_{i}=d_{i}^{*}=0\). Since \(c_{i}=d_{i}^{*}\), this case requires no analysis by Lemma 2.

Case two: \(d_{i}=1\) and \(c_{i}=d_{i}^{*}=0\). Since \(c_{i}=d_{i}^{*}\), this case also requires no analysis by Lemma 2.

Case three: \(c_{i}=1\) and \(d_{i}=d_{i}^{*}=0\). Note that \(c_{i}\ne d_{i}\); by using (1) of Theorem 3, we obtain \(G^{\left( {{\textbf{c}}}\right) }\left( i\right) +b\le 0\). On the other hand, since \(c_{i}\ne d_{i}^{*}\), it follows from the performance difference Eq. (39) that for each \(i\in S\left( {{\textbf{d}}}^{*}\textbf{,c}\right) \),

$$\begin{aligned} \eta ^{{{\textbf{d}}}^{*}}-\eta ^{{{\textbf{c}}}}&=\mu _{2}\pi ^{\left( {{\textbf{d}}}^{*}\right) }\left( i\right) \left( d_{i}^{*} -c_{i}\right) \left[ {G}^{\left( {{\textbf{c}}}\right) }\left( i\right) +{b}\right] \\&=-\mu _{2}\pi ^{\left( {{\textbf{d}}}^{*}\right) }\left( i\right) \left[ {G}^{\left( {{\textbf{c}}}\right) }\left( i\right) +{b}\right] \ge 0. \end{aligned}$$

Thus \(\eta ^{{{\textbf{d}}}^{*}}\ge \eta ^{{{\textbf{c}}}}\), which gives \({{\textbf{d}}}^{*}\succeq {{\textbf{c}}}\).

Case four: \(d_{i}=c_{i}=1\) and \(d_{i}^{*}=0\). Note that \(d_{i}^{*}\ne d_{i}\); by using (1) of Theorem 3, we obtain \(G^{\left( {{\textbf{d}}}^{*}\right) }\left( i\right) +b\le 0\). On the other hand, since \(c_{i}\ne d_{i}^{*}\), it follows from the performance difference Eq. (39) that for each \(i\in S\left( {{\textbf{d}}} ^{*}\textbf{,c}\right) \),

$$\begin{aligned} \eta ^{{{\textbf{c}}}}-\eta ^{{{\textbf{d}}}^{*}}&=\mu _{2}\pi ^{\left( {{\textbf{c}}}\right) }\left( i\right) \left( c_{i}-d_{i}^{*}\right) \left[ {G}^{\left( {{\textbf{d}}}^{*}\right) }\left( i\right) +{b}\right] \\&=\mu _{2}\pi ^{\left( {{\textbf{c}}}\right) }\left( i\right) \left[ {G}^{\left( {{\textbf{d}}}^{*}\right) }\left( i\right) +{b}\right] \le 0. \end{aligned}$$

Thus \(\eta ^{{{\textbf{d}}}^{*}}\ge \eta ^{{{\textbf{c}}}}\), which gives \({{\textbf{d}}}^{*}\succeq {{\textbf{c}}}\).

Combining the above four cases, we obtain \({{\textbf{d}}}^{*}\succeq {{\textbf{c}}}\) for any policy \({{\textbf{c}}}\in {\mathcal {D}}\). This completes the proof. \(\square \)

For \({{\textbf{d}}}^{*}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) \), let \({{\textbf{d}}}^{\left( n\right) }\) be a policy in the policy set \({\mathcal {D}}\) with

$$\begin{aligned} S\left( {{\textbf{d}}}^{*}\textbf{,d}^{\left( n\right) }\right) =\left\{ i_{l}:l=1,2,\ldots ,n\right\} \end{aligned}$$

for \(1\le n\le K\). To understand Policy \({{\textbf{d}}}^{\left( n\right) }\), we take three examples: \(S\left( {{\textbf{d}}}^{*}\textbf{,d}^{\left( 1\right) }\right) =\left\{ i_{1}\right\} \), \(S\left( {{\textbf{d}}}^{*}\textbf{,d}^{\left( 2\right) }\right) =\left\{ i_{1},i_{2}\right\} \), \(S\left( {{\textbf{d}}}^{*}\textbf{,d}^{\left( 3\right) }\right) =\left\{ i_{1},i_{2},i_{3}\right\} \). Also, \(S\left( {{\textbf{d}}}^{\left( n-1\right) }\textbf{,d}^{\left( n\right) }\right) =\left\{ i_{n}\right\} \) for \(1\le n\le K\). Note that

$$\begin{aligned} {{\textbf{d}}}^{\left( K\right) }=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) . \end{aligned}$$

The following corollary provides a set-structured decreasing monotonicity of the policies \({{\textbf{d}}}^{\left( n\right) }\in {\mathcal {D}}\) for \(n=1,2,\ldots ,K-1,K\). In fact, this monotonicity is guaranteed by the class property of the policies in the set \({\mathcal {D}}\) given in (1) of Theorem 3. The proof follows from an analysis similar to that of Theorem 4 and is therefore omitted here.

Corollary 5

If \(P\ge {P}_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}} \), then

$$\begin{aligned} {{\textbf{d}}}^{*}\succeq {{\textbf{d}}}^{\left( 1\right) }\succeq {{\textbf{d}}} ^{\left( 2\right) }\succeq {{\textbf{d}}}^{\left( 3\right) }\succeq \cdots \succeq {{\textbf{d}}}^{\left( K-1\right) }\succeq {{\textbf{d}}}^{\left( K\right) }. \end{aligned}$$

In what follows we compute the maximal long-run average profit of the stock-rationing queue.

When \(P\ge {P}_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), the optimal dynamic rationing policy is given by

$$\begin{aligned} {{\textbf{d}}}^{*}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) , \end{aligned}$$

thus it follows from (5) that

$$\begin{aligned} \xi _{i}^{({{\textbf{d}}}^{*})}=\left\{ \begin{array}{ll} 1, &{} i=0,\\ \alpha ^{i}, &{} i=1,2,\ldots ,K,\\ \left( \frac{\alpha }{\beta }\right) ^{K}\beta ^{i}, &{} i=K+1,K+2,\ldots ,N, \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} h^{({{\textbf{d}}}^{*})}=1+\sum \limits _{i=1}^{N}\xi _{i}^{({{\textbf{d}}}^{*} )}=1+\frac{\alpha \left( 1-\alpha ^{K}\right) }{1-\alpha }+\left( \frac{\alpha }{\beta }\right) ^{K}\frac{\beta ^{K+1}\left( 1-\beta ^{N-K}\right) }{1-\beta }, \end{aligned}$$

where \(\alpha =\lambda /\mu _{1}\) and \(\beta =\lambda /\left( \mu _{1}+\mu _{2}\right) .\) It follows from (6) that

$$\begin{aligned} \pi ^{({{\textbf{d}}}^{*})}\left( i\right) =\left\{ \begin{array}{ll} \frac{1}{h^{({{\textbf{d}}}^{*})}}, &{} i=0,\\ \frac{1}{h^{({{\textbf{d}}}^{*})}}\xi _{i}^{({{\textbf{d}}}^{*})}, &{} i=1,2,\ldots ,N. \end{array} \right. \end{aligned}$$

At the same time, it follows from (8) to (11) that

$$\begin{aligned} \begin{array}{ll} f\left( 0\right) =-C_{2,1}\mu _{1}-C_{2,2}\mu _{2}-C_{3}\lambda , &{} i=0;\\ f^{({{\textbf{d}}}^{*})}\left( i\right) =R\mu _{1}-C_{1}i-C_{2,2}\mu _{2} -C_{3}\lambda , &{} 1\le i\le K;\\ f\left( i\right) =R\left( \mu _{1}+\mu _{2}\right) -C_{1}i-C_{3} \lambda 1_{\left\{ i<N\right\} }-C_{4}\lambda 1_{\left\{ i=N\right\} }, &{} K+1\le i\le N. \end{array} \end{aligned}$$

Since

$$\begin{aligned} \eta ^{{{\textbf{d}}}^{*}}=\sum \limits _{i=0}^{N}\pi ^{({{\textbf{d}}}^{*})}\left( i\right) f^{({{\textbf{d}}}^{*})}\left( i\right) , \end{aligned}$$
(51)

we obtain

$$\begin{aligned} \eta ^{{{\textbf{d}}}^{*}}&=\frac{1}{h^{({{\textbf{d}}}^{*})}}\left\{ -\left( C_{2,1}\mu _{1}+C_{2,2}\mu _{2}+C_{3}\lambda \right) +\sum \limits _{i=1}^{K}\left( R\mu _{1}-C_{1}i-C_{2,2}\mu _{2}-C_{3}\lambda \right) \alpha ^{i}\right. \\&\quad \left. +\sum \limits _{i=K+1}^{N}\left[ R\left( \mu _{1} +\mu _{2}\right) -C_{1}i-C_{3}\lambda 1_{\left\{ i<N\right\} }-C_{4} \lambda 1_{\left\{ i=N\right\} }\right] \left( \frac{\alpha }{\beta }\right) ^{K}\beta ^{i}\right\} \\&=\frac{1}{h^{({{\textbf{d}}}^{*})}}\left\{ -\gamma _{1}+\gamma _{2}\frac{\alpha \left( 1-\alpha ^{K}\right) }{1-\alpha }-C_{1}\left[ \frac{\alpha \left( 1-\alpha ^{K}\right) }{\left( 1-\alpha \right) ^{2} }-\frac{K\alpha ^{K+1}}{1-\alpha }\right] \right. \\&\quad +\left( \frac{\alpha }{\beta }\right) ^{K}\gamma _{3}\frac{\beta ^{K+1}\left( 1-\beta ^{N-K}\right) }{1-\beta }\\&\quad \left. -\left( \frac{\alpha }{\beta }\right) ^{K} C_{1}\left[ \frac{K\beta ^{K+1}-N\beta ^{N+1}}{1-\beta }+\frac{\beta ^{K+1}\left( 1-\beta ^{N-K}\right) }{\left( 1-\beta \right) ^{2}}\right] \right\} , \end{aligned}$$

where

$$\begin{aligned} \gamma _{1}&=C_{2,1}\mu _{1}+C_{2,2}\mu _{2}+C_{3}\lambda ,\\ \gamma _{2}&=R\mu _{1}-C_{2,2}\mu _{2}-C_{3}\lambda ,\\ \gamma _{3}&=R\left( \mu _{1}+\mu _{2}\right) -C_{3}\lambda 1_{\left\{ i<N\right\} }-C_{4}\lambda 1_{\left\{ i=N\right\} }. \end{aligned}$$
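The closed form of \(\eta ^{{{\textbf{d}}}^{*}}\) above rests on three finite geometric-sum identities. The following minimal sketch verifies them numerically; all parameter values are arbitrary test data, not model quantities.

```python
# Numerical check (arbitrary test values) of the three finite geometric-sum
# identities used to collapse eta^{d*} into its closed form.
alpha, beta, K, N = 0.7, 0.9, 4, 9

# sum_{i=1}^{K} alpha^i = alpha (1 - alpha^K) / (1 - alpha)
s1 = sum(alpha**i for i in range(1, K + 1))
assert abs(s1 - alpha * (1 - alpha**K) / (1 - alpha)) < 1e-12

# sum_{i=1}^{K} i alpha^i = alpha (1 - alpha^K)/(1 - alpha)^2 - K alpha^{K+1}/(1 - alpha)
s2 = sum(i * alpha**i for i in range(1, K + 1))
assert abs(s2 - (alpha * (1 - alpha**K) / (1 - alpha)**2
                 - K * alpha**(K + 1) / (1 - alpha))) < 1e-12

# sum_{i=K+1}^{N} i beta^i, the bracketed coefficient of C_1 in the second sum
s3 = sum(i * beta**i for i in range(K + 1, N + 1))
assert abs(s3 - ((K * beta**(K + 1) - N * beta**(N + 1)) / (1 - beta)
                 + beta**(K + 1) * (1 - beta**(N - K)) / (1 - beta)**2)) < 1e-12
```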

7.2 The penalty cost \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \)

In this subsection, we consider the area of the penalty cost: \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\). We first find the optimal dynamic rationing policy of the stock-rationing queue. Then we compute the maximal long-run average profit of this system.

The following theorem finds the optimal dynamic rationing policy of the stock-rationing queue in the area of the penalty cost: \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\). The proof is similar to that of Theorem 4.

Theorem 6

If \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then the optimal dynamic rationing policy of the stock-rationing queue is given by

$$\begin{aligned} {{\textbf{d}}}^{*}=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) . \end{aligned}$$

This shows that when the penalty cost is low, i.e., \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le {P}_{L}\left( {{\textbf{d}}}\right) \), the warehouse would like to supply the products to the demands of Class 2.

Proof

If \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then we prove that for any policy \({{\textbf{c}}}\in {\mathcal {D}}\),

$$\begin{aligned} {{\textbf{d}}}^{*}\succeq {{\textbf{c}}}. \end{aligned}$$

For this, we need to consider the three policies: \({{\textbf{d}}}\), \({{\textbf{c}}}\) and \({{\textbf{d}}}^{*}\), where \({{\textbf{d}}}^{*}\) is deterministic with \(d_{i}^{*}=1\) for each \(i=1,2,\ldots ,K-1,K\).

To compare \(\eta ^{{{\textbf{c}}}}\) with \(\eta ^{{{\textbf{d}}}^{*}}\), let \(S\left( {{\textbf{d}}}^{*}\textbf{,c}\right) =\left\{ n_{l}:l=1,2,\ldots ,m\right\} \) for \(1\le m\le K\). Then \(c_{n_{l}}=0\) for \(l=1,2,\ldots ,m\), since \(d_{i}^{*}=1\) for each \(i=1,2,\ldots ,K-1,K\).

For the two policies \({{\textbf{d}}}\) and \({{\textbf{c}}}\), we have \(d_{i},c_{i} \in \left\{ 0,1\right\} \). Based on this, for the three elements: \(d_{i}\), \(c_{i}\) and \(d_{i}^{*}=1\) for \(i\in S\left( {{\textbf{d}}}^{*} \textbf{,c}\right) \), we need to consider four different cases as follows:

Case one: \(d_{i}=c_{i}=d_{i}^{*}=1\). Since \(c_{i}=d_{i}^{*}\), this case does not require any analysis according to Lemma 2.

Case two: \(d_{i}=0\) and \(c_{i}=d_{i}^{*}=1\). Since \(c_{i} =d_{i}^{*}\), this case does not require any analysis by using Lemma 2.

Case three: \(c_{i}=0\) and \(d_{i}=d_{i}^{*}=1\). Since \(c_{i}\ne d_{i}\), it follows from (2) of Theorem 3 that \(G^{\left( {{\textbf{c}}}\right) }\left( i\right) +b\ge 0\). On the other hand, since \(c_{i}\ne d_{i}^{*}\), it follows from the performance difference Eq. (39) that for each \(i\in S\left( {{\textbf{d}}}^{*}\textbf{,c}\right) \),

$$\begin{aligned} \eta ^{{{\textbf{d}}}^{*}}-\eta ^{{{\textbf{c}}}}&=\mu _{2}\pi ^{\left( {{\textbf{d}}}^{*}\right) }\left( i\right) \left( d_{i}^{*} -c_{i}\right) \left[ {G}^{\left( {{\textbf{c}}}\right) }\left( i\right) +{b}\right] \\&=\mu _{2}\pi ^{\left( {{\textbf{d}}}^{*}\right) }\left( i\right) \left[ {G}^{\left( {{\textbf{c}}}\right) }\left( i\right) +{b}\right] \ge 0. \end{aligned}$$

Thus \(\eta ^{{{\textbf{d}}}^{*}}\ge \eta ^{{{\textbf{c}}}}\), which gives \({{\textbf{d}}}^{*}\succeq {{\textbf{c}}}\).

Case four: \(d_{i}=c_{i}=0\) and \(d_{i}^{*}=1\). Since \(d_{i}^{*}\ne d_{i}\), it follows from (2) of Theorem 3 that \(G^{\left( {{\textbf{d}}}^{*}\right) }\left( i\right) +b\ge 0\). On the other hand, since \(c_{i}\ne d_{i}^{*}\), it follows from the performance difference Eq. (39) that for each \(i\in S\left( {{\textbf{d}}}^{*}\textbf{,c}\right) \),

$$\begin{aligned} \eta ^{{{\textbf{c}}}}-\eta ^{{{\textbf{d}}}^{*}}&=\mu _{2}\pi ^{\left( {{\textbf{c}}}\right) }\left( i\right) \left( c_{i}-d_{i}^{*}\right) \left[ {G}^{\left( {{\textbf{d}}}^{*}\right) }\left( i\right) +{b}\right] \\&=-\mu _{2}\pi ^{\left( {{\textbf{c}}}\right) }\left( i\right) \left[ {G}^{\left( {{\textbf{d}}}^{*}\right) }\left( i\right) +{b}\right] \le 0. \end{aligned}$$

Thus \(\eta ^{{{\textbf{d}}}^{*}}\ge \eta ^{{{\textbf{c}}}}\), which gives \({{\textbf{d}}}^{*}\succeq {{\textbf{c}}}\). This completes the proof. \(\square \)

For \({{\textbf{d}}}^{*}=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) \), let \({{\textbf{d}}}^{\left( n\right) }\) be a policy in the policy set \({\mathcal {D}}\) with \(S\left( {{\textbf{d}}}^{*}\textbf{,d}^{\left( n\right) }\right) =\left\{ k_{l}:l=1,2,\ldots ,n\right\} \) for \(1\le n\le K\), where

$$\begin{aligned} {{\textbf{d}}}^{\left( K\right) }=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) . \end{aligned}$$

The following corollary provides a set-structured decreasing monotonicity of the policies \({{\textbf{d}}}^{\left( n\right) }\in {\mathcal {D}}\) for \(n=1,2,\ldots ,K-1,K\). This monotonicity comes from the class property of the policies in the set \({\mathcal {D}}\), given in (2) of Theorem 3. The proof is easy and omitted here.

Corollary 7

If \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then

$$\begin{aligned} {{\textbf{d}}}^{*}\succeq {{\textbf{d}}}^{\left( 1\right) }\succeq {{\textbf{d}}} ^{\left( 2\right) }\succeq {{\textbf{d}}}^{\left( 3\right) }\succeq \cdots \succeq {{\textbf{d}}}^{\left( K-1\right) }\succeq {{\textbf{d}}}^{\left( K\right) }. \end{aligned}$$

If \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then the optimal dynamic rationing policy is given by

$$\begin{aligned} {{\textbf{d}}}^{*}=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) . \end{aligned}$$

In this case, we obtain

$$\begin{aligned} \begin{array}{cl} \xi _{0}^{({{\textbf{d}}}^{*})}=1, &{} i=0,\\ \xi _{i}^{({{\textbf{d}}}^{*})}=\beta ^{i}, &{} i=1,2,\ldots ,N, \end{array} \end{aligned}$$

and

$$\begin{aligned} h^{({{\textbf{d}}}^{*})}=1+\sum \limits _{i=1}^{N}\xi _{i}^{({{\textbf{d}}}^{*} )}=1+\frac{\beta \left( 1-\beta ^{N}\right) }{1-\beta }. \end{aligned}$$

It follows from Subsection 1.1.4 of Chapter 1 in Li (2010) that

$$\begin{aligned} \pi ^{({{\textbf{d}}}^{*})}\left( i\right) =\left\{ \begin{array}{ll} \frac{1}{h^{({{\textbf{d}}}^{*})}}, &{} i=0,\\ \frac{\beta ^{i}}{h^{({{\textbf{d}}}^{*})}}, &{} i=1,2,\ldots ,N. \end{array} \right. \end{aligned}$$

At the same time, it follows from (8) to (11) that

$$\begin{aligned} \begin{array}{ll} f^{({{\textbf{d}}}^{*})}\left( 0\right) =-C_{2,1}\mu _{1}-C_{2,2}\mu _{2}-C_{3}\lambda , &{} i=0;\\ f^{({{\textbf{d}}}^{*})}\left( i\right) =R\left( \mu _{1}+\mu _{2}\right) -C_{1}i-C_{3}\lambda -P\mu _{2}, &{} 1\le i\le K;\\ f^{({{\textbf{d}}}^{*})}\left( i\right) =R\left( \mu _{1}+\mu _{2}\right) -C_{1}i-C_{3}\lambda 1_{\left\{ i<N\right\} }-C_{4}\lambda 1_{\left\{ i=N\right\} }, &{} K+1\le i\le N. \end{array} \end{aligned}$$

Thus we obtain

$$\begin{aligned} \eta ^{{{\textbf{d}}}^{*}}&=\frac{1}{h^{({{\textbf{d}}}^{*})}}\left\{ -\left( C_{2,1}\mu _{1}+C_{2,2}\mu _{2}+C_{3}\lambda \right) +\sum \limits _{i=1}^{K}\left[ R\left( \mu _{1}+\mu _{2}\right) -C_{1}i-C_{3} \lambda -P\mu _{2}\right] \beta ^{i}\right. \\ \text { }&\left. \text { }\quad \quad +\sum \limits _{i=K+1}^{N}\left[ R\left( \mu _{1}+\mu _{2}\right) -C_{1}i-C_{3}\lambda 1_{\left\{ i<N\right\} } -C_{4}\lambda 1_{\left\{ i=N\right\} }\right] \beta ^{i}\right\} \\&=\frac{1}{h^{({{\textbf{d}}}^{*})}}\left\{ -\gamma _{1}+\gamma _{4}\frac{\beta \left( 1-\beta ^{K}\right) }{1-\beta }-C_{1}\left[ \frac{\beta \left( 1-\beta ^{K}\right) }{\left( 1-\beta \right) ^{2} }-\frac{K\beta ^{K+1}}{1-\beta }\right] \right. \\&\left. \text { }\quad \quad +\gamma _{3}\frac{\beta ^{K+1}\left( 1-\beta ^{N-K}\right) }{1-\beta }-C_{1}\left[ \frac{K\beta ^{K+1}-N\beta ^{N+1}}{1-\beta } +\frac{\beta ^{K+1}\left( 1-\beta ^{N-K}\right) }{\left( 1-\beta \right) ^{2}}\right] \right\} , \end{aligned}$$

where

$$\begin{aligned} \gamma _{1}&=C_{2,1}\mu _{1}+C_{2,2}\mu _{2}+C_{3}\lambda ,\\ \gamma _{3}&=R\left( \mu _{1}+\mu _{2}\right) -C_{3}\lambda 1_{\left\{ i<N\right\} }-C_{4}\lambda 1_{\left\{ i=N\right\} },\\ \gamma _{4}&=R\left( \mu _{1}+\mu _{2}\right) -C_{3}\lambda -P\mu _{2}. \end{aligned}$$
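The normalizing constant \(h^{({{\textbf{d}}}^{*})}\) and the stationary distribution \(\pi ^{({{\textbf{d}}}^{*})}\) used in this subsection can be checked numerically; in the following sketch, \(\beta \) and N are arbitrary test values, not model quantities.

```python
# Sanity check (test values only) of the normalizing constant and stationary
# distribution of this subsection: pi(0) = 1/h, pi(i) = beta^i / h,
# with h = 1 + beta (1 - beta^N) / (1 - beta).
beta, N = 0.8, 12

h_direct = 1 + sum(beta**i for i in range(1, N + 1))      # direct sum
h_closed = 1 + beta * (1 - beta**N) / (1 - beta)          # closed form
assert abs(h_direct - h_closed) < 1e-12

# the resulting probabilities over states 0..N sum to one
pi = [1 / h_closed] + [beta**i / h_closed for i in range(1, N + 1)]
assert abs(sum(pi) - 1.0) < 1e-12
```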

7.3 The penalty cost \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \)

In this subsection, we discuss the third area of the penalty cost: \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\). Note that this analysis is a little more complicated than those in the previous two areas. To this end, we propose a new algebraic method to find the optimal dynamic rationing policy of the stock-rationing queue. Based on this, we show that the optimal dynamic rationing policy may not be of threshold type, but it must be of transformational threshold type.

For the convenience of readers, we briefly recall several previous results as follows.

For any given policy \({{\textbf{d}}}=\left( 0;d_{1},d_{2},\ldots ,d_{K-1},d_{K};1,1,\ldots ,1\right) \), the unique solution of the linear equation \(G^{\left( {{\textbf{d}}}\right) }\left( i\right) +b=0\) in the penalty cost P is given by

$$\begin{aligned} {{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }=\frac{R+C_{2,2}+\lambda ^{-i}\left[ B_{0}-{D}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=1}^{i-1}v\left( d_{k}\right) +\sum \limits _{r=1}^{i-1} \lambda ^{r-i}\left[ B_{r}^{\left( {{\textbf{d}}}\right) }-{D}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=r+1}^{i-1}v\left( d_{k}\right) }{1+\lambda ^{-i}\left[ A_{0}-{F}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=1}^{i-1}v\left( d_{k}\right) +\sum \limits _{r=1}^{i-1} \lambda ^{r-i}\left[ A_{r}^{\left( {{\textbf{d}}}\right) }-{F}^{\left( {{\textbf{d}}}\right) }\right] \prod \limits _{k=r+1}^{i-1}v\left( d_{k}\right) }, \end{aligned}$$

which is a fixed real number for \(1\le i\le K\).

We introduce a convention: If \({{\mathfrak {P}}}_{n-1}^{\left( {{\textbf{d}}}\right) }<{{\mathfrak {P}}}_{n}^{\left( {{\textbf{d}}}\right) }={{\mathfrak {P}}}_{n+1}^{\left( {{\textbf{d}}}\right) }=\cdots ={{\mathfrak {P}}}_{n+i}^{\left( {{\textbf{d}}}\right) }=c\) and \({{\mathfrak {P}}}_{n-1}^{\left( {{\textbf{d}}}\right) }<P\le c\), then we write

$$\begin{aligned} {{\mathfrak {P}}}_{n-1}^{\left( {{\textbf{d}}}\right) }<P\le {{\mathfrak {P}}} _{n}^{\left( {{\textbf{d}}}\right) }={{\mathfrak {P}}}_{n+1}^{\left( {{\textbf{d}}} \right) }=\cdots ={{\mathfrak {P}}}_{n+i}^{\left( {{\textbf{d}}}\right) }, \end{aligned}$$

that is, the penalty cost P is written in front of all the equal elements in the sequence \(\left\{ {{\mathfrak {P}}}_{k}^{\left( {{\textbf{d}}}\right) }:n\le k\le n+i\right\} \).

For the sequence \(\left\{ {{\mathfrak {P}}}_{k}^{\left( {{\textbf{d}}}\right) }:1\le k\le K\right\} \), we set up a new permutation from the smallest to the largest as follows:

$$\begin{aligned} {{\mathfrak {P}}}_{i_{1}}^{\left( {{\textbf{d}}}\right) }\le {{\mathfrak {P}}}_{i_{2} }^{\left( {{\textbf{d}}}\right) }\le \cdots \le {{\mathfrak {P}}}_{i_{K-1}}^{\left( {{\textbf{d}}}\right) }\le {{\mathfrak {P}}}_{i_{K}}^{\left( {{\textbf{d}}}\right) }, \end{aligned}$$

it is clear that \({{\mathfrak {P}}}_{i_{1}}^{\left( {{\textbf{d}}}\right) } =P_{L}\left( {{\textbf{d}}}\right) \) and \({{\mathfrak {P}}}_{i_{K}}^{\left( {{\textbf{d}}}\right) }=P_{H}\left( {{\textbf{d}}}\right) \). For convenience of description, for the incremental sequence \(\left\{ {{\mathfrak {P}}}_{i_{j} }^{\left( {{\textbf{d}}}\right) }:1\le j\le K\right\} \), we write its subscript vector as \(\left( i_{1},i_{2},\ldots ,i_{K-1},i_{K}\right) \). Note that the subscript vector \(\left( i_{1},i_{2},\ldots ,i_{K-1},i_{K}\right) \) depends on Policy \({{\textbf{d}}}\).
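The subscript vector \(\left( i_{1},i_{2},\ldots ,i_{K}\right) \) is simply the permutation that sorts the sequence \(\left\{ {{\mathfrak {P}}}_{k}^{\left( {{\textbf{d}}}\right) }:1\le k\le K\right\} \) into ascending order; a minimal sketch with hypothetical threshold values:

```python
# Computing the subscript vector (i_1, ..., i_K) that rearranges the sequence
# {P_k^{(d)}} in ascending order; the values below are hypothetical test data.
vals = {1: 3.0, 2: 1.5, 3: 4.0, 4: 2.5}      # k -> P_k^{(d)}

# sort the indices k by their threshold values to obtain (i_1, ..., i_K)
subscript = sorted(vals, key=vals.get)

assert subscript == [2, 4, 1, 3]
assert [vals[i] for i in subscript] == sorted(vals.values())
```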

The following lemma shows how the penalty cost P is distributed in the sequence \(\left\{ {{\mathfrak {P}}}_{k}^{\left( {{\textbf{d}}}\right) }:1\le k\le K\right\} \).

Lemma 4

If \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then there exists the minimal positive integer \(n_{0}\in \left\{ 1,2,\ldots ,K\right\} \) such that either

$$\begin{aligned} {{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}\right) }<P={{\mathfrak {P}}} _{i_{n_{0}+1}}^{\left( {{\textbf{d}}}\right) } \end{aligned}$$

or

$$\begin{aligned} {{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}\right) }<P<{{\mathfrak {P}}} _{i_{n_{0}+1}}^{\left( {{\textbf{d}}}\right) }. \end{aligned}$$

Proof

Note that

$$\begin{aligned} P_{H}\left( {{\textbf{d}}}\right) =\max \left\{ 0,{{\mathfrak {P}}}_{1}^{\left( {{\textbf{d}}}\right) },{{\mathfrak {P}}}_{2}^{\left( {{\textbf{d}}}\right) },\ldots ,{{\mathfrak {P}}}_{K}^{\left( {{\textbf{d}}}\right) }\right\} \end{aligned}$$

and

$$\begin{aligned} P_{L}\left( {{\textbf{d}}}\right) =\min \left\{ {{\mathfrak {P}}}_{1}^{\left( {{\textbf{d}}}\right) },{{\mathfrak {P}}}_{2}^{\left( {{\textbf{d}}}\right) },\ldots ,{{\mathfrak {P}}}_{K}^{\left( {{\textbf{d}}}\right) }\right\} , \end{aligned}$$

it is easy to see that \(P_{H}\left( {{\textbf{d}}}\right) \) and \(P_{L}\left( {{\textbf{d}}}\right) \) are two fixed real numbers. If \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \) for Policy \({{\textbf{d}}}\), then there exists the minimal positive integer \(n_{0}\in \left\{ 1,2,\ldots ,K-1,K\right\} \) such that

$$\begin{aligned} P_{L}\left( {{\textbf{d}}}\right) \le {{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}\right) }<P\le {{\mathfrak {P}}}_{i_{n_{0}+1}}^{\left( {{\textbf{d}}} \right) }\le P_{H}\left( {{\textbf{d}}}\right) . \end{aligned}$$

This shows that either for \(P={{\mathfrak {P}}}_{i_{n_{0}+1}}^{\left( {{\textbf{d}}}\right) },\)

$$\begin{aligned} {{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}\right) }<P={{\mathfrak {P}}} _{i_{n_{0}+1}}^{\left( {{\textbf{d}}}\right) }; \end{aligned}$$

or for \(P<{{\mathfrak {P}}}_{i_{n_{0}+1}}^{\left( {{\textbf{d}}}\right) },\)

$$\begin{aligned} {{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}\right) }<P<{{\mathfrak {P}}} _{i_{n_{0}+1}}^{\left( {{\textbf{d}}}\right) }. \end{aligned}$$

This completes the proof. \(\square \)

Now, our task is to develop a new method for finding the optimal dynamic rationing policy by means of the following two useful pieces of information: (a) The incremental sequence

$$\begin{aligned} P_{L}\left( {{\textbf{d}}}\right) ={{\mathfrak {P}}}_{i_{1}}^{\left( {{\textbf{d}}} \right) }\le {{\mathfrak {P}}}_{i_{2}}^{\left( {{\textbf{d}}}\right) }\le \cdots \le {{\mathfrak {P}}}_{i_{K-1}}^{\left( {{\textbf{d}}}\right) }\le {{\mathfrak {P}}} _{i_{K}}^{\left( {{\textbf{d}}}\right) }=P_{H}\left( {{\textbf{d}}}\right) ; \end{aligned}$$

and (b) the penalty cost P has a fixed position: \({{\mathfrak {P}}}_{i_{n_{0}} }^{\left( {{\textbf{d}}}\right) }<P\le {{\mathfrak {P}}}_{i_{n_{0}+1}}^{\left( {{\textbf{d}}}\right) }\), where \(n_{0}\) is the minimal positive integer in the set \(\left\{ 1,2,\ldots ,K-1,K\right\} \).
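The position \(n_{0}\) can be located mechanically: under the convention that P is written in front of all equal elements, \(n_{0}\) is the number of thresholds strictly below P. A minimal sketch with hypothetical data:

```python
# Locating n0 in the ascending threshold sequence: with the convention that P
# is placed in front of all equal elements, n0 equals the number of elements
# strictly below P. The threshold values here are hypothetical test data.
from bisect import bisect_left

thresholds = sorted([3.0, 1.5, 4.0, 1.5, 2.5])   # ascending {P_{i_j}^{(d)}}
P = 2.5
n0 = bisect_left(thresholds, P)                   # count of elements < P

# 0-indexed: thresholds[n0 - 1], thresholds[n0] play the roles of
# P_{i_{n0}} and P_{i_{n0+1}} in the 1-indexed notation of Lemma 4
assert thresholds[n0 - 1] < P <= thresholds[n0]
```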

In what follows we discuss two different cases: A simple case and a general case.

Case one: A simple case with

$$\begin{aligned} P_{L}\left( {{\textbf{d}}}\right) ={{\mathfrak {P}}}_{1}^{\left( {{\textbf{d}}}\right) }\le {{\mathfrak {P}}}_{2}^{\left( {{\textbf{d}}}\right) }\le \cdots \le {{\mathfrak {P}}}_{K-1}^{\left( {{\textbf{d}}}\right) }\le {{\mathfrak {P}}}_{K}^{\left( {{\textbf{d}}}\right) }=P_{H}\left( {{\textbf{d}}}\right) . \end{aligned}$$
(52)

In this case, the subscript vector is expressed as \(\left( 1,2,3,\ldots ,K-1,K\right) \), depending on Policy \({{\textbf{d}}}\).

If \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then there exists the minimal positive integer \(n_{0}\in \left\{ 1,2,\ldots ,K-1,K\right\} \) such that

$$\begin{aligned} P_{L}\left( {{\textbf{d}}}\right) ={{\mathfrak {P}}}_{1}^{\left( {{\textbf{d}}}\right) }\le \cdots \le {{\mathfrak {P}}}_{n_{0}-1}^{\left( {{\textbf{d}}}\right) } <P\le {{\mathfrak {P}}}_{n_{0}}^{\left( {{\textbf{d}}}\right) }\le \cdots \le {{\mathfrak {P}}}_{K}^{\left( {{\textbf{d}}}\right) }=P_{H}\left( {{\textbf{d}}} \right) . \end{aligned}$$

Based on this, we take two different sets

$$\begin{aligned} \Lambda _{1}=\left\{ {{\mathfrak {P}}}_{1}^{\left( {{\textbf{d}}}\right) },{{\mathfrak {P}}}_{2}^{\left( {{\textbf{d}}}\right) },\ldots ,{{\mathfrak {P}}}_{n_{0} -1}^{\left( {{\textbf{d}}}\right) }\right\} \end{aligned}$$

and

$$\begin{aligned} \Lambda _{2}=\left\{ {{\mathfrak {P}}}_{n_{0}}^{\left( {{\textbf{d}}}\right) },{{\mathfrak {P}}}_{n_{0}+1}^{\left( {{\textbf{d}}}\right) },\ldots ,{{\mathfrak {P}}} _{K}^{\left( {{\textbf{d}}}\right) }\right\} . \end{aligned}$$

By using the two sets \(\Lambda _{1}\) and \(\Lambda _{2}\), we write

$$\begin{aligned} \overline{P}_{H}\left( \textbf{d;}1\rightarrow n_{0}-1\right) =\max _{1\le i\le n_{0}-1}\left\{ {{\mathfrak {P}}}_{i}^{\left( {{\textbf{d}}}\right) }\right\} \end{aligned}$$

and

$$\begin{aligned} \overline{P}_{L}\left( \textbf{d;}n_{0}\rightarrow K\right) =\min _{n_{0}\le j\le K}\left\{ {{\mathfrak {P}}}_{j}^{\left( {{\textbf{d}}}\right) }\right\} \text {.} \end{aligned}$$

It is clear that \(\overline{P}_{H}\left( \textbf{d;}1\rightarrow n_{0}-1\right) ={{\mathfrak {P}}}_{n_{0}-1}^{\left( {{\textbf{d}}}\right) }\) and \(\overline{P}_{L}\left( \textbf{d;}n_{0}\rightarrow K\right) ={{\mathfrak {P}}} _{n_{0}}^{\left( {{\textbf{d}}}\right) }\).
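This observation can be checked with a small sketch; the ascending sequence and the split point \(n_{0}\) below are hypothetical test data (with \(n_{0}\) 1-indexed as in the text):

```python
# For an ascending sequence (the simple case), the two bounds defined above
# reduce to the adjacent elements:
#   P_H(d; 1 -> n0-1) = P_{n0-1}  and  P_L(d; n0 -> K) = P_{n0}.
seq = [1.0, 2.0, 2.0, 3.5, 5.0]    # hypothetical ascending {P_k^{(d)}}
n0 = 4                              # hypothetical split point (1-indexed)

P_H_bar = max(seq[:n0 - 1])         # maximum over k = 1, ..., n0 - 1
P_L_bar = min(seq[n0 - 1:])         # minimum over k = n0, ..., K

assert P_H_bar == seq[n0 - 2] and P_L_bar == seq[n0 - 1]
```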

For this simple case, the following theorem finds the optimal dynamic rationing policy, which is of threshold type.

Theorem 8

For the simple case with \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), if there exists the minimal positive integer \(n_{0}\in \left\{ 1,2,\ldots ,K-1,K\right\} \) such that

$$\begin{aligned} P_{L}\left( {{\textbf{d}}}\right) ={{\mathfrak {P}}}_{1}^{\left( {{\textbf{d}}}\right) }\le \cdots \le {{\mathfrak {P}}}_{n_{0}-1}^{\left( {{\textbf{d}}}\right) } <P\le {{\mathfrak {P}}}_{n_{0}}^{\left( {{\textbf{d}}}\right) }\le \cdots \le {{\mathfrak {P}}}_{K}^{\left( {{\textbf{d}}}\right) }=P_{H}\left( {{\textbf{d}}} \right) , \end{aligned}$$
(53)

then the optimal dynamic rationing policy is given by

$$\begin{aligned} {{\textbf{d}}}^{*}=\left( 0;\underset{n_{0}-1\text { zeros}}{\underbrace{0,0,\ldots ,0}},\underset{K-n_{0}+1\text { ones}}{\underbrace{1,1,\ldots ,1} };1,1,\ldots ,1\right) . \end{aligned}$$

Proof

The proof is similar to those of Theorems 4 and 6.

On the one hand, in the set \(\Lambda _{1}\), it is easy to see from (53) that \(P>\overline{P}_{H}\left( \textbf{d;}1\rightarrow n_{0}-1\right) \) for Policy \({{\textbf{d}}}\). Now, our aim is to focus on a sub-policy

$$\begin{aligned} \widetilde{{{\textbf{d}}}}_{a}=\left( 0;d_{1},d_{2},\ldots ,d_{n_{0}-1},*,*,\ldots ,*;1,1,\ldots ,1\right) . \end{aligned}$$

For the sub-policy \(\left( d_{1},d_{2},\ldots ,d_{n_{0}-1}\right) \), it is easy to see from the set \(\Lambda _{1}\) that \(P>\overline{P}_{H}\left( \textbf{d;}1\rightarrow n_{0}-1\right) \). Thus it follows from Theorem 4 that the optimal dynamic rationing sub-policy is given by

$$\begin{aligned} \widetilde{{{\textbf{d}}}}_{a}^{*}=\left( 0;0,0,\ldots ,0,*,*,\ldots ,*;1,1,\ldots ,1\right) . \end{aligned}$$

On the other hand, it is seen from the set \(\Lambda _{2}\) that \(0\le P\le \overline{P}_{L}\left( \textbf{d;}n_{0}\rightarrow K\right) \) for Policy \({{\textbf{d}}}\). We consider another sub-policy

$$\begin{aligned} \widetilde{{{\textbf{d}}}}_{b}=\left( 0;*,*,\ldots ,*,d_{n_{0}},d_{n_{0}+1},\ldots ,d_{K};1,1,\ldots ,1\right) . \end{aligned}$$

For the sub-policy \(\left( d_{n_{0}},d_{n_{0}+1},\ldots ,d_{K}\right) \), it is easy to see from the set \(\Lambda _{2}\) that \(0\le P\le \overline{P} _{L}\left( \textbf{d;}n_{0}\rightarrow K\right) \). Thus it is easy to see from Theorem 6 that the optimal dynamic rationing sub-policy is given by

$$\begin{aligned} \widetilde{{{\textbf{d}}}}_{b}^{*}=\left( 0;*,*,\ldots ,*,1,1,\ldots ,1;1,1,\ldots ,1\right) . \end{aligned}$$

Based on the above two discussions, from the total set \(\Lambda _{1}\cup \Lambda _{2}\), by observing the total policy \(\left( d_{1},d_{2},\ldots ,d_{n_{0}-1};d_{n_{0}},d_{n_{0}+1},\ldots ,d_{K}\right) \) or Policy \({{\textbf{d}}}\), the optimal dynamic rationing policy is given by

$$\begin{aligned} {{\textbf{d}}}^{*}=\widetilde{\left( \widetilde{{{\textbf{d}}}}_{a}^{*}\right) }_{b}^{*}=\widetilde{\left( \widetilde{{{\textbf{d}}}}_{b}^{*}\right) }_{a}^{*}=\left( 0;\underset{n_{0}-1\text { zeros}}{\underbrace{0,0,\ldots ,0}},\underset{K-n_{0}+1\text { ones}}{\underbrace{1,1,\ldots ,1} };1,1,\ldots ,1\right) . \end{aligned}$$

This completes the proof. \(\square \)

Remark 4

It is easy to see that in Theorems 4, 6 and 8, the optimal dynamic rationing policy is of threshold type (i.e., critical rationing level).
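The threshold structure in Theorems 4, 6 and 8 amounts to building the policy vector from \(n_{0}\) as zeros followed by ones; a minimal sketch, where K, N and \(n_{0}\) are hypothetical test values:

```python
# Building the threshold-type policy of Theorem 8: state 0 is fixed at 0,
# states 1..K carry n0 - 1 zeros then K - n0 + 1 ones, and states K+1..N
# always admit (trailing ones). K, N, n0 are hypothetical test values.
K, N, n0 = 6, 10, 3

d_star = [0] + [0] * (n0 - 1) + [1] * (K - n0 + 1) + [1] * (N - K)

assert len(d_star) == N + 1                          # one decision per state 0..N
assert d_star[1:K + 1] == sorted(d_star[1:K + 1])    # monotone: threshold type
```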

Case two: A general case with

$$\begin{aligned} P_{L}\left( {{\textbf{d}}}\right) ={{\mathfrak {P}}}_{i_{1}}^{\left( {{\textbf{d}}} \right) }\le {{\mathfrak {P}}}_{i_{2}}^{\left( {{\textbf{d}}}\right) }\le \cdots \le {{\mathfrak {P}}}_{i_{K-1}}^{\left( {{\textbf{d}}}\right) }\le {{\mathfrak {P}}} _{i_{K}}^{\left( {{\textbf{d}}}\right) }=P_{H}\left( {{\textbf{d}}}\right) . \end{aligned}$$

For the incremental sequence \(\left\{ {{\mathfrak {P}}}_{i_{j}}^{\left( {{\textbf{d}}}\right) }:j=1,2,\ldots ,K\right\} \), we write its subscript vector as \(\left( i_{1},i_{2},\ldots ,i_{K-1},i_{K}\right) \), which depends on Policy \({{\textbf{d}}}\). In the general case, we assume that \(\left( i_{1},i_{2},\ldots ,i_{K-1},i_{K}\right) \ne \left( 1,2,\ldots ,K-1,K\right) \).

If \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then there exists the minimal positive integer \(n_{0}\in \left\{ 1,2,\ldots ,K-1,K\right\} \) such that

$$\begin{aligned} P_{L}\left( {{\textbf{d}}}\right) ={{\mathfrak {P}}}_{i_{1}}^{\left( {{\textbf{d}}} \right) }\le \cdots \le {{\mathfrak {P}}}_{i_{n_{0}-1}}^{\left( {{\textbf{d}}}\right) }<P\le {{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}\right) }\le \cdots \le {{\mathfrak {P}}}_{i_{K}}^{\left( {{\textbf{d}}}\right) }=P_{H}\left( {{\textbf{d}}}\right) . \end{aligned}$$

Based on this, we take two sets

$$\begin{aligned} \Lambda _{1}^{G}=\left\{ {{\mathfrak {P}}}_{i_{1}}^{\left( {{\textbf{d}}}\right) },{{\mathfrak {P}}}_{i_{2}}^{\left( {{\textbf{d}}}\right) },\ldots ,{{\mathfrak {P}}} _{i_{n_{0}-1}}^{\left( {{\textbf{d}}}\right) }\right\} \end{aligned}$$

and

$$\begin{aligned} \Lambda _{2}^{G}=\left\{ {{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}\right) },{{\mathfrak {P}}}_{i_{n_{0}+1}}^{\left( {{\textbf{d}}}\right) },\ldots ,{{\mathfrak {P}}}_{i_{K}}^{\left( {{\textbf{d}}}\right) }\right\} . \end{aligned}$$

For the two sets \(\Lambda _{1}^{G}\) and \(\Lambda _{2}^{G}\), we write

$$\begin{aligned} \overline{P}_{H}^{G}\left( \textbf{d;}1\rightarrow n_{0}-1\right) =\max _{1\le k\le n_{0}-1}\left\{ {{\mathfrak {P}}}_{i_{k}}^{\left( {{\textbf{d}}}\right) }\right\} \end{aligned}$$

and

$$\begin{aligned} \overline{P}_{L}^{G}\left( \textbf{d;}n_{0}\rightarrow K\right) =\min _{n_{0}\le k\le K}\left\{ {{\mathfrak {P}}}_{i_{k}}^{\left( {{\textbf{d}}}\right) }\right\} \text {.} \end{aligned}$$

It is clear that \(\overline{P}_{H}^{G}\left( \textbf{d;}1\rightarrow n_{0}-1\right) ={{\mathfrak {P}}}_{i_{n_{0}-1}}^{\left( {{\textbf{d}}}\right) }\) and \(\overline{P}_{L}^{G}\left( \textbf{d;}n_{0}\rightarrow K\right) ={{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}\right) }\).

Corresponding to the subscript vector of the incremental sequence \(\left\{ {{\mathfrak {P}}}_{i_{k}}^{\left( {{\textbf{d}}}\right) }:1\le k\le K\right\} \), we transfer Policy

$$\begin{aligned} {{\textbf{d}}}=\left( 0;d_{1},d_{2},\ldots ,d_{n_{0}-1},d_{n_{0}},d_{n_{0} +1},\ldots ,d_{K};1,1,\ldots ,1\right) \end{aligned}$$

into a new transformational policy

$$\begin{aligned} {{\textbf{d}}}\left( \text {Transfer}\right) =\left( 0;d_{i_{1}},d_{i_{2} },\ldots ,d_{i_{n_{0}-1}},d_{i_{n_{0}}},d_{i_{n_{0}+1}},\ldots ,d_{i_{K} };1,1,\ldots ,1\right) . \end{aligned}$$

Therefore, a transformation of the optimal dynamic policy \({{\textbf{d}}}^{*}\) is

$$\begin{aligned} \left( 1,2,\ldots ,K-1,K\right) \Rightarrow \left( i_{1},i_{2},\ldots ,i_{K-1},i_{K}\right) ; \end{aligned}$$

and an inverse transformation of the optimal transformational dynamic policy \({{\textbf{d}}}^{*}\left( \text {Transfer}\right) \) is

$$\begin{aligned} \left( i_{1},i_{2},\ldots ,i_{K-1},i_{K}\right) \Rightarrow \left( 1,2,\ldots ,K-1,K\right) . \end{aligned}$$

For the general case, the following theorem finds the optimal dynamic rationing policy, which may not be of threshold type, but must be of transformational threshold type.

Theorem 9

For the general case with \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), if there exists the minimal positive integer \(n_{0}\in \left\{ 1,2,\ldots ,K-1,K\right\} \) such that

$$\begin{aligned} P_{L}\left( {{\textbf{d}}}\right) ={{\mathfrak {P}}}_{i_{1}}^{\left( {{\textbf{d}}} \right) }\le \cdots \le {{\mathfrak {P}}}_{i_{n_{0}-1}}^{\left( {{\textbf{d}}}\right) }<P\le {{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}\right) }\le \cdots \le {{\mathfrak {P}}}_{i_{K}}^{\left( {{\textbf{d}}}\right) }=P_{H}\left( {{\textbf{d}}}\right) , \end{aligned}$$

then the optimal transformational dynamic rationing policy is given by

$$\begin{aligned} {{\textbf{d}}}^{*}\left( \text {Transfer}\right) =\left( 0;\underset{n_{0}-1\text { zeros}}{\underbrace{0,0,\ldots ,0}},\underset{K-n_{0}+1\text { ones}}{\underbrace{1,1,\ldots ,1}};1,1,\ldots ,1\right) . \end{aligned}$$

Proof

From the set \(\Lambda _{1}^{G}\), it is easy to see that \(P>\overline{P}_{H}^{G}\left( \textbf{d;}1\rightarrow n_{0}-1\right) \). Hence we consider the transformational sub-policy

$$\begin{aligned} \widetilde{{{\textbf{d}}}}_{a}\left( \text {Transfer}\right) =\left( 0;d_{i_{1} },d_{i_{2}},\ldots ,d_{i_{n_{0}-1}},*,*,\ldots ,*;1,1,\ldots ,1\right) . \end{aligned}$$

By observing the transformational sub-policy \(\left( d_{i_{1} },d_{i_{2}},\ldots ,d_{i_{n_{0}-1}}\right) \) related to \(P>\overline{P} _{H}^{G}(\textbf{d;}1\rightarrow n_{0}-1)\), it is easy to see from the proof of Theorem 4 that the optimal transformational dynamic rationing sub-policy is given by

$$\begin{aligned} \widetilde{{{\textbf{d}}}}_{a}^{*}\left( \text {Transfer}\right) =\left( 0;0,0,\ldots ,0,*,*,\ldots ,*;1,1,\ldots ,1\right) . \end{aligned}$$

Similarly, from \(0\le P\le \overline{P}_{L}^{G}\left( \textbf{d;} n_{0}\rightarrow K\right) \) in the set \(\Lambda _{2}^{G}\), we discuss the transformational sub-policy

$$\begin{aligned} \widetilde{{{\textbf{d}}}}_{b}\left( \text {Transfer}\right) =\left( 0;*,*,\ldots ,*,d_{i_{n_{0}}},d_{i_{n_{0}+1}},\ldots ,d_{i_{K}};1,1,\ldots ,1\right) . \end{aligned}$$

By observing the transformational sub-policy \(\left( d_{i_{n_{0}}},d_{i_{n_{0}+1}},\ldots ,d_{i_{K}}\right) \) related to \(0\le P\le \overline{P}_{L}^{G}\left( \textbf{d;}n_{0}\rightarrow K\right) \), it is easy to see from the proof of Theorem 6 that the optimal transformational dynamic rationing sub-policy is given by

$$\begin{aligned} \widetilde{{{\textbf{d}}}}_{b}^{*}\left( \text {Transfer}\right) =\left( 0;*,*,\ldots ,*,1,1,\ldots ,1;1,1,\ldots ,1\right) . \end{aligned}$$

Therefore, by observing the total transformational sub-policy \((d_{i_{1} },d_{i_{2}},\ldots ,d_{i_{n_{0}-1}},d_{i_{n_{0}}}, d_{i_{n_{0}+1}}\), \(\ldots ,d_{i_{K}})\) in the total set \(\Lambda _{1}^{G}\cup \Lambda _{2}^{G}\), the optimal transformational dynamic rationing policy is given by

$$\begin{aligned} {{\textbf{d}}}^{*}\left( \text {Transfer}\right)&=\widetilde{\left( \widetilde{{{\textbf{d}}}}_{a}^{*}\left( \text {Transfer}\right) \right) }_{b}^{*}\left( \text {Transfer}\right) =\widetilde{\left( \widetilde{{{\textbf{d}}}}_{b}^{*}\left( \text {Transfer}\right) \right) }_{a}^{*}\left( \text {Transfer}\right) \\&=\left( 0;\underset{n_{0}-1\text { zeros}}{\underbrace{0,0,\ldots ,0} },\underset{K-n_{0}+1\text { ones}}{\underbrace{1,1,\ldots ,1}};1,1,\ldots ,1\right) . \end{aligned}$$

This completes the proof. \(\square \)

Remark 5

  1. (1)

    For the general case, although the optimal dynamic rationing policy is not of threshold type, we show that it must be of transformational threshold type. Thus the optimal transformational dynamic policy of the stock-rationing queue has a beautiful form as follows:

    $$\begin{aligned} {{\textbf{d}}}^{*}\left( \text {Transfer}\right) =\left( 0;\underset{n_{0}-1\text { zeros}}{\underbrace{0,0,\ldots ,0}},\underset{K-n_{0}+1\text { ones}}{\underbrace{1,1,\ldots ,1}};1,1,\ldots ,1\right) . \end{aligned}$$
  2. (2)

    We use an inverse transformation of \({{\textbf{d}}}^{*}\left( \text {Transfer}\right) \) to restore the original optimal dynamic policy \({{\textbf{d}}}^{*}\), since \({{\textbf{d}}}^{*}\left( \text {Transfer}\right) \) is always obtained easily. To illustrate such an inverse process, we take a simple example:

    $$\begin{aligned} {{\mathfrak {P}}}_{1}^{\left( {{\textbf{d}}}\right) }\le {{\mathfrak {P}}}_{3}^{\left( {{\textbf{d}}}\right) }\le {{\mathfrak {P}}}_{4}^{\left( {{\textbf{d}}}\right) } \le {{\mathfrak {P}}}_{7}^{\left( {{\textbf{d}}}\right) }<P\le {{\mathfrak {P}}} _{2}^{\left( {{\textbf{d}}}\right) }\le {{\mathfrak {P}}}_{5}^{\left( {{\textbf{d}}} \right) }\le {{\mathfrak {P}}}_{6}^{\left( {{\textbf{d}}}\right) }\le {{\mathfrak {P}}}_{8}^{\left( {{\textbf{d}}}\right) }, \end{aligned}$$

    it is easy to check that

    $$\begin{aligned} {{\textbf{d}}}^{*}=\left( 0;0,1,0,0,1,1,0,1;1,1,1,1\right) . \end{aligned}$$
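This example can be verified mechanically by applying the inverse transformation; in the sketch below, the subscript vector and the zero-one pattern are read off from the example above.

```python
# Checking the example above: K = 8, subscript vector (1,3,4,7,2,5,6,8), and
# P falls after the first four thresholds, so d*(Transfer) is four zeros
# followed by four ones; the inverse transformation restores d*.
K = 8
subscript = [1, 3, 4, 7, 2, 5, 6, 8]   # (i_1, ..., i_K) from the example
transfer = [0, 0, 0, 0, 1, 1, 1, 1]    # d*(Transfer) on states 1..K

d = [0] * (K + 1)                      # d[0] = 0 is the fixed entry for state 0
for i_j, value in zip(subscript, transfer):
    d[i_j] = value                     # inverse map: position i_j gets entry j

assert d[1:] == [0, 1, 0, 0, 1, 1, 0, 1]
```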

Remark 6

The transformational version \({{\textbf{d}}}^{*}\left( \text {Transfer}\right) \) of the optimal dynamic rationing policy \({{\textbf{d}}}^{*}\) plays a key role in the applications of the sensitivity-based optimization to the study of stock-rationing queues. On the other hand, it is worthwhile to note that the RG-factorization of block-structured Markov processes can be extended and generalized to a more general optimal transformational version \({{\textbf{d}}} ^{*}\left( \text {Transfer}\right) \) in the study of stock-rationing block-structured queues. See Li (2010) and Ma et al. (2019) for more details.

Remark 7

The bang-bang control is an effective method to roughly describe the optimal dynamic policy, e.g., see Xia et al. (2021) and Ma et al. (2019, 2021). However, our optimal transformational dynamic policy \({{\textbf{d}}}^{*}\left( \text {Transfer}\right) \) provides a more detailed result, and it can also restore the original optimal dynamic policy \({{\textbf{d}}}^{*}\) by means of an inverse transformation: \(\left( i_{1},i_{2},\ldots ,i_{K-1},i_{K}\right) \Rightarrow \left( 1,2,\ldots ,K-1,K\right) \). Therefore, our optimal transformational dynamic rationing policy is superior to the bang-bang control.

The following theorem provides a useful summarization of Theorems 4 to 9. Based on this, for the optimal dynamic policy of the stock-rationing queue, we provide a complete algebraic solution: (a) a threshold-type optimal dynamic rationing policy holds under each of three different conditions; and (b) under the remaining condition, no threshold-type optimal dynamic rationing policy exists, and only the optimal transformational dynamic rationing policy can be given.

Theorem 10

For the stock-rationing queue with two demand classes, there must exist an optimal transformational dynamic rationing policy

$$\begin{aligned} {{\textbf{d}}}^{*}\left( \text {Transfer}\right) =\left( 0;\underset{n_{0}-1\text { zeros}}{\underbrace{0,0,\ldots ,0}},\underset{K-n_{0}\text { ones}}{\underbrace{1,1,\ldots ,1};1,1,\ldots ,1}\right) . \end{aligned}$$

Based on this finding, we can achieve the following two useful results:

  1. (a)

The optimal dynamic rationing policy \({{\textbf{d}}}^{*}\) is of critical rationing level (i.e., threshold type) under each of the three conditions: (i) \(P\ge P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\); (ii) \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\); and (iii) \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \) with the subscript vector \(\left( 1,2,\ldots ,K-1,K\right) \) depending on Policy \({{\textbf{d}}}\).

  2. (b)

    The optimal dynamic rationing policy is not of critical rationing level (i.e., threshold type) if \(P_{L}\left( {{\textbf{d}}}\right)<P<P_{H}\left( {{\textbf{d}}}\right) \) with the subscript vector \(\left( i_{1},i_{2},\ldots ,i_{K-1},i_{K}\right) \ne \left( 1,2,\ldots ,K-1,K\right) \) depending on Policy \({{\textbf{d}}}\).
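The case split of Theorem 10 can be summarized in a small helper function. This is a hypothetical illustration only: `P_L` and `P_H` stand for the threshold values \(P_{L}\left( {{\textbf{d}}}\right) \) and \(P_{H}\left( {{\textbf{d}}}\right) \) derived earlier in the paper and are taken here as given numbers, and `subscript_is_identity` encodes whether the subscript vector \(\left( i_{1},\ldots ,i_{K}\right) \) equals \(\left( 1,\ldots ,K\right) \).

```python
def rationing_regime(P, P_L, P_H, subscript_is_identity=True):
    """Classify the penalty cost P against the thresholds P_L(d), P_H(d).

    Hypothetical helper: P_L and P_H are assumed precomputed from the
    paper's earlier analysis; subscript_is_identity says whether the
    subscript vector (i_1,...,i_K) equals (1,...,K).
    """
    if P >= P_H:
        return "threshold"  # condition (i)
    if P_L > 0 and 0 <= P <= P_L:
        return "threshold"  # condition (ii)
    if P_L < P < P_H:
        # condition (iii) vs. the non-threshold case of part (b)
        return "threshold" if subscript_is_identity else "not threshold"
    return "indeterminate"
```

The first two branches correspond to conditions (i) and (ii) of part (a); the last branch separates condition (iii) from the non-threshold case of part (b).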

7.4 A global optimal analysis

In this subsection, for a fixed penalty cost P, we discuss how to find a global optimal policy of the stock-rationing queue with two demand classes by means of Theorem 10. Note that if \({{\textbf{d}}}^{*}\) is a global optimal policy of this system, then \({{\textbf{d}}}^{*}\succeq {{\textbf{c}}}\) for any \({{\textbf{c}}}\in {\mathcal {D}}\). We also provide a simple and effective method for finding the global optimal policy in the policy set \({\mathcal {D}}\).

In the policy set \({\mathcal {D}}\), we define two key policies:

$$\begin{aligned} {{\textbf{d}}}_{1}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) \end{aligned}$$

and

$$\begin{aligned} {{\textbf{d}}}_{2}=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) . \end{aligned}$$

Since there are \(2^{K}\) different policies in the set \({\mathcal {D}}\), we write

$$\begin{aligned} {\mathcal {D}}=\left\{ {{\textbf{d}}}_{1},{{\textbf{d}}}_{2};{{\textbf{c}}}_{3},{{\textbf{c}}}_{4},\ldots ,{{\textbf{c}}}_{2^{K}-1},{{\textbf{c}}}_{2^{K}}\right\} . \end{aligned}$$
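For intuition, the set \({\mathcal {D}}\) and the two key policies can be enumerated explicitly for a small instance. This is a sketch with assumed sizes K and N (not code from the paper); each policy is a tuple \(\left( 0;d_{1},\ldots ,d_{K};1,\ldots ,1\right) \) in which only the K middle entries vary.

```python
from itertools import product

K, N = 4, 8  # small illustrative sizes (assumed, not from the paper)

# A policy d = (0; d_1,...,d_K; 1,1,...,1): only the K middle entries vary.
D = [(0,) + mid + (1,) * (N - K) for mid in product((0, 1), repeat=K)]

d1 = (0,) + (0,) * K + (1,) * (N - K)  # reject class 2 on the whole rationing region
d2 = (0,) + (1,) * K + (1,) * (N - K)  # serve class 2 on the whole rationing region

assert len(D) == 2 ** K and d1 in D and d2 in D
```

The two extreme policies \({{\textbf{d}}}_{1}\) and \({{\textbf{d}}}_{2}\) bracket the set: every other policy differs from them only in the rationing region \(1\le i\le K\).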

The following theorem describes a useful characteristic of the two key policies \({{\textbf{d}}}_{1}\) and \({{\textbf{d}}}_{2}\) by means of the class property of the policies in the set \({\mathcal {D}}\), given in Theorem 3. This characteristic enables us to find the global optimal policy of the stock-rationing queue.

Theorem 11

  1. (1)

    If a fixed penalty cost \(P\ge P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then \(P\ge P_{H}\left( {{\textbf{d}}}_{1}\right) \).

  2. (2)

    If a fixed penalty cost \(P_{L}\left( {{\textbf{d}}}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), then \(P_{L}\left( {{\textbf{d}}}_{2}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}_{2}\right) \).

Proof

We only prove (1), while (2) can be proved similarly.

We assume, by way of contradiction, that the penalty cost satisfies \(P<P_{H}\left( {{\textbf{d}}}_{1}\right) \) for Policy \({{\textbf{d}}}_{1}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) \). Then there exists a minimal positive integer \(n_{0}\in \left\{ 1,2,\ldots ,K-1,K\right\} \) such that

$$\begin{aligned} 0<P\le {{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}_{1}\right) }\le \cdots \le {{\mathfrak {P}}}_{i_{K}}^{\left( {{\textbf{d}}}_{1}\right) }=P_{H}\left( {{\textbf{d}}}_{1}\right) , \end{aligned}$$

and there also exists at least one positive integer \(m_{0}\in \left\{ n_{0}+1,n_{0}+2,\ldots ,K-1,K\right\} \) such that

$$\begin{aligned} {{\mathfrak {P}}}_{i_{m_{0}-1}}^{\left( {{\textbf{d}}}_{1}\right) }<{{\mathfrak {P}}} _{i_{m_{0}}}^{\left( {{\textbf{d}}}_{1}\right) }. \end{aligned}$$
(54)

Let

$$\begin{aligned} \overline{P}_{L}^{G}\left( {{\textbf{d}}}_{1},n_{0}\rightarrow K\right) =\min \left\{ {{\mathfrak {P}}}_{i_{n_{0}}}^{\left( {{\textbf{d}}}_{1}\right) },{{\mathfrak {P}}}_{i_{n_{0}+1}}^{\left( {{\textbf{d}}}_{1}\right) },\ldots ,{{\mathfrak {P}}}_{i_{K-1}}^{\left( {{\textbf{d}}}_{1}\right) },{{\mathfrak {P}}}_{i_{K} }^{\left( {{\textbf{d}}}_{1}\right) }\right\} ={{\mathfrak {P}}}_{i_{n_{0}} }^{\left( {{\textbf{d}}}_{1}\right) }>0. \end{aligned}$$

Then from \(0\le P\le \overline{P}_{L}^{G}\left( {{\textbf{d}}}_{1},n_{0}\rightarrow K\right) \), we discuss the transformational sub-policy

$$\begin{aligned} \widetilde{\left( {{\textbf{d}}}_{1}\right) }_{b}\left( \text {Transfer}\right) =\left( 0;*,*,\ldots ,*,d_{i_{n_{0}}},d_{i_{n_{0}+1}},\ldots ,d_{i_{K}};1,1,\ldots ,1\right) . \end{aligned}$$

By observing the transformational sub-policy \(\left( d_{n_{0}},d_{n_{0} +1},\ldots ,d_{K}\right) \) related to \(0\le P\le \overline{P}_{L}^{G}\left( {{\textbf{d}}}_{1},n_{0}\rightarrow K\right) \), it is easy to see from the proof of Theorem 6 that the optimal transformational dynamic rationing sub-policy is given by

$$\begin{aligned} \widetilde{\left( {{\textbf{d}}}_{1}\right) }_{b}^{*}\left( \text {Transfer} \right) =\left( 0;*,*,\ldots ,*,1,1,\ldots ,1;1,1,\ldots ,1\right) . \end{aligned}$$

This gives

$$\begin{aligned} \widetilde{\left( {{\textbf{d}}}_{1}\right) }_{b}^{*}\left( \text {Transfer} \right) \succ {{\textbf{d}}}_{1}={{\textbf{d}}}^{*} \end{aligned}$$
(55)

by using (54), where \({{\textbf{d}}}^{*}\) is given in Theorem 4.

On the other hand, since the fixed penalty cost satisfies \(P\ge P_{H}\left( {{\textbf{d}}}\right) \) for any given policy \({{\textbf{d}}}\), it follows from Theorem 4 that the optimal dynamic rationing policy of the stock-rationing queue is given by

$$\begin{aligned} {{\textbf{d}}}^{*}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) . \end{aligned}$$

By using (54), we obtain

$$\begin{aligned} \widetilde{\left( {{\textbf{d}}}_{1}\right) }_{b}^{*}\left( \text {Transfer} \right) \prec {{\textbf{d}}}^{*}. \end{aligned}$$
(56)

This is a contradiction between (55) and (56); thus our assumption \(P<P_{H}\left( {{\textbf{d}}}_{1}\right) \) cannot hold. This completes the proof. \(\square \)

Theorem 11 shows that, to find the optimal dynamic rationing policy of the stock-rationing queue, our first step is to check whether (a) the penalty cost satisfies \(P\ge P_{H}\left( {{\textbf{d}}}_{1}\right) \), or (b) \(P_{L}\left( {{\textbf{d}}}_{2}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}_{2}\right) \). Thus, the two special policies \({{\textbf{d}}}_{1}\) and \({{\textbf{d}}}_{2}\) are chosen as the starting point of our algebraic method for the optimal dynamic rationing policy.

The following theorem provides the global optimal solution to the optimal dynamic rationing policy of the stock-rationing queue.

Theorem 12

In the stock-rationing queue with two demand classes, we have

  1. (1)

    If a fixed penalty cost \(P\ge P_{H}\left( {{\textbf{d}}}_{1}\right) \), then \({{\textbf{d}}}^{*}={{\textbf{d}}}_{1}\succeq {{\textbf{c}}}\) for any \({{\textbf{c}}} \in {\mathcal {D}}\).

  2. (2)

If a fixed penalty cost \(P_{L}\left( {{\textbf{d}}}_{2}\right) >0\) and \(0\le P\le P_{L}\left( {{\textbf{d}}}_{2}\right) \), then \({{\textbf{d}}}^{*}={{\textbf{d}}}_{2}\succeq {{\textbf{c}}}\) for any \({{\textbf{c}}}\in {\mathcal {D}}\).

  3. (3)

    If a fixed penalty cost P satisfies \(P<P_{H}\left( {{\textbf{d}}} _{1}\right) \) and \(P>P_{L}\left( {{\textbf{d}}}_{2}\right) \), then

    $$\begin{aligned} {{\textbf{d}}}^{*}=\max \left\{ \widetilde{\left( {{\textbf{d}}}_{1}\right) } _{b}^{*}\left( \text {Transfer}\right) ,\widetilde{\left( {{\textbf{d}}} _{2}\right) }_{a}^{*}\left( \text {Transfer}\right) ,\left( {{\textbf{c}}}_{k}\right) ^{*}\left( \text {Transfer}\right) \text { for }k=3,4,\ldots ,K\right\} \end{aligned}$$

    and \({{\textbf{d}}}^{*}\succeq {{\textbf{c}}}\) for any \({{\textbf{c}}}\in {\mathcal {D}}\).

Proof

We only prove (3); (1) and (2) follow from the proof of Theorem 11.

If \(P<P_{H}\left( {{\textbf{d}}}_{1}\right) \) and \(P>P_{L}\left( {{\textbf{d}}} _{2}\right) \), then neither \({{\textbf{d}}}_{1}\) nor \({{\textbf{d}}}_{2}\) is the optimal dynamic rationing policy of the system. In this case, by using Theorem 10, the optimal dynamic rationing policy must be of transformational threshold type. Thus we have

$$\begin{aligned} {{\textbf{d}}}^{*}=\max \left\{ \widetilde{\left( {{\textbf{d}}}_{1}\right) } _{b}^{*}\left( \text {Transfer}\right) ,\widetilde{\left( {{\textbf{d}}} _{2}\right) }_{a}^{*}\left( \text {Transfer}\right) ,\left( {{\textbf{c}}}_{k}\right) ^{*}\left( \text {Transfer}\right) \text { for }k=3,4,\ldots ,K\right\} , \end{aligned}$$

which is of transformational threshold type, since K is a finite positive integer. It is clear that \({{\textbf{d}}}^{*}\succeq {{\textbf{c}}}\) for any \({{\textbf{c}}}\in {\mathcal {D}}\). This completes the proof. \(\square \)

8 The static rationing policies

In this section, we analyze the static (i.e., threshold type) rationing policies of the stock-rationing queue with two demand classes, and discuss the optimality of the static rationing policies. Furthermore, we provide a necessary condition under which a static rationing policy is optimal. Based on this, we can intuitively understand some differences between the optimal static and dynamic rationing policies.

To study the static rationing policies, we define a static policy subset of the policy set \({\mathcal {D}}\) as follows. For \(\theta =1,2,\ldots ,K,K+1\), we denote by \({{\textbf{d}}}_{\triangle ,\theta }\) the static rationing policy \({{\textbf{d}}}\) with \(d_{i}=0\) if \(1\le i\le \theta -1\) and \(d_{i}=1\) if \(\theta \le i\le K\). Clearly, if \(\theta =1\), then

$$\begin{aligned} {{\textbf{d}}}_{\triangle ,1}=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) ; \end{aligned}$$

if \(\theta =K\), then

$$\begin{aligned} {{\textbf{d}}}_{\triangle ,K}=\left( 0;0,0,\ldots ,0,1;1,1,\ldots ,1\right) ; \end{aligned}$$

and if \(\theta =K+1\), then

$$\begin{aligned} {{\textbf{d}}}_{\triangle ,K+1}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) . \end{aligned}$$

Let

$$\begin{aligned} {\mathcal {D}}^{\Delta }=\left\{ {{\textbf{d}}}_{\triangle ,\theta }:\theta =1,2,\ldots ,K,K+1\right\} . \end{aligned}$$

Then

$$\begin{aligned} {\mathcal {D}}^{\Delta }=\left\{ \left( 0;\underset{\theta -1\text { zeros} }{\underbrace{0,0,\ldots ,0}},1,1,\ldots ,1;1,1,\ldots ,1\right) :\theta =1,2,\ldots ,K,K+1\right\} . \end{aligned}$$

It is easy to see that the static rationing policy set \({\mathcal {D}}^{\Delta }\subset {\mathcal {D}}\).

For a static rationing policy \({{\textbf{d}}}_{\triangle ,\theta }=\left( 0;\underset{\theta -1\text { zeros}}{\underbrace{0,0,\ldots ,0}},1,1,\ldots ,1;1,1,\ldots ,1\right) \) with \(\theta =1,2,\ldots ,K,K+1\), it follows from (5) that

$$\begin{aligned} \xi _{i}^{({{\textbf{d}}}_{\triangle ,\theta })}=\left\{ \begin{array}{ll} 1, &{} i=0;\\ \alpha ^{i}, &{} i=1,2,\ldots ,\theta -1;\\ \left( \frac{\alpha }{\beta }\right) ^{\theta -1}\beta ^{i}, &{} i=\theta ,\theta +1,\ldots ,N, \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} h^{({{\textbf{d}}}_{\triangle ,\theta })}&=1+\sum \limits _{i=1}^{N}\xi _{i}^{({{\textbf{d}}}_{\triangle ,\theta })}\\&=1+\frac{\alpha \left( 1-\alpha ^{\theta -1}\right) }{1-\alpha }+\left( \frac{\alpha }{\beta }\right) ^{\theta -1}\frac{\beta ^{\theta }\left( 1-\beta ^{N-\theta +1}\right) }{1-\beta }. \end{aligned}$$

It follows from (6) that

$$\begin{aligned} \pi ^{({{\textbf{d}}}_{\triangle ,\theta })}\left( i\right) =\left\{ \begin{array}{ll} \frac{1}{h^{({{\textbf{d}}}_{\triangle ,\theta })}}, &{} i=0;\\ \frac{1}{h^{({{\textbf{d}}}_{\triangle ,\theta })}}\alpha ^{i}, &{} i=1,2,\ldots ,\theta -1;\\ \frac{1}{h^{({{\textbf{d}}}_{\triangle ,\theta })}}\left( \frac{\alpha }{\beta }\right) ^{\theta -1}\beta ^{i}, &{} i=\theta ,\theta +1,\ldots ,N. \end{array} \right. \end{aligned}$$
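The formulas for \(\xi \), h and \(\pi \) translate directly into a short numerical routine. In the sketch below we take \(\alpha =\lambda /\mu _{1}\) and \(\beta =\lambda /\left( \mu _{1}+\mu _{2}\right) \), the natural birth-death ratios when class 2 is rejected below \(\theta \) and served from \(\theta \) on; since \(\alpha \) and \(\beta \) are defined earlier in the paper, this identification is an assumption of the sketch, not a statement from the text.

```python
def stationary_dist(theta, lam, mu1, mu2, N):
    """Stationary distribution pi under the static policy d_{triangle,theta}.

    Assumption of this sketch: alpha = lam/mu1 and beta = lam/(mu1 + mu2),
    i.e., below theta only class-1 demand is served, at or above theta both
    classes are served.
    """
    alpha, beta = lam / mu1, lam / (mu1 + mu2)
    xi = [1.0]  # xi_0 = 1
    for i in range(1, N + 1):
        if i <= theta - 1:
            xi.append(alpha ** i)
        else:
            xi.append((alpha / beta) ** (theta - 1) * beta ** i)
    h = sum(xi)  # normalizing constant h^{(d_{triangle,theta})}
    return [x / h for x in xi]

pi = stationary_dist(theta=3, lam=3, mu1=4, mu2=2, N=10)
```

For \(i<\theta \) the ratio \(\pi \left( i\right) /\pi \left( i-1\right) \) equals \(\alpha \), and from \(i=\theta \) on it equals \(\beta \), matching the two geometric pieces of the displayed formula.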

On the other hand, it follows from (8) to (11) that for \(i=0\)

$$\begin{aligned} f\left( 0\right) =-C_{2,1}\mu _{1}-C_{2,2}\mu _{2}-C_{3}\lambda ; \end{aligned}$$

for \(i=1,2,\ldots ,\theta -1\),

$$\begin{aligned} f^{\left( {{\textbf{d}}}_{\triangle ,\theta }\right) }\left( i\right) =R\mu _{1}-C_{1}i-C_{2,2}\mu _{2}-C_{3}\lambda ; \end{aligned}$$

for \(i=\theta ,\theta +1,\ldots ,K,\)

$$\begin{aligned} f^{\left( {{\textbf{d}}}_{\triangle ,\theta }\right) }\left( i\right) =R\left( \mu _{1}+\mu _{2}\right) -C_{1}i-C_{3}\lambda -P\mu _{2}; \end{aligned}$$

and for \(i=K+1,K+2,\ldots ,N,\)

$$\begin{aligned} f\left( i\right) =R\left( \mu _{1}+\mu _{2}\right) -C_{1}i-C_{3} \lambda 1_{\left\{ i<N\right\} }-C_{4}\lambda 1_{\left\{ i=N\right\} }. \end{aligned}$$

Noting that

$$\begin{aligned} \eta ^{{{\textbf{d}}}_{\triangle ,\theta }}&=\pi ^{\left( {{\textbf{d}}} _{\triangle ,\theta }\right) }\left( 0\right) f\left( 0\right) +\sum _{i=1}^{\theta -1}\pi ^{\left( {{\textbf{d}}}_{\triangle ,\theta }\right) }\left( i\right) f^{\left( {{\textbf{d}}}_{\triangle ,\theta }\right) }\left( i\right) \\&\text { }\quad \quad +\sum _{i=\theta }^{K}\pi ^{\left( {{\textbf{d}}}_{\triangle ,\theta }\right) }\left( i\right) f^{\left( {{\textbf{d}}}_{\triangle ,\theta }\right) }\left( i\right) +\sum _{i=K+1}^{N}\pi ^{\left( {{\textbf{d}}} _{\triangle ,\theta }\right) }\left( i\right) f\left( i\right) , \end{aligned}$$

we obtain an explicit expression for the long-run average profit of the stock-rationing queue under the static rationing policy \({{\textbf{d}}} _{\triangle ,\theta }\) as follows:

$$\begin{aligned} \eta ^{{{\textbf{d}}}_{\triangle ,\theta }}&=\frac{1}{h^{({{\textbf{d}}} _{\triangle ,\theta })}}\left\{ -\left( C_{2,1}\mu _{1}+C_{2,2}\mu _{2} +C_{3}\lambda \right) +\sum _{i=1}^{\theta -1}\alpha ^{i}\left( R\mu _{1} -C_{1}i-C_{2,2}\mu _{2}-C_{3}\lambda \right) \right. \\&\text { }\quad \quad +\sum _{i=\theta }^{K}\left( \frac{\alpha }{\beta }\right) ^{\theta -1}\beta ^{i}\left[ R\left( \mu _{1}+\mu _{2}\right) -C_{1} i-C_{3}\lambda -P\mu _{2}\right] \\&\text { }\left. \quad \quad +\sum _{i=K+1}^{N}\left( \frac{\alpha }{\beta }\right) ^{\theta -1}\beta ^{i}\left[ R\left( \mu _{1}+\mu _{2}\right) -C_{1} i-C_{3}\lambda 1_{\left\{ i<N\right\} }-C_{4}\lambda 1_{\left\{ i=N\right\} }\right] \right\} \\&=\frac{1}{h^{({{\textbf{d}}}_{\triangle ,\theta })}}\left\{ -\gamma _{1} +\gamma _{2}\frac{\alpha \left( 1-\alpha ^{\theta -1}\right) }{1-\alpha } -C_{1}\left[ \frac{\alpha \left( 1-\alpha ^{\theta -1}\right) }{\left( 1-\alpha \right) ^{2}}-\frac{\left( \theta -1\right) \alpha ^{\theta } }{1-\alpha }\right] \right. \\&\text { }\quad \quad -\left( \frac{\alpha }{\beta }\right) ^{\theta -1}C_{1}\left[ \frac{\left( \theta -1\right) \beta ^{\theta }-N\beta ^{N+1}}{1-\beta }+\frac{\beta ^{\theta }\left( 1-\beta ^{N-\theta +1}\right) }{\left( 1-\beta \right) ^{2}}\right] \\&\left. \text { }\quad \quad +\left( \frac{\alpha }{\beta }\right) ^{\theta -1} \gamma _{4}\frac{\beta ^{\theta }\left( 1-\beta ^{K-\theta +1}\right) }{1-\beta }+\left( \frac{\alpha }{\beta }\right) ^{\theta -1}\gamma _{3}\frac{\beta ^{K+1}\left( 1-\beta ^{N-K}\right) }{1-\beta }\text { }\right\} . \end{aligned}$$

Let

$$\begin{aligned} {{\textbf{d}}}_{\triangle ,\theta }^{*}=\underset{{{\textbf{d}}}_{\triangle ,\theta }\in {\mathcal {D}}^{\Delta }}{\arg \max }\left\{ \eta ^{{{\textbf{d}}}_{\triangle ,\theta }}\right\} \end{aligned}$$

and

$$\begin{aligned} {{\textbf{d}}}_{\triangle ,\theta ^{*}}=\underset{1\le \theta \le K+1}{\arg \max }\left\{ \eta ^{{{\textbf{d}}}_{\triangle ,\theta }}\right\} . \end{aligned}$$

Then \({{\textbf{d}}}_{\triangle ,\theta }^{*}={{\textbf{d}}}_{\triangle ,\theta ^{*}}\). Hence we call \({{\textbf{d}}}_{\triangle ,\theta }^{*}\) (or \({{\textbf{d}}} _{\triangle ,\theta ^{*}}\)) the optimal static rationing policy in the static rationing policy set \({\mathcal {D}}^{\Delta }\). Since \({\mathcal {D}}^{\Delta }\subset {\mathcal {D}}\), the partial order on \({\mathcal {D}}\) induces a partial order on \({\mathcal {D}}^{\Delta }\). Based on this, it is easy to see from the two partially ordered sets \({\mathcal {D}}\) and \({\mathcal {D}}^{\Delta }\) that

$$\begin{aligned} \eta ^{{{\textbf{d}}}_{\triangle ,\theta }^{*}}\le \eta ^{{{\textbf{d}}}^{*}}\text {, or }{{\textbf{d}}}_{\triangle ,\theta }^{*}\preceq {{\textbf{d}}}^{*}, \end{aligned}$$

where \({{\textbf{d}}}^{*}\) is the optimal dynamic rationing policy in the set \({\mathcal {D}}\).

If \(\eta ^{{{\textbf{d}}}_{\triangle ,\theta }^{*}}=\eta ^{{{\textbf{d}}}^{*}}\), then the optimal static rationing policy \({{\textbf{d}}}_{\triangle ,\theta }^{*}\) is also optimal in the policy set \({\mathcal {D}}\), and thus the optimal dynamic rationing policy is of threshold type. If \(\eta ^{{{\textbf{d}}}_{\triangle ,\theta }^{*}}<\eta ^{{{\textbf{d}}}^{*}}\), then the optimal static rationing policy \({{\textbf{d}}}_{\triangle ,\theta }^{*}\), although optimal within the subset \({\mathcal {D}}^{\Delta }\), is suboptimal in the policy set \({\mathcal {D}}\), and thus the optimal dynamic rationing policy is not of threshold type.
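The search for \(\theta ^{*}\) is a direct enumeration over \(\theta =1,2,\ldots ,K+1\). The self-contained sketch below assumes \(\alpha =\lambda /\mu _{1}\) and \(\beta =\lambda /\left( \mu _{1}+\mu _{2}\right) \) (an identification not stated in this section) and uses the piecewise rewards \(f\left( i\right) \) reproduced above from (8) to (11).

```python
def optimal_static_threshold(lam, mu1, mu2, K, N, R, C1, C21, C22, C3, C4, P):
    """Enumerate theta = 1,...,K+1 and return (theta*, eta*) maximizing the
    long-run average profit under the static policy d_{triangle,theta}.

    Sketch assumptions: alpha = lam/mu1, beta = lam/(mu1 + mu2); the rewards
    f(i) follow the piecewise expressions of the section.
    """
    alpha, beta = lam / mu1, lam / (mu1 + mu2)

    def eta(theta):
        # unnormalized stationary measure xi_i; pi_i = xi_i / h
        xi = [1.0] + [alpha ** i if i <= theta - 1
                      else (alpha / beta) ** (theta - 1) * beta ** i
                      for i in range(1, N + 1)]
        h = sum(xi)
        total = xi[0] * (-C21 * mu1 - C22 * mu2 - C3 * lam)
        for i in range(1, theta):        # only class 1 served
            total += xi[i] * (R * mu1 - C1 * i - C22 * mu2 - C3 * lam)
        for i in range(theta, K + 1):    # both classes served, penalty term
            total += xi[i] * (R * (mu1 + mu2) - C1 * i - C3 * lam - P * mu2)
        for i in range(K + 1, N + 1):    # above the rationing region
            total += xi[i] * (R * (mu1 + mu2) - C1 * i
                              - (C3 if i < N else C4) * lam)
        return total / h

    return max(((t, eta(t)) for t in range(1, K + 2)), key=lambda pair: pair[1])
```

Each evaluation of \(\eta \left( \theta \right) \) costs \(O\left( N\right) \), so the whole search over the \(K+1\) static policies costs \(O\left( KN\right) \).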

Now, we set up some conditions under which the optimal static rationing policy \({{\textbf{d}}}_{\triangle ,\theta }^{*}\) is suboptimal in the dynamic rationing policy set \({\mathcal {D}}\).

In the static rationing policy subset \({\mathcal {D}}^{\Delta }\), it is easy to see that there must exist a minimal positive integer \(\theta ^{*}\in \left\{ 1,2,\ldots ,K,K+1\right\} \) such that

$$\begin{aligned} {{\textbf{d}}}_{\triangle ,\theta }^{*}={{\textbf{d}}}_{\triangle ,\theta ^{*} }=\left( 0;\underset{{\theta }^{*}-1\text { zeros}}{\underbrace{0,0,\ldots ,0},}1,1,\ldots ,1;1,1,\ldots ,1\right) . \end{aligned}$$

By using the optimal static rationing policy \({{\textbf{d}}}_{\triangle ,\theta }^{*}\) (or \({{\textbf{d}}}_{\triangle ,{\theta }^{*}}\)), the following theorem determines the sign of the function \(G^{\left( {{\textbf{d}}} _{\triangle ,\theta }\right) }\left( \theta \right) +b\) at the three points \(\theta =\theta ^{*}-1,\theta ^{*},\theta ^{*}+1\). This is useful for understanding how to use Proposition 1 to obtain the optimal long-run average profit of this system.

Theorem 13

In the stock-rationing queue, the static rationing policies \({{\textbf{d}}} _{\triangle ,{\theta }^{*}-1}\), \({{\textbf{d}}}_{\triangle ,\mathbf {\theta }^{*}}\) and \({{\textbf{d}}}_{\triangle ,{\theta }^{*}+1}\) satisfy the following conditions:

$$\begin{aligned} G^{\left( {{\textbf{d}}}_{\triangle ,{\theta }^{*}-1}\right) }\left( \theta ^{*}-1\right) +b\le 0\textbf{,}\text { }G^{\left( {{\textbf{d}}} _{\triangle ,{\theta }^{*}}\right) }\left( \theta ^{*}-1\right) +b\le 0, \end{aligned}$$

and

$$\begin{aligned} G^{\left( {{\textbf{d}}}_{\triangle ,{\theta }^{*}}\right) }\left( \theta ^{*}\right) +b\ge 0\textbf{,}\text { }G^{\left( {{\textbf{d}}} _{\triangle ,{\theta }^{*}+1}\right) }\left( \theta ^{*}\right) +b\ge 0. \end{aligned}$$

Proof

We consider three static rationing policies with an interrelated structure as follows:

$$\begin{aligned} {{\textbf{d}}}_{\triangle ,{\theta }^{*}-1}&=\left( 0;\underset{{\theta }^{*}-2\text { zeros}}{\underbrace{0,0,\ldots ,0} ,}1,1,1,1,\ldots ,1;1,1,\ldots ,1\right) ,\\ {{\textbf{d}}}_{\triangle ,{\theta }^{*}}&=\left( 0;\underset{{\theta }^{*}-1\text { zeros}}{\underbrace{0,0,\ldots ,0,0} ,}1,1,1,\ldots ,1;1,1,\ldots ,1\right) ,\\ {{\textbf{d}}}_{\triangle ,{\theta }^{*}+1}&=\left( 0;\underset{{\theta }^{*}\text { zeros}}{\underbrace{0,0,\ldots ,0,0,0},} 1,\ldots ,1;1,1,\ldots ,1\right) . \end{aligned}$$

Note that \({{\textbf{d}}}_{\triangle ,{\theta }^{*}}\) is the optimal static rationing policy, and \({{\textbf{d}}}_{\triangle ,\theta }^{*} ={{\textbf{d}}}_{\triangle ,{\theta }^{*}}\). It is clear that \({{\textbf{d}}}_{\triangle ,{\theta }^{*}}\succeq {{\textbf{d}}}_{\triangle ,{\theta }^{*}-1}\) and \({{\textbf{d}}}_{\triangle ,{\theta }^{*} }\succeq {{\textbf{d}}}_{\triangle ,{\theta }^{*}+1}.\) Thus it follows from (39) that

$$\begin{aligned} \eta ^{{{\textbf{d}}}_{\triangle ,\theta ^{*}+1}}-\eta ^{{{\textbf{d}}}_{\triangle ,{\theta }^{*}}}=-\mu _{2}\pi ^{\left( {{\textbf{d}}}_{\triangle ,{\theta }^{*}+1}\right) }\left( \theta ^{*}\right) \left[ G^{\left( {{\textbf{d}}}_{\triangle ,{\theta }^{*}}\right) }\left( \theta ^{*}\right) +b\right] , \end{aligned}$$

which, together with \(\eta ^{{{\textbf{d}}}_{\triangle ,\theta ^{*}+1}} -\eta ^{{{\textbf{d}}}_{\triangle ,{\theta }^{*}}}\le 0\), leads to

$$\begin{aligned} G^{\left( {{\textbf{d}}}_{\triangle ,{\theta }^{*}}\right) }\left( \theta ^{*}\right) +b\ge 0\mathbf {.} \end{aligned}$$

On the other hand, it follows from (39) that

$$\begin{aligned} \eta ^{{{\textbf{d}}}_{\triangle ,{\theta }^{*}}}-\eta ^{{{\textbf{d}}} _{\triangle ,\theta ^{*}+1}}=\mu _{2}\pi ^{\left( {{\textbf{d}}}_{\triangle ,{\theta }^{*}}\right) }\left( \theta ^{*}\right) \left[ G^{\left( {{\textbf{d}}}_{\triangle ,\theta ^{*}+1}\right) }\left( \theta ^{*}\right) +b\right] , \end{aligned}$$

which, together with \(\eta ^{{{\textbf{d}}}_{\triangle ,{\theta }^{*}}}-\eta ^{{{\textbf{d}}}_{\triangle ,\theta ^{*}+1}}\ge 0\), gives

$$\begin{aligned} G^{\left( {{\textbf{d}}}_{\triangle ,\theta ^{*}+1}\right) }\left( \theta ^{*}\right) +b\ge 0. \end{aligned}$$

Similarly, by using \(\eta ^{{{\textbf{d}}}_{\triangle ,{\theta }^{*}}} \ge \eta ^{{{\textbf{d}}}_{\triangle ,\theta ^{*}-1}}\) and

$$\begin{aligned} \eta ^{{{\textbf{d}}}_{\triangle ,{\theta }^{*}}}-\eta ^{{{\textbf{d}}} _{\triangle ,\theta ^{*}-1}}=-\mu _{2}\pi ^{\left( {{\textbf{d}}}_{\triangle ,{\theta }^{*}}\right) }\left( \theta ^{*}-1\right) \left[ G^{\left( {{\textbf{d}}}_{\triangle ,\theta ^{*}-1}\right) }\left( \theta ^{*}-1\right) +b\right] , \end{aligned}$$

we obtain

$$\begin{aligned} G^{\left( {{\textbf{d}}}_{\triangle ,\theta ^{*}-1}\right) }\left( \theta ^{*}-1\right) +b\le 0\textbf{;} \end{aligned}$$

and by using

$$\begin{aligned} \eta ^{{{\textbf{d}}}_{\triangle ,\theta ^{*}-1}}-\eta ^{{{\textbf{d}}}_{\triangle ,{\theta }^{*}}}=\mu _{2}\pi ^{\left( {{\textbf{d}}}_{\triangle ,\theta ^{*}-1}\right) }\left( \theta ^{*}-1\right) \left[ G^{\left( {{\textbf{d}}}_{\triangle ,{\theta }^{*}}\right) }\left( \theta ^{*}-1\right) +b\right] , \end{aligned}$$

we obtain

$$\begin{aligned} G^{\left( {{\textbf{d}}}_{\triangle ,{\theta }^{*}}\right) }\left( \theta ^{*}-1\right) +b\le 0. \end{aligned}$$

This completes the proof. \(\square \)

9 Numerical experiments

In this section, by observing several different penalty costs, we conduct numerical experiments to demonstrate our theoretical results and to gain insights on the optimal dynamic and static rationing policies in the stock-rationing queue.

In Examples 1–4, we take some common parameters in the stock-rationing queue with two demand classes as follows:

$$\begin{aligned} C_{1}=1,C_{2,1}=4,C_{2,2}=1,C_{3}=5,C_{4}=1,R=15,N=100. \end{aligned}$$

In Examples 1 and 2, we analyze some differences between the optimal static and dynamic rationing policies, and use the optimal static rationing policy to show whether or not the optimal dynamic rationing policy is of threshold type.

Example 1

We give some useful comparisons of the optimal long-run average profit between two different penalty costs, and further verify how the optimality depends on the penalty cost in Theorems 4 and 6 for the optimal dynamic rationing policy. To this end, we further take the system parameters as \(\lambda =3\), \(\mu _{1}=4\), \(\mu _{2}=2\), \(K=15\) and \(1\le i\le 15\).

Case one: A higher penalty cost

We take a higher penalty cost \(P=10\). If \(d_{i}^{*}=0\) for \(1\le i\le 15\), so that a candidate optimal dynamic rationing policy is \({{\textbf{d}}}^{*}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) \), then we obtain \(\eta ^{{{\textbf{d}}}^{*}}=22.3\). On the other hand, if \(d_{i}^{\prime *}=1\) for \(1\le i\le 15\), so that another candidate policy is \({{\textbf{d}}}^{\prime *}=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) \), then we get \(\eta ^{{{\textbf{d}}}^{\prime *}}=13\). Comparing \(\eta ^{{{\textbf{d}}} ^{*}}=22.3\) with \(\eta ^{{{\textbf{d}}}^{\prime *}}=13\), it is easy to see that the better candidate is \({{\textbf{d}}} ^{*}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) \).

Case two: A lower penalty cost

We choose a lower penalty cost \(P=0.1\). If \(d_{i}^{*}=1\) for \(1\le i\le 15\), so that a candidate optimal dynamic rationing policy is \({{\textbf{d}}} ^{*}=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) \), then we obtain \(\eta ^{{{\textbf{d}}}^{*}}=22.9\). On the other hand, if \(d_{i}^{\prime *}=0\) for \(1\le i\le 15\), so that another candidate policy is \({{\textbf{d}}}^{\prime *}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) \), then we get \(\eta ^{{{\textbf{d}}}^{\prime *}}=22.3\). Obviously, the better candidate is \({{\textbf{d}}}^{*}=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) \).

Example 2

We use a numerical example to demonstrate whether or not the optimal static rationing policy is suboptimal in the policy set \({\mathcal {D}}\). If so, then the optimal dynamic rationing policy is not of threshold type. To this end, we take the system parameters \(\lambda =3\), \(\mu _{1}=4\), \(\mu _{2}=2\), \(K=15\), \(1\le \theta \le 15\), which are the same as those in Example 1.

In what follows, we focus on the higher penalty cost \(P=10\) and the lower penalty cost \(P=0.1\), respectively.

Case one: A higher penalty cost

We observe how the optimal long-run average profit \(\eta ^{{{\textbf{d}}}^{*}}\) depends on the threshold from \(\theta =1\) to \(\theta =15\). From Fig. 3, it is seen that the optimal threshold is \(\theta ^{*}=9\) and \(\eta ^{{{\textbf{d}}} _{\Delta ,\theta ^{*}}}=21.4\). However, from Case one of Example 1, \(\eta ^{{{\textbf{d}}}^{*}}=22.3\). Thus we obtain that \(\eta ^{{{\textbf{d}}} _{\Delta ,\theta ^{*}}}=21.4<\eta ^{{{\textbf{d}}}^{*}}=22.3\). This shows that the optimal static rationing policy is suboptimal in the policy set \({\mathcal {D}}\), and the optimal dynamic rationing policy is not of threshold type. Thus \({{\textbf{d}}}^{*}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) \), given in Example 1, is not the optimal dynamic rationing policy.

Fig. 3

The optimal long-run average profit \(\eta ^{{{\textbf{d}}}^{*}}\) versus the threshold \(\theta \)

Case two: A lower penalty cost

From Fig. 4, it is seen that the optimal threshold is \(\theta ^{*}=3\) and \(\eta ^{{{\textbf{d}}}_{\Delta ,\theta ^{*}}}=22.9\). From Case two of Example 1, we obtained \(\eta ^{{{\textbf{d}}}^{*}}=22.9\), so that \(\eta ^{{{\textbf{d}}}_{\Delta ,\theta ^{*}}}=\eta ^{{{\textbf{d}}}^{*}}=22.9\). Therefore, the optimal static rationing policy coincides with the optimal dynamic rationing policy, it is optimal in the policy set \({\mathcal {D}}\), and the optimal dynamic rationing policy is of threshold type. Thus \({{\textbf{d}}}^{*}=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) \), given in Example 1, is the optimal dynamic rationing policy.

Fig. 4

The optimal long-run average profit \(\eta ^{{{\textbf{d}}}^{*}}\) versus the threshold \(\theta \)

Example 3

We analyze how the optimal long-run average profit of the stock-rationing queue depends on the arrival rate. Our observation focuses on the higher penalty cost \(P=10\) and the lower penalty cost \(P=0.1\), respectively. To do this, we further take the system parameters \(\mu _{1}=30\), \(\mu _{2}=40\) and the thresholds \(K=5\), 6, 10.

Case one: A higher penalty cost

Let \(P=10\) and \({{\textbf{d}}}^{*}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) \). From Fig. 5, it is seen that the optimal long-run average profit \(\eta ^{{{\textbf{d}}}^{*}}\) increases as \(\lambda \) increases. In addition, as the threshold K increases, \(\eta ^{{{\textbf{d}}}^{*}}\) grows more slowly in \(\lambda \).

Fig. 5

\(\eta ^{{{\textbf{d}}}^{*}}\) versus \(\lambda \) under three different thresholds K

Case two: A lower penalty cost

Let \(P=0.1\) and \({{\textbf{d}}}^{*}=\left( 0;1,1,\ldots ,1;1,1,\ldots ,1\right) \). We discuss how the optimal long-run average profit \(\eta ^{{{\textbf{d}}}^{*}}\) depends on \(\lambda \) for \(\lambda \in \left( 65,80\right) \). From Fig. 6, it is seen that \(\eta ^{{{\textbf{d}}} ^{*}}\) increases as \(\lambda \) increases. In addition, as the threshold K increases, \(\eta ^{{{\textbf{d}}}^{*}}\) grows more slowly in \(\lambda \).

Fig. 6

\(\eta ^{{{\textbf{d}}}^{*}}\) versus \(\lambda \) under three different thresholds K

Example 4

We focus on how the penalty cost P influences the long-run average profit \(\eta ^{{{\textbf{d}}}}\) for any given policy \({{\textbf{d}}}\). From (17), it is easy to see that, for any given policy \({{\textbf{d}}}\), the long-run average profit \(\eta ^{{{\textbf{d}}}}\) is linear in the penalty cost P. To show this, we take the system parameters \(P\in \left( 0,50\right) \), \(\mu _{1}=4\), \(\mu _{2}=2\), \(\lambda =3\) and \(K=15\), and observe the special policy \({{\textbf{d}}}_{1}={{\textbf{d}}} ^{*}=\left( 0;0,0,\ldots ,0;1,1,\ldots ,1\right) \). Figure 7 shows that for this special policy, the long-run average profit \(\eta ^{{{\textbf{d}}}^{*}}\) decreases linearly as P increases.

Fig. 7

The long-run average profit \(\eta ^{{{\textbf{d}}}^{*}}\) versus the penalty cost P

10 Concluding remarks

In this paper, we develop an intuitive understanding of the optimal dynamic rationing policy of the stock-rationing queue with two demand classes by means of the sensitivity-based optimization. To find the optimal dynamic rationing policy, we establish a policy-based birth-death process and a more general reward function such that the long-run average profit of the stock-rationing queue is expressed explicitly. Furthermore, we set up a policy-based Poisson equation and provide an explicit expression for its solution. Based on this, we derive a performance difference equation between any two policies, which allows us to find the optimal dynamic rationing policy and compute the maximal long-run average profit in three different regions of the penalty cost. Therefore, we provide an algebraic method that yields a complete algebraic solution for the optimal dynamic rationing policy. We show that the optimal dynamic policy must be of transformational threshold type, which leads to three simple sufficient conditions under each of which the optimal dynamic policy is of threshold type. In addition, we develop some new structural properties (e.g., set-structured monotonicity and the class property of policies) of the optimal dynamic rationing policy. We believe that the methodology and results developed in this paper are applicable to analyzing supply chain finance with applications of blockchain technology, and open up a series of potentially promising research directions.

Along such a line, there are a number of interesting directions for potential future research, for example:

  • Extending to stock-rationing queues with multiple demand classes, multiple types of products, backorders, batch ordering, batch production, and so on;

  • analyzing non-Poisson inputs, such as Markovian arrival processes (MAPs), and/or non-exponential service times, e.g., phase-type (PH) distributions;

  • discussing how the long-run profit can be influenced by some concave or convex reward functions;

  • studying individual or social optimization for stock-rationing queues from a perspective of game theory by means of the sensitivity-based optimization;

  • investigating optimal dynamic rationing policies in supply chain finance with applications of blockchain technology.