Highlights
  • A method for clearing and managing the backlog of elective surgeries following a pandemic is proposed.

  • The elective surgery backlog recovery process is modeled by a queueing network system.

  • A reinforcement learning-based backlog scheduling optimization algorithm is proposed.

  • The proposed method can rapidly clear the elective surgery backlog while ensuring timely care for all patients.

1 Introduction

The coronavirus disease 2019 (COVID-19) pandemic caused hospitals around the world to temporarily delay nonemergent elective surgeries, both to reduce the infection risk to patients and providers and to conserve hospital capacity and resources for the patients made sickest by COVID-19 [1]. This delay has produced a backlog of previously scheduled but uncompleted procedures, as well as a dynamic backlog of surgeries that continue to be postponed while the health system operates at diminished capacity [2]. Studies have estimated that more than 28 million surgeries were canceled or postponed during the initial 12-week peak of the COVID-19 pandemic [3]. Taking the United States as an example, even under the most optimistic scenario, the country may face a cumulative backlog of more than a million total joint and spine surgery cases and 1.1 million to 1.6 million cataract procedures by 2022, and it may need up to 16 months to work through its backlog of orthopedic care needs [4, 5]. The backlog is especially alarming for large general hospitals, since they carry a heavier daily caseload and, at the same time, bear greater responsibility for pandemic control.

It should be noted that continuing to delay these surgeries could result in disease progression and poor health outcomes for patients. Studies have shown that delays in surgical care for osteoarthritis can result in progressive loss of mobility and health-related quality of life; in addition, patients living with knee osteoarthritis have a higher risk of death than the general population, and their risk increases as their walking disability becomes more severe [6, 7]. Beyond the potential impact on morbidity and mortality, delays in care can also worsen patient experiences. Given that some types of "elective" procedures are more time sensitive than others, providers may need to prioritize care based on its urgency and on whether a delay could lead to morbidity or mortality [8]. Deferred medical care also has a broader impact on the national economy: approximately half of the annualized 4.8% decline in United States GDP in the first quarter of 2020 was attributed to health care services, especially delayed elective procedures [2].

Given the importance of surgical care to the health of patients and the financial health of hospitals, solutions that eliminate such large backlogs while maintaining the regular throughput of surgical cases are crucial to the operation of health care systems [9]. Since building additional operating rooms is impractical for most systems given the capital costs, physical space, and associated workforce required, operators need a robust strategic plan to schedule the cumulative backlog of elective surgeries effectively and improve patient throughput. This paper considers optimizing the use of the existing capacity of large general hospitals to work through the backlog of elective surgeries without compromising patient outcomes.

In this paper, a planning method based on an operations management framework is provided to help large general hospitals rapidly and efficiently recover their surgical services, and a scheduling algorithm for clearing elective surgery backlogs during a major epidemic outbreak is proposed. The method balances the surgery backlog against newly arriving patients who need elective surgeries, based on waiting time and clinical urgency: it reschedules the surgeries in the backlog queue and schedules the newly arriving surgeries so that all surgical services can be completed before their due days. A Markov decision process (MDP) is used to model the recovery of surgeries delayed by epidemic outbreaks, and a surgery queueing and planning algorithm based on piecewise decaying \(\epsilon\)-greedy reinforcement learning (PDGRL) is designed. The proposed method can dynamically determine the daily optimal plan for managing the backlog of patients awaiting surgery in the wake of a pandemic and provides a system-based solution for delivering timely surgical care to patients and preparing for future pandemic waves.

The remainder of this paper is organized as follows. In Section 2, the relevant literature is reviewed. Section 3 describes the queueing network system for modeling the surgery backlog and recovery process. Section 4 presents the proposed stochastic scheduling optimization algorithm for surgery backlog management. Section 5 gives the simulation experimental results of implementing the proposed method on an elective surgery backlog example. Section 6 concludes the research.

2 Literature review

2.1 COVID-19 surgical backlog

Since the outbreak of COVID-19, doctors and researchers from different countries have recognized and studied the consequences of surgical service delays caused by the cancellation or postponement of nonemergent examinations and treatments. Uimonen et al. [10] collected and studied data on urologists and elective urological procedures in Finland for 2017–2020 and concluded that the health care lockdown due to COVID-19 decreased the availability of nonemergent specialized urological care; the authors also pointed out that even though these surgical services have started to recover from the pandemic-induced delays, an underlying backlog remains, which may leave more patients with severe conditions waiting for treatment. Fu et al. [11] described the severe impacts of surgical delays during the COVID-19 pandemic on patient health outcomes, hospital finances and resources, academic training and research programs, and health care providers, and suggested that the health care system should form an organized response to manage the disruption and delay of surgeries.

Many studies have used various techniques to estimate the size of backlogs, the rate of surgery rescheduling, and the reduction in outpatient and surgical activities [12], as well as the time to recovery for surgical procedures delayed due to COVID-19 [4, 13,14,15]. Salenger et al. [16] used simple mathematical formulations to estimate the daily backlog of cardiac surgeries during the COVID-19 pandemic and to predict the length of time required to clear this backlog. The results showed that the time necessary to clear the backlog would range from 1 to 8 months, based on varied estimates of postpandemic increases in operational capacity, and that the waiting list mortality rate could be as high as 3.7% at 1 month and 11.6% at 6 months; the study emphasized the necessity of planning for postpandemic volume and treating patients within an acceptable time frame. Wilson et al. [17] used linear regression to quantify the volume of total hip arthroplasty and total knee arthroplasty cases delayed due to COVID-19 and to estimate the time required to complete the backlogged cases; the proposed model suggested that anticipatory planning of excess capacity and provisions for quickly and safely accommodating delayed patients are important. Wang et al. [18] used time series forecasting, queueing models, and probabilistic sensitivity analysis to model the incremental surgical backlog resulting from the COVID-19 outbreak in Ontario and to estimate the time and resources required to clear it, concluding that health care systems must employ innovative system-based solutions to provide patients with timely surgical care and prepare for future COVID-19 waves. Brandman et al. [19] developed an online tool, based on single-entry queue models, to predict how long clearing the impending backlog of elective cases will take and what resources are required.

2.2 Application of reinforcement learning in health care management

To describe stochastic and sequential decision processes, a system can be modeled by a set of states and a set of actions that are performed to control the system's state. The state of the system changes over time; at any time, an action can be taken that may bring a reward and cause a change in the system's state. The objective is to control the system so that a performance criterion is maximized. This model is called the Markov decision process (MDP); such processes are an intuitive and fundamental formalism for reinforcement learning (RL). When no prior knowledge about the MDP is available, RL is an effective approach for computing good or optimal policies for problems modeled as MDPs [35, 36]. In an RL algorithm, an agent is connected to the system via perception and action. In each step, the agent receives information about the current state of the system, then chooses an action that changes the system's state and earns a reward. The agent should choose a sequence of actions that increases the long-run sum of rewards [37]. RL enables an agent to learn effective strategies in sequential decision-making problems through trial-and-error interactions with its environment [38].

As one of the most effective artificial intelligence techniques, RL has developed rapidly in recent decades and has been applied in several health care domains. Applications of RL to clinical decision-making include dynamic treatment regimes, which develop sequences of decision rules (including medication) to individualize treatment for patients based on their varying clinical characteristics and medical histories [39,40,41,42,43,44,45,46]; automated clinical diagnosis driven by big data analysis to help clinicians make more accurate and efficient decisions [47,48,49,50,51]; computer-assisted motor and cognitive rehabilitation therapies that build optimal adaptive rehabilitation strategies tailored to specific patients' needs [52,53,54]; intelligent medical imaging [55,56,57]; and health care control systems for arm motion controllers, biomedical devices, chronic disease monitoring, anesthesia controllers, and so on [40, 58,59,60].

Compared with clinical decision-making, the application of RL to health care operations management is still in its early days. Huang et al. [61] proposed an RL-based resource allocation mechanism and used it to optimize resource allocation in the radiology CT-scan examination process so that the process flow time is minimized. Schutz and Kolisch [62] modeled the capacity allocation problem for multiple classes of customers and multiple types of services as a continuous-time Markov decision process solved by RL, illustrating the method with a radiology services booking system. Gomes [63] proposed a framework embedding an Advantage Actor-Critic reinforcement learning algorithm for scheduling the daily appointment slots of primary care doctors. Lee and Lee [64] used deep RL to assign emergency department patients needing different types of treatment to available medical resources with the objective of minimizing patient waiting time. These studies show that RL is an effective way to solve dynamic and stochastic hospital operations optimization problems in which resource allocation decisions must be based on both the current and future states of the system. This paper extends the application of RL to postpandemic surgery recovery management based on dynamic and stochastic scheduling optimization.

3 Methods

3.1 Reinforcement learning and Markov decision process

RL is a learning paradigm for sequential decision-making problems: an agent learns how to behave in a system where the only feedback is a scalar reward signal, performing actions that maximize the reward in the long run. The MDP is an intuitive and fundamental formalism for learning problems in stochastic domains [65]. The canonical elements of an MDP are a set of system states \(\mathcal{S}\), a set of actions \(\mathcal{A}\), a transition probability matrix \(\mathcal{P}\), and a reward function [66]. A state \(s\in \mathcal{S}\) is a unique characterization of the system. Actions \(a\in \mathcal{A}\) can be applied in particular states to control the system. By applying action \(a\in \mathcal{A}\) in state \(s\in \mathcal{S}\), the system transitions from \(s\) to a new state \(s'\in \mathcal{S}\) according to the transition probability distribution \(P(s'|s,a)\) defined over the possible next states. The reward function specifies rewards for being in a state, \(R(s)\), performing an action in a state, \(R(s,a)\), or transitioning between states, \(R(s,a,s')\). The goal is to find an optimal rule for selecting actions in states, \(\pi(s,a)\), called a policy, that gathers maximal rewards. To estimate how good it is for the agent to be in a certain state, or to perform a certain action in a state, the value function \(V^{\pi}(s)\) is defined as the expected return when starting in state \(s\) and following policy \(\pi\) thereafter, and the state-action value function \(Q^{\pi}(s,a)\) is defined as the expected return when starting from state \(s\), taking action \(a\), and thereafter following policy \(\pi\).
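For completeness, these value functions can be written in a standard textbook form (not restated in the original), assuming an infinite-horizon discounted return with a discount factor \(\gamma\in[0,1)\) that the paper does not state explicitly:

$$V^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty}\gamma^{k}R^{t+k}\,\middle|\,S^{t}=s\right],\qquad Q^{\pi}(s,a)=\mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty}\gamma^{k}R^{t+k}\,\middle|\,S^{t}=s,A^{t}=a\right]$$

The optimal state-action values then satisfy the Bellman optimality equation,

$$Q^{*}(s,a)=\sum_{s'\in\mathcal{S}}P\left(s'|s,a\right)\left[R\left(s,a,s'\right)+\gamma\max_{a'\in\mathcal{A}}Q^{*}\left(s',a'\right)\right]$$

which is the fixed point targeted by the learning updates described next.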

RL computes an optimal policy for an MDP with incomplete information, so sampling and exploration are needed. The general RL algorithm consists of interaction iterations in which the agent selects an action based on its current state, receives feedback in the form of the resulting state and associated reward, and updates the estimates \(\widetilde{V}\) and \(\widetilde{Q}\). The selection of an action at each step is based on the current system state \(s\) and the value function \(V\) or \(Q\). The algorithm combines exploitation and exploration: it exploits current knowledge around good actions to gain more reward, and it tries out different actions to explore the system for potentially better ones. Through these iterations, RL collects and updates knowledge about the system while interactively searching for better actions.
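As an illustration of this loop, the following is a minimal sketch of tabular Q-learning with \(\epsilon\)-greedy action selection. The environment interface (reset, step, available_actions) and all parameter values are assumptions made for the sketch, not the paper's implementation.

import random
from collections import defaultdict

def epsilon_greedy_q_learning(env, episodes=500, alpha=0.1, gamma=0.99,
                              eps_start=1.0, eps_min=0.05, eps_decay=0.995):
    """Tabular Q-learning with epsilon-greedy action selection (generic sketch)."""
    Q = defaultdict(float)                      # Q[(state, action)] -> estimated return
    eps = eps_start
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            actions = env.available_actions(state)
            if random.random() < eps:           # explore: try a random action
                action = random.choice(actions)
            else:                               # exploit: use the best-known action
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            next_actions = env.available_actions(next_state)
            best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
            # one-step temporal-difference update toward reward + discounted best value
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
        eps = max(eps_min, eps * eps_decay)     # decay exploration between episodes
    return Q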

3.2 Model of elective surgery backlog management system

The surgery backlog problem is modeled as a general discrete-time queueing network system \(M\) that can be formulated as a countable-state MDP, with the objective of clearing the elective surgery backlog in the minimum amount of time while ensuring that all surgeries, both those in the backlog and those of new arrivals, are scheduled before their due days. There are two queues in this system. The first queue, named Q0, represents the patient backlog accumulated since the onset of a pandemic such as COVID-19; the second queue, named Q1, represents patients newly arriving at the hospital. The patients in these two queues wait for available surgery slots, and surgeries are scheduled daily. Therefore, the time slots are indexed by typical surgical days \(t=0,1,2,3,\dots\), where \(t=1\) denotes the first day after the hospital starts to resolve its surgery backlog. Initially, the patients in the first queue are those whose surgeries were delayed because of the COVID-19 pandemic. Let \(L_0\) be the number of patients in Q0 at \(t=0\). Let \(\alpha^t\) be the total number of new patients arriving on day \(t\). Assume that the new patients arrive in Q1 as an independent and identically distributed (i.i.d.) sequence \(\left\{\alpha^t, t=0,1,\dots\right\}\) with a finite rate \(\lambda_1=\mathbb{E}\left(\alpha^t\right)\). Each patient has an arrival day \(b_{ij}\), defined as the day the patient enters the queue, and a due day \(d_{ij}\), defined as the day by which the patient must undergo surgery, where \(i=0,1\) indicates the queue and \(j=1,2,\dots\) denotes the patient. The time remaining from the current day to the patient's due day represents the patient's critical level; the larger the critical level, the less urgent the patient's surgery. Let \(T\) be the time slot by which all patients in Q0 are scheduled. Assume that the number of newly arrived patients in Q1 and the number of served patients at each time slot are both bounded. The system can be modeled as follows.

(1) State space \(\mathcal{S}\): The system state is given by the lengths of the two queues, that is, a 2-dimensional queue backlog vector \(\boldsymbol{S}=\left(S_0,S_1\right)\), \(S_0=0,1,\dots,L_0\). Even though the number of newly arrived patients in Q1 at each time slot is finite, if the number of newly arrived patients in Q1 exceeds the number of scheduled patients in Q1 at each time slot, the number of patients accumulated in Q1 can grow without bound. Therefore, the system has an unbounded state space, denoted \(\mathcal{S}=\mathcal{S}_0\times\mathcal{S}_1=\left\{0,1,\dots,L_0\right\}\times\mathbb{N}\). Let \(S_0^t\) and \(S_1^t\) be the numbers of waiting patients in Q0 and Q1 on day \(t\), respectively.

(2) Action space \(\mathcal{A}\): The two queues compete for the services of the hospital; thus, the action is the number of patients scheduled from each queue every day, \(\boldsymbol{A}=\left(A_0,A_1\right)\). There is a maximum capacity for each surgical day, representing the maximum number of elective surgeries the hospital can perform, \(L_{\max}\). Therefore, \(A_0+A_1\le L_{\max}\), and the action space is \(\mathcal{A}=\left\{0,1,\dots,\min\left\{L_0,L_{\max}\right\}\right\}\times\left\{0,1,\dots,L_{\max}\right\}\). Let \(\boldsymbol{A}^t=\left(A_0^t,A_1^t\right)\) be the action taken on day \(t\).

(3) State-transition probability \(P\): Since the system dynamics satisfy the Markov property, the probability of the system transitioning into a state \(\boldsymbol{S}'\) depends only on the current state \(\boldsymbol{S}\), the selected action \(\boldsymbol{A}\), and the number of newly arrived patients: \(p\left(\boldsymbol{S}'|\boldsymbol{S},\boldsymbol{A}\right)\). Each surgical day, the hospital schedules surgeries for \(A_0\) patients from Q0 and \(A_1\) patients from Q1, and \(\alpha\) newly arrived patients enter queue Q1:

$$P\left(\boldsymbol{S}^{\prime}=\left(s_0^{\prime},s_1^{\prime}\right)\,\middle|\,\boldsymbol{S}=\left(s_0,s_1\right),\boldsymbol{A}=\left(a_0,a_1\right)\right)=\begin{cases}1 & \text{if } s_0^{\prime}=s_0-a_0 \text{ and } s_1^{\prime}=s_1-a_1+\alpha\\ 0 & \text{otherwise}\end{cases}$$

where \(\alpha\) follows the distribution of the number of newly arrived patients, which can be obtained from historical surgery scheduling data or expert opinion.

(4) Cost \(\mathcal{C}\): The objective is to schedule all the patients in Q0 as soon as possible while ensuring that no patient misses his or her due day and minimizing the average backlog of Q1. Let \(x_{ij}^t\) indicate each patient's scheduled surgery time:

$$x_{ij}^{t}=\begin{cases}1 & \text{if patient } j \text{ in } Q_i \text{ is scheduled at time } t\\ 0 & \text{otherwise}\end{cases},\quad i=0,1,\ j=1,2,\dots,\ t=0,1,\dots$$

The objectives can be summarized by the following cost function:

$${C}_{t}=\frac{\sum_{i=0}^{1}\sum_{j}\left(1-\sum_{\tau =1}^{t-1}{x}_{ij}^{\tau }\right)\left(t-{b}_{ij}\right)}{\sum_{i=0}^{1}\sum_{j}\left(1-\sum_{\tau =1}^{t-1}{x}_{ij}^{\tau }\right)}-\frac{\sum_{i=0}^{1}\sum_{j}\left(1-\sum_{\tau =1}^{t}{x}_{ij}^{\tau }\right)\left({d}_{ij}-t\right)}{\sum_{i=0}^{1}\sum_{j}\left(1-\sum_{\tau =1}^{t}{x}_{ij}^{\tau }\right)}+\sum_{i=0}^{1}\sum_{j}{x}_{ij}^{t}M\left[1-H\left({d}_{ij}-{x}_{ij}^{t}t\right)\right]$$
(1)

where \(H(x)\) is the Heaviside step function, whose value is zero for negative arguments and one for nonnegative arguments, \(H(x)\triangleq\begin{cases}1 & \text{for } x\ge 0\\ 0 & \text{for } x<0\end{cases}\). At time \(t\), an action \(\boldsymbol{A}^t\) is selected, and the system has three types of patients: (1) patients scheduled at time \(t\), (2) patients who have not been scheduled up to time \(t\), and (3) patients who were scheduled before time \(t\). The first part of Eq. (1) represents the average waiting time of the patients who are scheduled at time \(t\) or have not yet been scheduled. The second part of Eq. (1) represents the average critical level of the patients who have not been scheduled up to time \(t\), where the critical level is the remaining number of days before the due day of each unscheduled surgery; the smaller the critical level, the closer the surgery is to its due day and the more urgent it is. The third part of Eq. (1) is the penalty term for patients who are scheduled at time \(t\) but have already missed their due day. \(M\) is a large positive penalty constant, so that even one overdue surgery results in a large cost and the algorithm is steered away from delaying a treatment past its due day. Note that the cost can be negative if surgeries are scheduled long before their due days. A simulation sketch of this two-queue model is given below.
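The following is a minimal simulation sketch of the two-queue model and a cost signal in the spirit of Eq. (1), assuming Poisson arrivals, illustrative 30/60/90-day due-day offsets, and earliest-due-day ordering within each queue. All names and parameter values are hypothetical, not the paper's settings.

import numpy as np

class SurgeryBacklogEnv:
    """Two-queue model: Q0 holds the pandemic backlog, Q1 holds new arrivals."""

    def __init__(self, L0=100, L_max=10, arrival_rate=8.0, M=1000.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.L_max = L_max                  # daily surgical capacity
        self.arrival_rate = arrival_rate    # Poisson mean of daily new arrivals
        self.M = M                          # penalty for missing a due day
        # each patient is a tuple (arrival_day b_ij, due_day d_ij)
        self.q0 = [(0, int(self.rng.choice([30, 60, 90]))) for _ in range(L0)]
        self.q1 = []
        self.t = 0

    def state(self):
        return (len(self.q0), len(self.q1))     # queue backlog vector (S0, S1)

    def step(self, a0, a1):
        """Schedule a0 patients from Q0 and a1 from Q1, earliest due day first."""
        assert a0 + a1 <= self.L_max
        self.t += 1
        self.q0.sort(key=lambda p: p[1])
        self.q1.sort(key=lambda p: p[1])
        served = [self.q0.pop(0) for _ in range(min(a0, len(self.q0)))]
        served += [self.q1.pop(0) for _ in range(min(a1, len(self.q1)))]
        # cost terms mirroring Eq. (1): average waiting time of patients scheduled
        # today or still waiting, minus the average critical level of unscheduled
        # patients, plus a large penalty for each overdue surgery scheduled today
        waiting = served + self.q0 + self.q1
        avg_wait = np.mean([self.t - b for b, _ in waiting]) if waiting else 0.0
        unscheduled = self.q0 + self.q1
        avg_crit = np.mean([d - self.t for _, d in unscheduled]) if unscheduled else 0.0
        penalty = self.M * sum(1 for _, d in served if d < self.t)
        # new arrivals join Q1 with hypothetical 30/60/90-day due-day offsets
        for _ in range(self.rng.poisson(self.arrival_rate)):
            self.q1.append((self.t, self.t + int(self.rng.choice([30, 60, 90]))))
        return self.state(), avg_wait - avg_crit + penalty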

4 Algorithm

Since the problem has an unbounded state space, we construct the surgery backlog scheduling algorithm based on the piecewise decaying \(\epsilon\)-greedy reinforcement learning (PDGRL) algorithm proposed by Liu et al. [67] to find the optimal policy \(\pi^*\). This algorithm introduces an auxiliary system \(\widetilde{M}\) with a bounded state space, in which each queue has buffer size \(U\): in \(\widetilde{M}\), when a queue's backlog reaches \(U\), new arrivals to that queue are dropped. \(\widetilde{M}\) has the same action space and cost function as \(M\). In this problem, \(U\) is selected to be greater than \(L_0\) (\(U>L_0\)) to ensure that all patients in Q0 can be scheduled in \(\widetilde{M}\). The state space of \(\widetilde{M}\) is \(\widetilde{\mathcal{S}}\triangleq\left\{\boldsymbol{S}\in\mathcal{S}: S_0,S_1\le U\right\}\). By exploring and exploiting within the auxiliary system while applying the stabilizing policy outside it, the PDGRL algorithm provides a control policy that approaches the optimal one as \(U\) grows large.

4.1 Stabilizing policy

The stabilizing policy is a policy under which the system is stable, that is, the system converges in the sense that:

$$\lim_{t\to\infty} P\left[\boldsymbol{S}\left(t\right)\le \boldsymbol{r}\right]=F\left(\boldsymbol{r}\right), \quad \forall\, \boldsymbol{r}\in \mathcal{S}$$

where \(F(\bullet )\) is the cumulative distribution function on \(\mathcal{S}\). According to Foss et al. [68], for a two-component Markov chain \(\left\{\left({X}^{t},{Y}^{t}\right)\right\}\) on the state space \(\left(\mathcal{X},\mathcal{Y}\right)\), where \(\left\{{X}^{t}\right\}\) forms a Markov chain itself with a stationary distribution \({\pi }_{X}\), the system is stable if the following conditions hold:

A. The expected one-step increments of the sequence \(\left\{L_2\left(Y^t\right)\right\}\) are bounded in absolute value by a constant \(U\):

$$\sup_{x\in\mathcal{X},\,y\in\mathcal{Y}}\mathbb{E}_{x,y}\left|L_2\left(Y^{t+1}\right)-L_2\left(Y^t\right)\right|\le U<\infty$$

where \(L_2\left(\bullet\right)\) is a nonnegative measurable function defined on a sigma-algebra \(\widetilde{\mathcal{Y}}\) of \(\mathcal{Y}\).

B. There exists a nonnegative, nonincreasing function \(h(N)\), \(N\ge 0\), with \(h(N)\downarrow 0\) as \(N\to\infty\), and a family of mutually independent random variables \(\left\{\varphi_x^t\right\}\), \(x\in\mathcal{X}\), \(t=0,1,\dots\), such that

B1. for each \(t\), the variables \(\left\{\varphi_x^t\right\}\), \(x\in\mathcal{X}\), are uniformly integrable;

B2. for each \(x\), the variables \(\left\{\varphi_x^t, t=0,1,\dots\right\}\) are identically distributed with common distribution function \(F_x\), where \(F_x(y)\) is measurable as a function of \(x\) for any fixed \(y\);

B3. the inequality

$$L_2\left(Y^{t+1}\right)-L_2\left(Y^t\right)\le \varphi_{X^t}^{t}+h\left(L_2\left(Y^t\right)\right)$$

holds for all \(x\in\mathcal{X}\), \(y\in\mathcal{Y}\), and \(t=0,1,\dots\); and

B4. the functions \(f(x)=\mathbb{E}\varphi_x^1\) satisfy

$$\int_{\mathcal{X}} f(x)\,\pi_X(\mathrm{d}x)=-\varepsilon<0.$$

Since the objective of the surgery scheduling system is to schedule all patients in both queues as soon as possible, \(L_{\max}\) surgeries will be scheduled every day as long as the total number of patients waiting in the system exceeds \(L_{\max}\). Therefore, to simplify the proof of the stabilizing policy, the working hours of each typical surgical day \(t\) are divided into time slots \(t'=L_{\max}(t-1)+1, L_{\max}(t-1)+2, \dots, L_{\max}t\), for \(t=1,2,\dots\). The new patient arrival rate at slot \(t'\) of Q1 is \(\lambda_1'=\mathbb{E}\left(\alpha^{t'}\right)=\mathbb{E}\left(\frac{\alpha^t}{L_{\max}}\right)\), where \(\alpha^{t'}\) is the total number of new patients arriving at \(t'\). At each slot \(t'\), one surgery, from either Q0 or Q1, is scheduled. Thus, the action at \(t'\) becomes the index of the selected queue, \(A'^{t'}\in\left\{0,1\right\}\), and the action taken on day \(t\) is

$$\boldsymbol{A}^{t}=\left(A_0^t,A_1^t\right)=\left(\sum_{t'=L_{\max}(t-1)+1}^{L_{\max}t}\left(1-A'^{t'}\right),\ \sum_{t'=L_{\max}(t-1)+1}^{L_{\max}t}A'^{t'}\right).$$

Consider the longest connected queue (LCQ) policy, which schedules a patient from the queue with the maximum length at each time slot \(t'\), as follows:

$$A'^{t'}=\begin{cases}\text{none} & \text{if } S_i^{t'}=0,\ i=0,1\\ \arg\max_{i=0,1}\left\{S_i^{t'}\right\} & \text{otherwise}\end{cases}$$

This is equivalent to the following:

$$\boldsymbol{A}^{t}=\left(A_0^t,A_1^t\right)=\begin{cases}\left(L_{\max},\,0\right) & \text{if } S_1^t+L_{\max}\le S_0^t\\[4pt] \left(\dfrac{L_{\max}+\left(S_0^t-S_1^t\right)}{2},\ \dfrac{L_{\max}-\left(S_0^t-S_1^t\right)}{2}\right) & \text{if } S_1^t\le S_0^t<S_1^t+L_{\max}\\[4pt] \left(\dfrac{L_{\max}-\left(S_1^t-S_0^t\right)}{2},\ \dfrac{L_{\max}+\left(S_1^t-S_0^t\right)}{2}\right) & \text{if } S_0^t\le S_1^t<S_0^t+L_{\max}\\[4pt] \left(0,\,L_{\max}\right) & \text{if } S_0^t+L_{\max}\le S_1^t\end{cases}$$

Hence, proving the stability of \(\left\{\left(S_0^{t'},S_1^{t'}\right)\right\}\) is sufficient for proving the stability of \(\left\{\left(S_0^{t},S_1^{t}\right)\right\}\), since the day-level process is a sampled version of the slot-level process.
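As a sketch, the day-level LCQ rule above can be implemented directly. The integer division is an adaptation for whole surgeries, and clamping to the queue lengths covers the "none" case; the function name is illustrative.

def lcq_daily_action(s0: int, s1: int, l_max: int) -> tuple:
    """Day-level LCQ action (A0, A1): balance the two queue lengths within capacity."""
    if s1 + l_max <= s0:
        a0, a1 = l_max, 0
    elif s0 + l_max <= s1:
        a0, a1 = 0, l_max
    else:
        # queues within l_max of each other: split capacity to equalize them;
        # integer division adapts the paper's formula to whole surgeries
        a0 = (l_max + (s0 - s1)) // 2
        a1 = l_max - a0
    # never schedule more patients than are waiting (also covers the "none" case)
    return min(a0, s0), min(a1, s1)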

Theorem 1. System \(\left\{\left(S_0^{t'},S_1^{t'}\right)\right\}\) is stable under the LCQ policy if \(\lambda_1'=\mathbb{E}\left(\alpha^{t'}\right)\le 1\) and \(L_0>2\).

The proof of Theorem 1 is provided in the Appendix.

Theorem 1 shows that the surgery backlog system with two queues, representing the patient backlog and new arrivals, respectively, is stable under the policy of minimizing the difference between the numbers of unscheduled surgeries in the two queues whenever the mean new patient arrival rate does not exceed the mean surgery service rate.

In the next section, the algorithm for surgery backlog scheduling is proposed, in which Theorem 1 is used to define the stabilizing policy as part of the policy computation in each episode.

4.2 Elective surgery backlog PDGRL algorithm

The algorithm operates episodically, and each episode performs either exploration or exploitation. In the exploration stage, the algorithm applies a random policy \(\pi_{\mathrm{rand}}\), which selects an action from \(\mathcal{A}\) uniformly, to each state in \(\widetilde{\mathcal{S}}\) and applies the stabilizing policy \(\pi_{\mathrm{stable}}\) to each state in \(\mathcal{S}\backslash\widetilde{\mathcal{S}}\). In the exploitation stage, the transition matrix \(P\) and reward matrix \(R\) are estimated from the observations collected in previous episodes, the resulting MDP is solved, and the optimal policy \(\pi^*\) is obtained. The algorithm applies \(\pi^*\) to each state in \(\mathcal{S}^{\mathrm{in}}\triangleq\left\{\boldsymbol{S}\in\mathcal{S}: S_0+S_1\le U-L_{\max}\right\}\) and applies \(\pi_{\mathrm{stable}}\) to each state in \(\mathcal{S}\backslash\mathcal{S}^{\mathrm{in}}\). The detailed algorithm is shown below.

Algorithm (figures a and b): pseudocode of the elective surgery backlog PDGRL algorithm.
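Since the pseudocode itself survives only as figures, the following schematic sketch reconstructs the episodic loop described above, reusing the SurgeryBacklogEnv and lcq_daily_action sketches from earlier. The decay schedule, episode horizon, and the certainty-equivalent policy update that stands in for the full estimate-and-solve MDP step are all simplifying assumptions, not the paper's exact procedure.

import random
import numpy as np

def random_feasible_action(state, l_max, rng):
    """Uniform random feasible action (a0, a1) with a0 + a1 <= l_max."""
    s0, s1 = state
    a0 = rng.randint(0, min(s0, l_max))
    a1 = rng.randint(0, min(s1, l_max - a0))
    return (a0, a1)

def pdgrl(make_env, u, l_max, episodes=1000, horizon=200, seed=0):
    """Schematic PDGRL loop: random exploration inside the bounded auxiliary
    region, exploitation of an estimated policy inside S^in, and the stabilizing
    LCQ policy (lcq_daily_action, sketched earlier) everywhere else."""
    rng = random.Random(seed)
    counts, avg_cost, policy = {}, {}, {}
    for k in range(1, episodes + 1):
        eps = max(0.05, 1.0 / np.sqrt(k))       # decaying exploration probability
        explore = rng.random() < eps
        env = make_env()                         # fresh episode of the queueing system
        state = env.state()
        for _ in range(horizon):
            s0, s1 = state
            if s0 + s1 > u - l_max:              # outside S^in: stabilize
                action = lcq_daily_action(s0, s1, l_max)
            elif explore or state not in policy:
                action = random_feasible_action(state, l_max, rng)
            else:                                # exploit the current estimated policy
                action = policy[state]
            next_state, cost = env.step(*action)
            key = (state, action)
            counts[key] = counts.get(key, 0) + 1
            prev = avg_cost.get(key, 0.0)
            avg_cost[key] = prev + (cost - prev) / counts[key]   # running mean cost
            state = next_state
        # certainty-equivalent stand-in for the paper's estimate-and-solve MDP step:
        # per visited state, keep the action with the lowest observed average cost
        for (s, a), c in avg_cost.items():
            if s not in policy or c < avg_cost[(s, policy[s])]:
                policy[s] = a
    return policy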

5 Computational experiments

5.1 Experiments on simulated datasets

The model and algorithm proposed in the previous sections are first tested on a series of simulated datasets to verify their effectiveness and study the robustness of their performance. The main parameters of the problem are as follows:

1. Total number of backlogged elective surgeries by the time the hospital starts to recover its surgical services, \(L_0\): The larger the hospital, or the longer the peak weeks in one pandemic wave, the larger \(L_0\) is. A larger \(L_0\) means that clearing the backlog will take longer and that the probability of patients missing their due days will be higher.

2. Number of elective surgeries that can be performed every day from the time the hospital starts to recover its surgical services until it fully recovers its capacity, \(L_{\max}\): The larger \(L_{\max}\) is, the faster the backlog will be cleared.

3. Proportions of patients with different critical levels: In this research, based on clinical severity, the surgeries are classified into three types: due in 30 days, due in 60 days, and due in 90 days. If a large proportion of the surgeries delayed during the pandemic have high critical levels, the system may need to schedule more surgeries from the backlog queue to reduce the risk of disease progression and poor health outcomes. Even so, the chance of patients missing their due days rises as the proportion of severely ill patients increases.

4. Earliest original scheduled day of patients awaiting in the backlog queue: If the hospital stops or cuts its elective surgical services for a long time, many patients waiting in the backlog queue will have passed their due days. Even some patients with low critical levels will be close to their due days at the beginning of the hospital's postpandemic recovery. Thus, many patients will need to be scheduled immediately after the recovery plan starts, which may exceed the service capacity and cause more patients to miss their due days.

5. Mean number of newly arrived patients per day after the hospital starts to recover its surgical services: The proposed algorithm is designed to clear the backlog as soon as possible while minimizing the number of surgeries scheduled after their due days, considering the critical levels of surgeries in both the backlog queue and the new arrivals queue. Therefore, the rate of new arrivals in Q1 affects the time needed to clear the backlog. If the rate of new arrivals is extremely high, more patients with high critical levels must be included in the schedule and compete with the backlogged patients for the limited service capacity. A sketch showing how these parameters might be encoded for simulation follows this list.
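The five parameters above can be collected into a single simulation configuration, as in the following sketch; the field names and example values are illustrative, not the paper's actual experimental settings.

from dataclasses import dataclass

@dataclass
class BacklogExperiment:
    """Illustrative container for the five simulation parameters listed above."""
    L0: int = 6000                            # (1) initial backlog size in Q0
    L_max: int = 390                          # (2) daily surgical capacity
    criticality_mix: tuple = (0.2, 0.4, 0.4)  # (3) shares due in 30/60/90 days
    earliest_original_day: int = -60          # (4) earliest original scheduled day (t = 0 is recovery start)
    mean_daily_arrivals: float = 300.0        # (5) Poisson mean of new Q1 patients per day

    def validate(self):
        assert abs(sum(self.criticality_mix) - 1.0) < 1e-9
        assert self.L0 > 0 and self.L_max > 0 and self.mean_daily_arrivals >= 0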

Table 1 summarizes the setup of simulation experiments for evaluating the performance of the proposed method.

Table 1 Simulation experiments setup

Each setup was run for 20 replications, and each replication was run for 1000 episodes to test convergence. Statistics on the number of days needed to clear the backlog, the number of patients who miss their due days, and the running time are summarized in Table 2.

Table 2 Comparison of simulation experiments

The experiments were conducted with \(U=50\), \(l=0.5\), and \(M=1000\). These simulation setups include some extreme cases. Take the first experiment as an example: compared with the number of surgeries that can be performed each day, the size of the backlog is very large, and the proportion of surgeries with high critical levels is also very large. Therefore, in the simulated Q0 queue, more surgeries come due each day than the daily maximum surgical service capacity can accommodate. For such cases, the optimal scheduling policy is to schedule all surgeries in both the Q0 and Q1 queues based on their due days, even though missed due days will occur every day until all surgeries in the Q0 queue have been cleared. Figure 1 shows the episode cost function of the first experiment to illustrate the convergence of the algorithm.

Fig. 1

Episode cost function of simulation experiment 1 in Table 1

Based on the simulation experiments, the proposed elective surgery backlog PDGRL algorithm is able to solve problems with various parameter settings and converges to the optimal solution within 80 to 150 episodes. For extreme cases in which the backlog is large relative to the daily capacity and the proportion of high-critical-level patients is large, the algorithm gives solutions close to the optimal schedule. For normal cases, the algorithm gives solutions with very few surgeries missing their due days. However, the running time is sensitive to the size of the problem: as the backlog size and daily service capacity increase, the computation slows significantly because solving the MDP over a larger state space and action space takes longer.

5.2 Numerical example

W Hospital, located in Western China, is one of the largest hospitals nationwide. During the peak weeks of the COVID-19 outbreak in China, elective surgery operations in W Hospital were delayed from January 27 to March 27, 2020, and a total of 16,377 elective surgeries were awaiting rescheduling by the end of March, when the hospital started to use its medical partnerships to recover its surgical services. Let \(t=0\) be the time when the hospital started to clear the surgical backlog. Thus, \(t=-1,-2,\dots\) denote one day, two days, and so on before the start time, and \(t=1,2,\dots\) denote one day, two days, and so on after the start time. The information pertaining to each surgery includes the arrival day, due day, and original scheduled day, as shown in Table 3. While the surgical backlog was being cleared, new patients needing elective surgeries arrived every day. Based on the historical data, the number of newly arrived patients per day follows a Poisson distribution with a mean of 300, of whom 19.3%, 39.1%, and 41.6% needed their surgeries within 30 days, 60 days, and 90 days, respectively.

Table 3 Information of surgical backlog (Q0)

The 16,377 elective surgeries form queue Q0, and the newly arrived surgeries form queue Q1. The algorithm proposed in the previous section is applied to reschedule the surgeries in Q0 and to schedule the newly arrived surgeries in Q1; the overall objective is to clear the Q0 backlog as soon as possible while minimizing the number of patients who miss their due days. Given the capacities of W Hospital and its branch health centers, the number of elective surgeries that can be performed each day averages 390.
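As a rough back-of-envelope check (not stated in the paper), dedicating the full daily capacity to Q0 gives a lower bound on the clearance time:

$$T_{\mathrm{clear}} \ge \frac{L_0}{L_{\max}} = \frac{16{,}377}{390} \approx 42 \text{ days}$$

Any feasible policy that also serves urgent new arrivals in Q1, as the proposed algorithm does, must take somewhat longer, which is consistent with the roughly 52-day mean clearance time reported below.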

The benchmark policies for comparison are first-come first-served (FCFS), earliest due date (EDD), and the actual policy implemented in W Hospital during its surgical service recovery after the peak weeks of the COVID-19 pandemic, which was ad hoc. The FCFS policy schedules surgeries in the order patients arrive. The EDD policy schedules surgeries in order of their due dates. The ad hoc policy used by W Hospital scheduled the backlogged surgeries first, following the original scheduling arrangement, and did not begin scheduling newly arrived patients until all backlogged surgeries had been cleared. Approximately 9.72% of the backlogged patients worsened precipitously, left the waiting queue, and underwent emergency operations.
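For illustration, both classical benchmarks reduce to a single sort key over the patient records; a sketch using the (arrival day, due day) tuples of the earlier environment sketch:

# patients as (arrival_day, due_day) tuples, as in the earlier environment sketch
def fcfs_order(patients):
    return sorted(patients, key=lambda p: p[0])   # earliest arrival first

def edd_order(patients):
    return sorted(patients, key=lambda p: p[1])   # earliest due day first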

First, the proposed method is implemented with priority placed on following the original scheduling arrangement. Second, the proposed method is implemented with priority placed on due days; that is, surgeries closer to their due days are scheduled first. The algorithm was run with \(U=50\), \(l=0.5\), \(M=1000\), and \(K=200\). Table 4 compares the real data, the ad hoc policy used by W Hospital, the FCFS policy, the EDD policy, and the optimal policies obtained by the elective surgery backlog PDGRL algorithm with priority placed on the original scheduling arrangement and with priority placed on due days.

Table 4 Comparison of different surgery backlog clearing strategies

The real data were collected from March 30, 2020, to May 25, 2020, when the backlog had been cleared, to show the real situation after the pandemic outbreak. According to the head nurse in charge of elective surgery scheduling, the main reasons for the difference between the real data and the ad hoc policy are as follows: (1) some doctors were not available at the beginning of the service recovery phase, which delayed the surgeries they were responsible for; and (2) some patients left the waiting list, possibly to receive faster treatment at hospitals near their places of residence, which left fewer than 16,377 patients in the Q0 queue.

From Table 4, it can be seen that the optimal policy obtained by the proposed elective surgery backlog PDGRL algorithm with priority placed on due days clears the backlogged surgeries in a shorter time while minimizing the number of patients missing their due days.

Unlike Q0, in which all waiting surgeries are known when the hospital starts to clear its backlog and no newly arrived surgeries enter the queue, the surgeries in Q1 arrive over time and are unknown at the beginning of the clearing process. Therefore, scheduling decisions are made dynamically every day based on the surgeries remaining unscheduled in Q0 and the number of surgeries waiting in Q1 on the previous day. To examine the performance of the proposed method, 100 Q1 cases are simulated, and the obtained optimal strategy is implemented for each case. Based on the comparison shown in Table 4, the elective surgery backlog PDGRL algorithm with priority placed on due days outperforms the algorithm with priority placed on the original scheduling arrangement. Figures 2 and 3 show the optimal scheduling strategy obtained by the algorithm with priority placed on due days for the randomly selected cases.

Fig. 2

Optimal policy: number of surgeries scheduled every day for Q0 and Q1 from the randomly selected cases

Fig. 3

Optimal policy: timeline of 100 randomly selected surgeries from Q0

Figure 2 plots the number of surgeries scheduled every day for Q0 and Q1. Figure 3 plots the timeline of the 100 randomly selected surgeries from Q0, where the arrival day of each surgery is marked by a gray number, the due day by an orange number, the original scheduled day by a green number, and the rescheduled day by a blue number. If a surgery misses its due day, its rescheduled day is marked in red. Summarizing the 100 simulated cases, the mean time W Hospital needs to clear its elective surgery backlog is 51.72 days (95% confidence interval: 51.01 to 53.47), and the average number of surgeries (in both Q0 and Q1) not scheduled before their due days is 0.02 (95% confidence interval: 0 to 0.048). Histograms of the performance measures over the 100 simulated cases are shown in Fig. 4.

Fig. 4

Histogram of (a) number of days to clear the backlog \(Q0\), and (b) number of surgeries missing due days for the optimal policy with 100 \(Q1\) simulated cases

5.3 Summary and discussion

Through the simulation experiments and the real-data numerical example, the proposed elective surgery backlog PDGRL algorithm has shown good potential for tackling surgery backlogs caused by a pandemic such as COVID-19. In the simulation study, extreme cases in which very many backlogged surgeries are close to their due days were tested, and the results show that the proposed algorithm converges quickly to the optimal solution. The performance of the algorithm is not sensitive to the problem parameters, except that the running time is strongly affected by the problem size.

Compared with the ad hoc procedure used in the hospital, the proposed elective surgery backlog PDGRL algorithm could improve postpandemic surgery backlog scheduling by reducing patients' average waiting time as well as the risk of missing the best treatment time. The ad hoc procedure schedules the backlog first, following the original scheduling arrangement, without considering newly arrived patients or patients' critical levels; this yields a short backlog clearance time but a large number of patients who miss their best treatment opportunity and increasing waiting times for all patients in the system. The proposed algorithm finds an optimal balance among these factors. However, the difference between the calculated policy and the real data indicates that more realistic factors, such as patient behavior, hospitals' postpandemic recovery plans, and doctors' preferences, need to be incorporated into the algorithm in the future to create an accurate surgery backlog management system.

The most serious limitation of the proposed elective surgery backlog PDGRL algorithm is that its computation speed is highly sensitive to the size of the backlog and the maximum number of surgeries that can be performed each day, since increases in these two parameters significantly enlarge the state space \(\mathcal{S}\) and action space \(\mathcal{A}\), which in turn makes the search space grow exponentially. For a problem with 10,000 backlogged surgeries, running the algorithm for 1000 episodes may take more than a day, compared with five hours for a problem with 6000 backlogged surgeries. This limits the applicability of the algorithm when fast decisions are required, and future work is needed to improve its computation speed.

6 Conclusion

A large number of elective surgeries are postponed due to disruptions caused by pandemics such as COVID-19, resulting in serious backlogs. Continuing to delay these surgeries could result in disease progression and poor health outcomes for patients, as well as financial losses for health care systems and nations. This paper presented a stochastic control-based method that helps large hospitals make operational recovery plans to clear their elective surgery backlogs and restore surgical activity safely. The proposed solution uses an MDP model and a reinforcement learning algorithm and is shown to be effective in managing elective surgeries during pandemics. It can be adapted to a hospital's decision support system using local data to assist with health care system recovery planning and to help hospitals prepare for future pandemic waves. For future work, the algorithm could be extended to allow dynamic changes in hospital capacities so that the stochastic scheduling optimization can manage elective surgeries in real time during public health emergencies; to include more factors, such as doctors' preferences and surgery due date windows, to make the system more accurate and practical; and to consider patients' choices in cases where some patients are not available on their scheduled days.