Introduction

In recent years, the logistics industry has experienced rapid growth with the emergence of e-commerce [1]. As a new method of sorting, the Automatic Guided Vehicle (AGV) sorting system has received considerable attention in the logistics industry due to its high flexibility, cost-effectiveness, and strong robustness [2]. In the AGV sorting system, AGVs transport express packages from loading points to goal points and then leave the sorting area. Hence, planning short paths for multiple successive sorting tasks (i.e., solving dynamic path planning problems) is crucial for enhancing the efficiency of the AGV sorting system. However, this problem presents considerable challenges. Firstly, the path planning problem is NP-hard, making it impossible to obtain an optimal solution within a limited time [3]. Secondly, tasks are dynamic and continuously added, leading to frequent failures of existing paths. Lastly, each task involves a series of target points that must be visited sequentially, rather than a single point [4], and traditional path planning methods are incapable of planning paths for a series of target points in a single process.

Traditional path planning methods are unable to continuously plan paths for newly added tasks; hence, the window strategy is applied to dynamic path planning problems. Specifically, the dynamic path planning problem is divided into multiple consecutive static (one-shot) path planning problems. These problems are subsequently resolved using the chosen one-shot planning method. In each replanning period, there are two stages: first, the solving stage, where the server plans complete paths for tasks appearing in that period; and then the moving stage, where AGVs move according to the planned paths. Finding the optimal path often requires lengthy computation, which is undesirable for an online, continuously operating system. Although current path planning algorithms are becoming faster, their computing times still grow to unsustainable levels as the problem size increases. To address this, some scholars have proposed using a rolling window strategy [4,5,6,7], which advances the solving stage so that it overlaps with the AGV moving stage of the previous period (processing while moving), thereby utilizing the moving time for improved path optimization. However, these approaches do not take into account the degree of match between the operation time of the physical system and the computing time of the algorithm. During this process, the mismatch between the moving time and the computing time can reduce system efficiency. Currently, there is no particularly effective method for adjusting the system to ensure a match between these two types of time.

Given the information presented earlier, the fundamental problem is to strategize path planning for AGVs in a manner that coordinates system processing times with AGV movement durations. The objective is to enhance path efficiency within the constraints of available computational capabilities, in turn elevating the efficiency of the entire sorting system. In this study, we propose an Optimal Time Reuse strategy-based Dynamic multi-AGV Path planning method (OTRDP) to overcome the limitations of the existing dynamic path planning methods. The central idea of our method is to match the computing time of the algorithm with the actual moving time of the physical system, thereby optimizing the path as much as possible while avoiding waiting due to overtime computation. The primary contributions of this research can be summarized as follows:

  • On a global level, we propose a replanning period setting algorithm. Its aim is to keep the number of AGVs in the physical system within a certain range, so that the required computing time roughly matches the solving capability of the server. The algorithm selects the most appropriate replanning period by combining a theoretical analysis of task completion rates with a model that fits the problem-solving capability of the server.

  • On the individual planning level, we propose an algorithm for setting the planned path length within a single period. The goal here is to match the computing time of each planning instance with the predetermined AGV moving time, optimizing the path to the greatest extent while avoiding stops and waits. The algorithm achieves this by controlling the planned path length and thereby indirectly controlling the computing time, guided by a computing time fitting model.

  • Finally, we propose a temporary target selection algorithm based on the characteristics of the grid topology map to enhance the universality of the method. The algorithm selects temporary target points in accordance with the planned path length within a single period. It eliminates the restrictions where some one-shot path planning methods cannot directly plan paths based on planned path length.

The remainder of this paper is organized as follows. “Related work” Section reviews the related work. The system model and problem formulation are described in “System model and problem formulation” Section. “OTRDP method” Section introduces the framework and the three subalgorithms we propose in detail. “Performance evaluation” Section provides the simulation results and analysis. Finally, “Conclusion” Section draws conclusions.

Related work

This section reviews static and dynamic path planning methods.

Static path planning methods

Static (one-shot) path planning methods are the basis of dynamic methods. As time progresses and development continues, a diverse array of path planning methods has become available:

Reduction-based methods [3, 8,9,10,11] transform the path planning problem into well-solved mathematical programming problems. For example, Yu et al. [3] and Surynek et al. [8] use the multi-flow network and Boolean satisfiability (SAT), respectively, to solve this problem. Furthermore, Bartak et al. [9] improve the efficiency of SAT to find the Sum of Costs optimal solutions to MAPF, Surynek et al. [10] introduce sparse decision diagrams to accelerate the solving speed of SAT, and Acha et al. [11] propose a new Boolean encoding for MAPF, which is more efficient and scalable. Although these methods have the capability to obtain optimal solutions, they require a significant amount of computation time.

Rule-based methods [12,13,14] focus on rapidly finding a feasible solution for the path planning problem. However, there are few optimization strategies for achieving path optimality, so the obtained solutions are often far from optimal paths.

Search-based methods (A* [15, 16], Conflict-Based Search (CBS) [17,18,19,20,21,22,23,24,25,26], M* [27], Operator Decomposition + Independence Detection (OD+ID) [15, 28]) and swarm intelligence methods [29, 30] trade off the speed of solving and path optimality. Relatively good solutions can be found in a short time by search-based methods. Among search-based methods, CBS is the most widely used due to its excellent dual-layer structure, which facilitates the introduction of various strategies for specific optimizations in addressing different problems. Specifically, the high level searches for a solution in the space of constraints on individual agents’ paths, and the low level searches for paths for individual agents that satisfy these constraints [17]. Ma et al. [21] introduce a priority mechanism in the CBS method, resulting in fewer branches and a faster solving process. Similarly, Li et al. [22] also propose a CBS-based method employing a large neighborhood search approach to solve MAPF efficiently. In addition, Zhang et al. [23] generalize mutex propagation and symmetry-breaking techniques to MAPF with large agents. Ren et al. [25] propose the Conflict-Based Steiner Search (CBSS), a novel framework that combines CBS with multiple traveling salesman algorithms for solving Multi-Agent Combinatorial Path Finding (MCPF). Additionally, Zhang et al. develop a priority-based hierarchical framework for k-robust multi-agent pathfinding, providing a structured approach for managing paths of agents with varying priorities [26].

Recently, learning-based methods [31,32,33] have emerged as a novel approach in path planning, incorporating machine learning principles, particularly reinforcement learning and deep learning. Damani et al. explored PRIMAL_2, a pathfinding approach that combines reinforcement and imitation learning for multi-agent systems, demonstrating capabilities for continual learning in complex settings [31]. Sinkar et al. investigated the use of distributed deep learning models for multi-agent pathfinding, showing scalability and adaptability for management of multiple agents [32]. Chen et al. presented a method that combines imitation-reinforcement learning with transformers for multi-agent pathfinding, signifying advancements in learning efficiency and path optimization [33].

Table 1 Summary of static path planning methods

Dynamic path planning methods

Fig. 1 Dynamic path planning model of the first type of methods

Dynamic path planning methods are frameworks that generally choose one or more one-shot methods as the foundational algorithm. Existing dynamic path planning methods can be divided into three distinct categories. In addition, some path planning methods plan complete paths from starting points to goal points but resolve conflicts only in the first part of the path to save computing time. In fact, only this conflict-free path within a single period is executable; therefore, for a more precise expression, we refer to the planned path length within a single period as the conflict-free path length.

As shown in Fig. 1, the first type of dynamic planning method [2, 36, 37] breaks down the dynamic problem into a series of successive static problems. Each static problem is addressed within a replanning period that includes a solving stage and an AGV moving stage. The blue arrow indicates the generation of a new task. During the solving stage, the server merges the newly generated tasks and the tasks currently being executed to form a new set and finds paths for this new task set. During the AGV moving stage, all AGVs move according to the paths from the server. This method, however, has certain limitations. Firstly, the AGVs in the sorting area must halt and await the solving stage when a new task is added, as there are no unified paths for all AGVs. Secondly, this method necessitates complete paths for each replanning period, even if these paths will not be fully utilized later, resulting in a squandering of computational resources. Finally, the problem of needing to visit a series of target points remains unresolved. A task with a series of target points must undergo separate replanning at each target point, which inevitably leads to extended waiting times.

The second type of dynamic planning method is designed to avoid the computational inefficiencies and wastage resulting from the repeated replanning of all paths. Wan et al. [38] adopt incremental planning to reduce the calculation by reusing existing paths to formulate new ones. Specifically, CBS was utilized as the foundational method to construct the search tree in their study. When a new task emerges, the path for the new task is added into the search tree. Then, conflicting paths are adjusted to generate a new search tree without conflicts. The conflict-free paths remain unaltered. Similarly, Svancara et al. [39] utilize the ID method to group conflicting tasks together for replanning. These methods avert the waste of computational resources caused by replanning for all AGVs in each replanning period. However, scalability remains inadequate when there are excessive AGVs in the path planning area, as the multitude of conflicts arising from dense AGVs necessitates more adjustments to achieve a feasible solution.

Fig. 2 Dynamic path planning model of the third type of methods

In the third type of method [4,5,6,7], the solving stage is shifted to the previous replanning period, as shown in Fig. 2. This adjustment can significantly reduce waiting times during the solving stage. For example, Li et al. [4] make further enhancements and propose the Rolling-Horizon Collision Resolution (RHCR) method, which resolves conflicts among the paths of AGVs only within a bounded time horizon, while disregarding conflicts beyond it. In addition, Wang et al. [5,6,7] employ a similar mechanism, Receding Horizon Control (RHC) theory, to address the online path planning problem in the towed carrier aircraft system. However, these methods only utilize preset window lengths and do not adjust the replanning period or conflict-free path length according to the actual situation. Unsuitable values may lead to diminished path optimality, increased AGV conflicts, or a delay in the task completion time. Furthermore, these factors could lead to a decrease in the task completion rate.

Table 2 Comparison of different dynamic path planning strategies

The characteristics of existing dynamic path planning strategies are summarized in Table 2. In summary, existing methods employ preset or fixed replanning periods and conflict-free path lengths. These methods are unable to make suitable adjustments to accommodate the continuously evolving computational demands of dynamic path planning problems. Unlike the existing methods, this study considers the impact of the above two parameters on the task completion rate and the implicit relationship between them, and dynamic parameters are adopted to solve this problem.

System model and problem formulation

System model

Fig. 3 Layout of the AGV sorting system

The layout of the AGV sorting system is shown in Fig. 3. The sorting area, situated centrally, comprises the sorting field and the loading field. The waiting area flanks both sides of the sorting area. Quick Response (QR) codes are distributed on the floor for AGV localization. The sorters, responsible for loading express packages onto the AGVs, are located in the loading field, while the sorting panes, which collect express packages destined for the same city, are situated in the sorting field. In the AGV sorting system, the unloaded AGVs initially move from the waiting area to the loading area to receive the packages from the sorters. Subsequently, the loaded AGVs transport the packages to the corresponding sorting panes following the paths planned by the system. Finally, the empty AGVs leave the sorting area and proceed to the waiting area to wait for the next sorting period.

Before formulating the dynamic path planning problem in the AGV sorting system, we establish the following assumptions:

  1. The AGV sorting system possesses an ample number of AGVs, ensuring that at least one AGV is waiting outside each loading port.

  2. The moving speeds of all AGVs are the same.

  3. A new task randomly appears at each entrance, and the task generation times follow a Poisson distribution.

  4. AGVs are restricted to movement along the horizontal x-axis or the vertical y-axis.

Problem formulation

The AGV sorting area is modeled as an undirected graph \(\mathcal {G}=(\mathcal {V},\mathcal {E})\), where \(\mathcal {V}=\{v_1,v_2,\ldots ,v_{N^{\mathcal {V}}}\}\) represents the point set in the sorting area (i.e., the locations of the QR codes), and \(\mathcal {E}=\{(v_p,v_q )|v_p,v_q\in \mathcal {V}\}\) denotes the set of edges connecting these points. Here, \(N^{\mathcal {V}} = |\mathcal {V}|\) is the total number of points in the sorting area. \(\mathcal {G}\) includes only the AGV sorting area, as path planning is processed in this area. Consequently, the AGV sorting area is the path planning area. In addition, \(\mathcal {V}^I=\{v^I_1, v^I_2,\ldots , v^I_{N^I}\}\) and \(\mathcal {V}^O=\{v^O_1, v^O_2,\ldots , v^O_{N^O}\}\) denote the entrance set and exit set, respectively, where \(N^I=|\mathcal {V}^I|\), \(N^O=|\mathcal {V}^O|\) and \(\mathcal {V}^I, \mathcal {V}^O\subset \mathcal {V}\). Let \(\mathcal {V}^B=\{v^B_1, v^B_2,\ldots , v^B_{N^B}\}\) denote the set of barrier points where AGVs cannot arrive, where \(N^B=|\mathcal {V}^B|\).

Let \(\mathcal {J}=\{J_1, J_2,\ldots , J_{N^{\mathcal {J}}}\}\) represent the total task set received from the server, where \(N^{\mathcal {J}}=|\mathcal {J}|\). Each task, denoted by a 4-tuple \(J_i=(T_i^A,v_i^S,v_i^G,v_i^E )\), is generated by the server, where \(T_i^A\) is the task generation time, \(v_i^S\) is an entrance selected as the start point, \(v_i^G\) is the chosen goal point adjacent to a sorting pane, and \(v_i^E\) is an exit selected as the endpoint.
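
For illustration, such a task tuple could be represented as follows (a minimal Python sketch; the field names are ours and not part of the original system):

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[int, int]          # a QR-code location (x, y) in the sorting area

@dataclass
class Task:
    """The 4-tuple J_i = (T_i^A, v_i^S, v_i^G, v_i^E) generated by the server."""
    t_gen: float                 # task generation time T_i^A
    start: Point                 # entrance v_i^S
    goal: Point                  # goal point v_i^G adjacent to a sorting pane
    exit: Point                  # exit v_i^E
```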

Let \(\mathcal {S} = (\mathcal {L},\mathcal {T}^S )\) represent a possible solution, where \(\mathcal {L} = \{\mathcal {L}_1,\mathcal {L}_2,\ldots ,\mathcal {L}_{N^{\mathcal {J}}}\}\) is the path set for all tasks and \(\mathcal {T}^S = \{T_1^S,T_2^S,\ldots ,T_{N^{\mathcal {J}}}^S\}\) is the set of start times of the paths. Each path \(\mathcal {L}_i=\{v_1^i,v_2^i,\ldots ,v_{N_i^\mathcal {L}}^i \}\), where \(N_i^\mathcal {L}=|\mathcal {L}_i|\), consists of a series of adjacent points. In other words, the points included in \(\mathcal {L}_i\) must meet the following condition:

$$\begin{aligned} (v_k^i,v_{k+1}^i )\in \mathcal {E}, \forall i\le N^{\mathcal {J}}, k < N_i^\mathcal {L}. \end{aligned}$$
(1)

In addition, the start and end points of the paths are entrances and exits, and the paths are required to traverse through the designated goal points:

$$\begin{aligned} v_1^i = v_i^S, \quad v_{N_i^\mathcal {L}}^i = v_i^E, \quad \forall i\le N^{\mathcal {J}},\end{aligned}$$
(2a)
$$\begin{aligned} v_k^i = v_i^G, \quad \exists k \le N_i^\mathcal {L}. \end{aligned}$$
(2b)

The start time of a path must exceed the task generation time, as the task cannot be executed prior to its creation.

$$\begin{aligned} T_i^S > T_i^A, \forall i \le N^{\mathcal {J}}. \end{aligned}$$
(3)

Let t represent the system time, that is, the time elapsed since the initiation of the system. Let \(\Delta T\) represent the time required for the AGV to traverse one unit of distance.

Two types of conflicts can result in path failure: point conflicts and edge conflicts. A point conflict arises when two AGVs move to the same point simultaneously.

$$\begin{aligned} v_{\lceil \frac{t-T_i^S}{\Delta T} +1 \rceil }^i=v_{\lceil \frac{t-T_j^S}{\Delta T} +1 \rceil }^j, \quad \exists t > 0. \end{aligned}$$
(4)

Let \(C_{i,j}^V\) indicate whether there is a point conflict between the paths of the i-th and j-th tasks.

$$\begin{aligned} C_{i,j}^V = \left\{ \begin{array}{lcl} 1, &{} \quad &{} \text {if }(4), \\ 0, &{} \quad &{} \text {otherwise}. \end{array} \right. \end{aligned}$$
(5)

Similarly, an edge conflict arises when two AGVs move from two adjacent points to each other simultaneously.

$$\begin{aligned} v_{\lceil \frac{t-T_i^S}{\Delta T} +1\rceil }^i = v_{\lceil \frac{t-T_j^S}{\Delta T} +2\rceil }^j \ \bigwedge \ v_{\lceil \frac{t-T_i^S}{\Delta T} +2\rceil }^i = v_{\lceil \frac{t-T_j^S}{\Delta T} +1\rceil }^j, \quad \exists t>0. \end{aligned}$$
(6)

Let \(C_{i,j}^E\) indicate whether there is an edge conflict between the paths of the i-th and j-th tasks.

$$\begin{aligned} C_{i,j}^E = \left\{ \begin{array}{lcl} 1, &{} \quad &{} \text {if }(6), \\ 0, &{} \quad &{} \text {otherwise}. \end{array} \right. \end{aligned}$$
(7)

Lastly, let \(C_{i,j}\) denote whether there is a conflict between the paths of the i-th and j-th tasks.

$$\begin{aligned} C_{i,j}=C_{i,j}^V \parallel C_{i,j}^E. \end{aligned}$$
(8)

There should be no conflicts between any feasible paths.

$$\begin{aligned} \sum _{i=1}^{N^{\mathcal {J}}} \sum _{j=i+1}^{N^{\mathcal {J}}} C_{i,j} = 0. \end{aligned}$$
(9)

\(T_{mak}\) symbolizes the makespan, defined as the duration from the initiation of the system to the completion of the final task. Given a constant number of tasks, \(T_{mak}\) is inversely proportional to the average task completion rate, which signifies the number of tasks accomplished per unit of time. A shorter makespan implies quicker task completion, that is, a higher system efficiency.

$$\begin{aligned} T_{mak}=\max _{i \le N^{\mathcal {J}}} \big (T_i^S+(N_i^\mathcal {L}-1)*\Delta T\big ). \end{aligned}$$
(10)

The dynamic path planning problem in the AGV sorting system: For a specified total task set \(\mathcal {J}\) within the topology graph \(\mathcal {G}\), the AGV sorting system needs to plan a feasible solution \(\mathcal {S}\) that enables the system to accomplish all tasks without conflicts while minimizing the makespan.

$$\begin{aligned}&\min \quad T_{mak}, \end{aligned}$$
(11a)
$$\begin{aligned}&s.t. \quad (1),(2), (3),(9). \end{aligned}$$
(11b)
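
For illustration, the sketch below (Python, reusing the Task/Point representation above) shows how a candidate solution could be checked against the conflict constraints (4)–(9) and how the objective (10) is evaluated; the discretization into steps of \(\Delta T\) and the convention that an AGV waits at its last point are our assumptions, not part of the original formulation.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

Point = Tuple[int, int]

@dataclass
class Path:
    """A planned path L_i together with its start time T_i^S."""
    t_start: float               # start time T_i^S
    points: List[Point]          # consecutive, adjacent grid points

def position_at(path: Path, t: float, dt: float) -> Point:
    """Grid point occupied at system time t; the AGV waits at its last point after finishing."""
    if t <= path.t_start:
        return path.points[0]
    k = min(math.ceil((t - path.t_start) / dt), len(path.points) - 1)
    return path.points[k]

def pair_conflicts(p1: Path, p2: Path, dt: float) -> bool:
    """Point conflict (4) or edge conflict (6), checked at every discrete time step."""
    t_end = max(p1.t_start + (len(p1.points) - 1) * dt,
                p2.t_start + (len(p2.points) - 1) * dt)
    for s in range(int(round(t_end / dt))):
        t, t_next = s * dt, (s + 1) * dt
        if position_at(p1, t_next, dt) == position_at(p2, t_next, dt):
            return True                                   # point conflict (4)
        if (position_at(p1, t_next, dt) == position_at(p2, t, dt)
                and position_at(p1, t, dt) == position_at(p2, t_next, dt)):
            return True                                   # edge conflict (6)
    return False

def feasible(paths: List[Path], dt: float) -> bool:
    """Constraint (9): no pair of paths may conflict."""
    return not any(pair_conflicts(a, b, dt) for a, b in combinations(paths, 2))

def makespan(paths: List[Path], dt: float) -> float:
    """Objective (10): completion time of the last task."""
    return max(p.t_start + (len(p.points) - 1) * dt for p in paths)
```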

OTRDP method

In this section, we first introduce the overall framework of the OTRDP method. Next, we describe the two SVM-based fitting models and three subalgorithms of the framework in detail.

Framework of the OTRDP method

Fig. 4 Dynamic path planning model of the OTRDP method

Fig. 5 Framework of the OTRDP method

The core idea of the OTRDP method is to achieve a better match between the computing time of the algorithm and the actual operation time of the physical system, thereby ensuring that the sorting system operates efficiently. This is primarily accomplished through two strategies: regulating the number of AGVs within the sorting area and regulating the degree of path optimization.

Initially, we set the replanning period to regulate the quantity of AGVs. Specifically, we conduct a mathematical analysis of the task completion rate and select the replanning period that maximizes the task completion rate. Through this approach, the server selects suitable replanning periods corresponding to different task generation rates.

Secondly, we control the length of the conflict-free path to enhance the optimality of the planned paths. For a group of tasks, superior path optimality results in a shorter overall path for a task, enabling the AGV to complete its task as soon as possible and thereby improving the task completion rate. Consequently, we aim to optimize the planned path without augmenting the waiting time. Let \(T^C_i\) and \(T^M_i\) denote the durations of the solving stage and the moving stage of the i-th replanning period, respectively. We set the length of the conflict-free path (which needs to be planned in \(T^C_i\)) based on the replanning period (corresponding to \(T^M_i\)). Specifically, we increase the length of the conflict-free path as much as possible while ensuring that \(T^C_i \le T^M_{i-1}\), as shown in Fig. 4. Ideally, \(T^C_i = T^M_{i-1}\).

Typically, the path optimality and computation time can only be ascertained after path planning. However, these two parameters are needed for the above two measures before planning the path. Hence, we derive them by parameter fitting.

Additionally, the temporary target selection algorithm converts the conflict-free path length, denoted as l, into viable temporary targets for various fundamental path planning algorithms (one-shot methods). This constraint emerges as certain static path planning methods cannot directly utilize conflict-free path length for partial path planning.

The OTRDP method is shown in Fig. 5. For a given task set \(\mathcal {J}\), the corresponding replanning period T is initially determined based on its task generation rate, employing the task completion rate analysis-based replanning period setting algorithm. Subsequently, the period task set \(\mathcal {J}_i\) is updated in line with the paths planned during the previous replanning period. We then establish the conflict-free path length l for the current replanning period utilizing the fitting model-based conflict-free path length setting algorithm, and convert l into a temporary task set \(\mathcal {J}_i^\prime \) for various fundamental algorithms via the temporary target selection algorithm. The one-shot path planning process is applied to \(\mathcal {J}_i^\prime \). Ultimately, the completion of the task set is evaluated. If uncompleted, the period task set is updated once again (\(\mathcal {J}_{i+1}\)); if completed, we derive the solution set \(\mathcal {S}\). The OTRDP method operates as a dynamic path planning framework, within which a variety of one-shot path planning methods and parameter estimation methods can be implemented to plan paths that fulfill varying performance requirements.
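
A skeleton of this loop, in illustrative Python, is shown below. Every collaborator is injected as a callable rather than implemented here, since the framework admits different fundamental planners and fitting models; none of these function names are an actual API of the paper.

```python
def otrdp_loop(tasks, graph, rate_in, dt, period_setter, length_setter,
               target_selector, one_shot_planner, task_updater):
    """Skeleton of the OTRDP framework in Fig. 5 (all collaborators are placeholders)."""
    T = period_setter(graph, rate_in, dt)                     # Algorithm 1: replanning period
    period_tasks = task_updater(tasks, [], T, dt)             # initial period task set
    solution = []
    while period_tasks:                                       # until all tasks are completed
        l = length_setter(period_tasks, graph, T, dt)         # Algorithm 2: conflict-free length
        temp_tasks = target_selector(period_tasks, l, graph)  # Algorithm 3: temporary targets
        paths = one_shot_planner(temp_tasks, graph, l)        # e.g. a PBS/CBS one-shot solve
        solution.extend(paths)
        period_tasks = task_updater(tasks, paths, T, dt)      # form the next period task set
    return solution
```

Passing the stages in as callables mirrors the claim that OTRDP is a framework into which various one-shot planners and parameter estimation methods can be plugged.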

SVM-based fitting models

Our method hinges on the premise that the computing time and path optimality of a one-shot path planning method are strongly correlated with the map size (i.e., the row number \(\alpha \) and column number \(\beta \) of the sorting area), the number of AGVs N (i.e., the task number) in the sorting area, and the maximal distance D of the tasks in the task group. This correlation is attributable to the following factors:

  • A larger map means more alternative paths (i.e., an expanded solution space), necessitating more time to locate a solution for a singular path. However, a larger map leads to a reduced density of AGVs, thereby decreasing conflicts. Consequently, the path optimality improves.

  • An increased number of AGVs contributes to heightened conflicts, complicating the search for a set of conflict-free paths. As a result, the computing time increases, and the path optimality deteriorates.

  • The maximal Manhattan distance represents the extent of the search necessary for problem-solving. A greater maximal Manhattan distance not only leads to an increase in the computing time for a single path but also incites more conflicts. In addition, longer one-shot paths enhance path optimality for the windowed strategy.

However, the influences of these factors on the computing time and path optimality are not completely independent. Hence, the precise relationship remains indeterminate, and it poses a challenge to express this relationship with a specific mathematical function.

In this research, we select the SVM method to fit the relationship between the actual computing time \(T^C\) and the above four parameters. The SVM approach is particularly apt due to its proficiency in handling non-linear relationships and its effectiveness in scenarios characterized by small sample sizes. This methodological choice facilitates the swift derivation of new fitting models for AGV sorting systems through a minimal number of experimental trials.

SVM-based computing time fitting model

Let \(f_T\) represent the relationship between the computing time and the above four parameters. We then obtain the computing time fitting model \(f_T\):

$$\begin{aligned} T^P = f_T(\alpha ,\beta ,N,D). \end{aligned}$$
(12)

Here, \(T^P\) signifies the estimated value of \(T^C\). The fitting model \(f_T\) is shown in “SVM-based fitting models” Section.

SVM-based path optimality fitting model

We define the path optimality H as the ratio of the path length obtained when considering all AGVs to the path length obtained when disregarding other AGVs [3]. Furthermore, the shortest path that disregards other AGVs corresponds to the Manhattan distance between two points on the grid topology map. Consequently, we express the path optimality as the ratio of the planned path length to the total Manhattan distance:

$$\begin{aligned} H= \frac{\sum _{i=1}^{N^{\mathcal {J}}} N_i^\mathcal {L}}{\sum _{i=1}^{N^{\mathcal {J}}} (d_m(v_{i}^S,v_{i}^G) + d_m(v_{i}^G,v_{i}^E))}. \end{aligned}$$
(13)

Here, the larger H is, the longer the path and the worse the path optimality.

Let \(f_H\) represent the relationship between the path optimality and the above four parameters. By employing a fitting process analogous to that used for the computing time fitting model, we derive \(f_H\), as detailed in “SVM-based fitting models” Section.

$$\begin{aligned} H^P = f_H(\alpha ,\beta ,N,D). \end{aligned}$$
(14)

It should be noted that the Manhattan distance is a lower bound on the path lengths obtained from all path planning methods, so according to (13) \(H^P\) should be greater than or equal to 1. Consequently, we select the larger of \(f_H(\alpha ,\beta ,N,D)\) and 1 as the result of the fit.

$$\begin{aligned} H^P = \max (f_H(\alpha ,\beta ,N,D),1). \end{aligned}$$
(15)
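
As an example of how the two fitting models (12), (14) and the clipping step (15) could be obtained in practice, the sketch below uses scikit-learn's SVR on samples of \((\alpha ,\beta ,N,D)\) collected from trial runs of the one-shot planner. The choice of SVR, its kernel, and its hyperparameters are our assumptions; the paper only states that SVM-based regression is used.

```python
import numpy as np
from sklearn.svm import SVR

def fit_models(samples):
    """samples: list of (alpha, beta, N, D, computing_time, optimality) from trial runs."""
    X = np.array([s[:4] for s in samples], dtype=float)
    t = np.array([s[4] for s in samples])             # measured computing times T^C
    h = np.array([s[5] for s in samples])             # measured optimality values H
    f_T = SVR(kernel="rbf", C=10.0).fit(X, t)         # computing-time model (12)
    f_H = SVR(kernel="rbf", C=10.0).fit(X, h)         # path-optimality model (14)
    return f_T, f_H

def predict_T(f_T, alpha, beta, N, D):
    return float(f_T.predict([[alpha, beta, N, D]])[0])             # T^P in (12)

def predict_H(f_H, alpha, beta, N, D):
    return max(float(f_H.predict([[alpha, beta, N, D]])[0]), 1.0)   # clipping step (15)
```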

Replanning period setting based on task completion rate analysis

Initially, we set the replanning period T to make the entry rate of the AGVs more rational. Within a certain AGV density range, an increase in concurrent tasks elevates the average task completion rate; however, it also augments congestion within the AGV sorting system. Hence, the purpose of a rational entry rate is to control the number of AGVs in the sorting area, thereby maintaining a high task completion rate without inducing congestion. This methodology achieves a balance between the amount of required calculation and the available computing time in terms of the number of AGVs.

In the dynamic path planning problem, AGVs bearing new tasks enter the system, while empty AGVs depart. Consequently, the number of AGVs within the AGV sorting area fluctuates dynamically. However, the task completion rate remains relatively constant for a given task generation rate \(R^I\) and replanning period T. We denote the actual duration of each replanning period as \(T_{real}\). The AGVs must wait when the computing time \(T^C\) exceeds the preset replanning period T, which prolongs the replanning period, so

$$\begin{aligned} T_{real} = \max (T,T^C). \end{aligned}$$
(16)

Specifically, a task emerges within the system on average every \(1/R^I\) seconds, with an equivalent probability of occurrence at each entrance. The number of entry points, denoted as \(N^I\), is finite and each can accommodate only one AGV per replanning period. Hence, not all tasks can enter the path planning area (the sorting area) in the next replanning period when the task generation rate \(R^I\) is excessively rapid and the number of tasks generated within \(T_{real}\) exceeds \(N^I\). Redundant tasks will be queued at each entrance, awaiting subsequent scheduling. Let \(\eta \) represent the actual number of AGVs entering the path planning area during each replanning period. Then, we have:

$$\begin{aligned} \eta = \min \big (R^I * T_{real}, N^I\big ). \end{aligned}$$
(17)

The reason for setting \(\eta = R^I * T_{real}\) is that a dynamic balance exists between \(\eta \) and \(R^I\) when \(R^I * T_{real} \le N^I\). If \(\eta < R^I * T_{real}\) during a certain replanning period, some newly generated tasks will appear at the same entrance. These tasks accumulate outside the entrance, which will lead to an increase in the number of AGVs entering during the next replanning period. In contrast, \(\eta > R^I * T_{real}\) means that tasks have amassed at some of the entry points. The entry of additional AGVs into the path planning area assists in reducing this accumulated backlog. In general, with the increase in the task generation rate, \(\eta \) increases linearly but stabilizes around \(R^I*T_{real}\). When \(R^I * T_{real} \ge N^I\) is reached, \(\eta \) equals \(N^I\) and remains constant thereafter.

Let \(d_{mean}\) denote the average task distance, defined as the average Manhattan distance for tasks within a map of a given size.

$$\begin{aligned} d_{mean} = \lim _{N^{\mathcal {J}} \rightarrow \infty }\frac{\sum _{i=1}^{N^{\mathcal {J}}} (d_m(v_{i}^S,v_{i}^G) + d_m(v_{i}^G,v_{i}^E))}{N^{\mathcal {J}}}. \end{aligned}$$
(18)

In practice, the path length is affected by various factors. The degree of influence of these factors is represented by the optimality value H. We replace the actual optimality value H with the estimated value \(H^P\) to obtain the average path length \(d_{mean} * H^P\) that is closer to the actual path length.

Furthermore, with the introduction of new AGVs in each replanning period, the paths of the AGVs must be replanned every \(T/{\Delta T}\) steps. Based on the above average path length, we can determine the number of replanning calculations required for a task as:

$$\begin{aligned} N^C = \frac{d_{mean} * H^P}{T/{\Delta T}}. \end{aligned}$$
(19)

Let \(T_{test}\) denote the duration of the entire experimental process, which encompasses a certain number of periods as follows:

$$\begin{aligned} N^P = \frac{T_{test}}{T_{real}}. \end{aligned}$$
(20)

A total of \(N^P * \eta \) tasks are executed during the entire experiment. However, \(N^C * \eta \) tasks, generated during the last \(N^C\) replanning periods, remain incomplete at the conclusion of the experiment. Hence, a total of \((N^P - N^C) * \eta \) tasks are successfully completed. Let \(R^O\) denote the task completion rate, equivalent to the rate at which AGVs depart from the sorting area. According to the above analysis, the following equation holds true:

$$\begin{aligned} R^O = \lim _{T_{test} \rightarrow \infty } \frac{(N^P - N^C) * \eta }{T_{test}}. \end{aligned}$$
(21)

We substitute \(N^P\) and \(N^C\) into (21); since \(N^C\) and \(\eta \) remain finite, the term \(N^C * \eta / T_{test}\) vanishes as \(T_{test} \rightarrow \infty \), and we obtain:

$$\begin{aligned} R^O = \frac{\eta }{T_{real}} = \frac{\min (R^I * \max (T,T^C), N^I)}{\max (T,T^C)}. \end{aligned}$$
(22)

This equation suggests that \(R^O\) mirrors the rate at which AGVs enter the sorting area, thereby indicating a state of dynamic equilibrium. In (22), both \(R^I\) and \(N^I\) are known quantities. Therefore, our task is to discern the relationship between \(T^C\) and T.

In this context, \(T^C\) is the actual computing time, which can only be obtained after the execution of the algorithm. However, it is necessary to determine the most suitable T before planning the path. Hence, we substitute \(T^C\) with \(T^P\):

$$\begin{aligned} R^O = \frac{\min (R^I * \max (T,T^P), N^I)}{\max (T,T^P)}. \end{aligned}$$
(23)

Here,

$$\begin{aligned} T^P = f_T(\alpha ,\beta ,N^A,T/\Delta T). \end{aligned}$$
(24)

\(N^A\) is the number of AGVs in the sorting area, and it is equal to the product of the number of AGVs entering during each replanning period and the number of planning periods for a task:

$$\begin{aligned} N^A = N^C * \eta = \frac{d_{mean}*H^P}{T/\Delta T} * \eta . \end{aligned}$$
(25)

Hence, we need to ascertain the relationship between \(N^A\) and T. However, there are three variables in (25): \(N^A\), T, and \(H^P\). Therefore, the optimality model can act as a constraint:

$$\begin{aligned} H^P = \max (f_H(\alpha ,\beta ,N^A,T/\Delta T),1). \end{aligned}$$
(26)

Let \(f_{H-N}\) denote the relationship among \(N^A\), \(H^P\) and T. Next, combining (25) and (26), we can derive \(f_{H-N}\), as shown in “SVM-based fitting models” Section:

$$\begin{aligned} [H^P,N^A] = f_{H-N}(\alpha ,\beta ,T/\Delta T). \end{aligned}$$
(27)

With (27), we determine the relationship between \(N^A\) and T. Moreover, we compute \(T^P\) and \(R^O\) according to (23). We then select the smallest T that maximizes \(R^O\) as the duration of the replanning period. The minimum value is chosen to minimize the delay in task completion while still maximizing the task completion rate. In addition, it should be highlighted that T must be an integer multiple of \(\Delta T\), since the path length of each replanning period must be an integer.

As shown in Algorithm 1, we initially obtain the corresponding optimality value \(H^P\) and the number \(N^A\) of AGVs in the sorting area for each candidate T according to \(f_{H-N}\) (line 4). We let T range from 1 to \(\lceil d_{mean} \rceil \) because \(\lceil d_{mean} \rceil \) is a relatively long distance; when this distance is employed as the path length, the path planning algorithm is practically compelled to plan a full path rather than a fragment. Subsequently, we obtain the estimated computing time \(T^P\) with the SVM-based computing time fitting model and further derive the estimated value of \(T_{real}\), denoted as \(T_{real}^P\) (lines 5–6). The predicted number \(\eta ^P\) of AGVs entering the path planning area in each replanning period is then obtained (line 7). At the end of the "for" loop, we compute \(R^O\) by combining \(T_{real}^P\) and \(\eta ^P\) (line 8). Finally, we traverse the values to find the smallest T that maximizes \(R^O\) (lines 10–11).

Algorithm 1 Task Completion Rate Analysis-Based Replanning Period Setting Algorithm
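
To make the procedure concrete, the following is a condensed, illustrative Python sketch of Algorithm 1 (not the authors' implementation); f_T and f_HN stand for the fitted models in (24) and (27), and all parameter names are ours.

```python
import math

def set_replanning_period(alpha, beta, rate_in, n_entrances, d_mean, dt, f_T, f_HN):
    """Pick the smallest T (an integer multiple of dt) that maximizes the predicted rate (23)."""
    best_T, best_rate = dt, -1.0
    for steps in range(1, math.ceil(d_mean) + 1):        # candidate values of T / dt
        T = steps * dt
        H_p, N_a = f_HN(alpha, beta, steps)              # optimality H^P and AGV count N^A, (27)
        T_p = f_T(alpha, beta, N_a, steps)               # predicted computing time T^P, (24)
        T_real = max(T, T_p)                             # predicted period duration, (16)
        eta = min(rate_in * T_real, n_entrances)         # predicted AGVs entering per period, (17)
        rate_out = eta / T_real                          # predicted completion rate R^O, (23)
        if rate_out > best_rate:                         # strict '>' keeps the smallest maximizer
            best_T, best_rate = T, rate_out
    return best_T
```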

Conflict-free path length setting based on the fitting model

The setting of the replanning period harmonizes the necessary computation load with the available computing time, approached from a comprehensive standpoint. From the perspective of a specific path, we set the conflict-free path length of each planning period to control the path optimality. In the dynamic path planning problem, new tasks from \(\mathcal {J}\) enter the sorting system, and existing tasks are completed during each replanning period. Therefore, the tasks executed during each replanning period are distinct, but all these tasks originate from the total task set \(\mathcal {J}\). We define the collection of tasks in each replanning period as the period task set, which is the task set in the sorting area when \(T_i\) starts. There are two differences between the total task set and the period task set. Firstly, the tasks in the period task set have no task generation time. Secondly, the start points of the tasks in the period task set are the locations where the corresponding AGVs are located at the beginning of the replanning period \(T_i\), as opposed to the start points of the corresponding tasks in \(\mathcal {J}\). Let \(\mathcal {J}_i= \{J_{i,1},J_{i,2},\ldots ,J_{i,{N_i^{\mathcal {J}}}}\}\) denote the period task set \(\mathcal {J}_i\) of the replanning period \(T_i\).

$$\begin{aligned} J_{i,k} = \left\{ \begin{array}{lcl} (v_{i,k}^S,v_{i,k}^G,v_{i,k}^E), &{} \quad &{} \gamma = 0, \\ (v_{i,k}^S,v_{i,k}^E), &{} \quad &{} \gamma = 1. \end{array} \right. \end{aligned}$$
(28)

Here, \(\gamma = 1\) denotes that the express package has been delivered to the sorting pane and the AGV is in the process of exiting the sorting area. Conversely, \(\gamma = 0\) implies the express package is currently being delivered to the sorting pane.

Similarly, we can obtain the distance between the start point and the endpoint:

$$\begin{aligned} d_{i,k} = \left\{ \begin{array}{lcl} d_m(v_{i,k}^S,v_{i,k}^G)+d_m(v_{i,k}^G,v_{i,k}^E), &{} \quad &{} \gamma = 0, \\ d_m(v_{i,k}^S,v_{i,k}^E), &{} \quad &{} \gamma = 1. \end{array} \right. \end{aligned}$$
(29)

Here, \(d_m\) denotes the Manhattan distance between two points.

It is imperative to acknowledge that a longer conflict-free path length results in better path optimality, but it also leads to increased computing time. In our dynamic model, the solving stage of the current period \(T_i^C\) overlaps with the moving stage of the previous period \(T_{i-1}^M\). To ensure the continuous operation of the entire system, the length of the moving stage should be less than the length of the replanning period (\(T_{i-1}^M < T\)). Therefore, to select the most suitable conflict-free path length, we need to spend as much time as possible optimizing the path within the time limit imposed by the replanning period (\(T_i^C < T\)). But it is impossible to obtain the length of \(T_i^C\) at the onset of \(T_{i-1}\), as shown in Fig. 6, thus we resort to using an estimated value \(T^P_i\) from the SVM-based fitting model in its place. The ideal conflict-free path length should satisfy the following criteria:

$$\begin{aligned} l = \max \big \{ l' \in \{1,2,\ldots ,\lceil d_{mean} \rceil \} \mid f_T(\alpha , \beta , N_i^{\mathcal {J}}, l') < T \big \}. \end{aligned}$$
(30)

From (30), it is evident that the number of AGVs (i.e., the task number) in the sorting area is required to determine the estimated value \(T^P_i\) of \(T^C_i\). Indeed, this fitting process needs to be performed at the beginning of \(T_{i-1}\), given that \(T^P_i\) should be determined before the solving stage \(T^C_i\), as shown in Fig. 6. However, the task number \(N_i^{\mathcal {J}}\) of set \(\mathcal {J}_i\) can only be obtained at the beginning of \(T_i\). This is because it is impossible to determine in advance how many tasks are completed in the current replanning period and how many new tasks will start to be executed in the next replanning period.

Fig. 6 The conflict in computing time fitting process

To solve this conflict, we estimate the number of tasks in \(\mathcal {J}_{i}\) based on \(\mathcal {J}_{i-1}\) and use this estimated number to fit \(T_i^P\).

Firstly, we identify the tasks that cannot be completed during period \(T_{i-1}\) and retain them in the set \(\mathcal {J}^R_{i}\). The AGVs performing tasks from \(\mathcal {J}_i\) need to move during \(T_i\), and as a result, their paths need to be planned during \(T^C_i\) (i.e., within \(T_{i-1}\)). At the beginning of \(T_{i-1}\), we obtain \(\mathcal {J}_{i-1}\). For any given task \(J_{i-1,k}\) in this set, we may encounter the following scenarios:

  • If the distance \(d_{i-1,k}\) is smaller than \(T/\Delta T\) (i.e., the maximum distance an AGV can travel within a replanning period), the task will be completed during \(T_{i-1}\). Hence, it will not be present in \(\mathcal {J}_i\).

  • In contrast, if the distance \(d_{i-1,k}\) is not smaller than \(T/\Delta T\), the task will be part of \(\mathcal {J}_i\).

Secondly, we assume that new AGVs enter from all entrances during \(T_i\). While this may lead to an overestimation of \(\mathcal {J}^R_{i}\) in comparison to \(\mathcal {J}_i\), it can help to a certain extent in reducing the extra waiting time induced by prolonged calculation durations. This is because \(T^C_i\) and \(T^P_i\) do not completely coincide. Despite our efforts to keep \(T^P_i\) lower than T, \(T^C_i\) might potentially surpass T, resulting in AGV halting and waiting. Moreover, an overestimated task count leads to a longer fitted computing time, but it also results in a shorter selected conflict-free path length l. As a consequence, the actual computing time \(T^C_i\) decreases as well, making it more likely to be less than T.

The conflict-free path length setting algorithm is detailed as Algorithm 2. Initially, tasks that will continue execution in the period \(T_i\) are determined based on their respective distances (lines 3–5). We add the maximum number of tasks that can be admitted in each period (i.e., \(N^I\)) to the number of tasks retained for period \(T_i\) (i.e., \(|\mathcal {J}^R_{i}|\)) to estimate the total number of tasks N (line 6). Finally, we replace \(N_i^{\mathcal {J}}\) with N and select the most suitable conflict-free path length based on (30) (line 7).

Algorithm 2 Fitting Model-Based Conflict-Free Path Length Setting Algorithm
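
A condensed, illustrative sketch of Algorithm 2 in the same spirit (not the authors' implementation): remaining_distance stands for the Manhattan distance \(d_{i-1,k}\) of (29), f_T for the computing time fitting model, and n_entrances for \(N^I\).

```python
import math

def set_conflict_free_length(period_tasks, T, dt, alpha, beta,
                             n_entrances, d_mean, f_T, remaining_distance):
    """Choose the largest l whose predicted computing time stays below the period T, cf. (30)."""
    # Tasks whose remaining distance is at least T/dt steps are retained for the next period.
    retained = [task for task in period_tasks if remaining_distance(task) >= T / dt]
    # Assume one new AGV enters from every entrance, deliberately overestimating the task count.
    n_est = len(retained) + n_entrances
    l = 1
    for l_cand in range(1, math.ceil(d_mean) + 1):
        if f_T(alpha, beta, n_est, l_cand) < T:          # predicted solving time fits within T
            l = l_cand                                   # keep the largest feasible length
    return l
```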

Selecting temporary targets

The conflict-free path length setting algorithm translates the replanning period into a conflict-free path length l to control the path optimality. However, the application of conflict-free path lengths within the fundamental path planning algorithm also poses a significant challenge that must be addressed: converting conflict-free path lengths into realistic target points. To this end, we set temporary target points after obtaining l, as shown in Fig. 5. There are three strategies according to different characteristics of the fundamental path planning algorithm. The temporary target selection algorithm is presented as Algorithm 3. In this context, \(\mathcal {J}_{i}^\prime \) denotes the temporary target set of \(\mathcal {J}_{i}\).

Algorithm 3 Temporary Target Selection Algorithm

The first type of fundamental path planning algorithm is suitable for the windowed strategy in RHCR [4], for example, the CBS-based method. For such methods, the conflict-free path length is assigned a value of l (lines 1–2), and the multi-label A* method [4] is employed. The former measure only records conflicts within l steps, followed by their resolution. Regarding the path segment that extends beyond l steps from the start point, path planning is solely executed for a single AGV, with conflict resolution being excluded. The latter measure can obtain the complete path for a task \(J_{i,k}=(v_{i,k}^S,v_{i,k}^G,v_{i,k}^E)\) in the period task set, which has two consecutive targets, by planning once instead of twice. Additionally, if the multi-label A* method cannot be utilized, we can divide task \(J_{i,k}\) into two independent tasks at \(v_{i,k}^G\). Hence, it is not necessary to set a temporary target point for this kind of fundamental path planning algorithm, and only the conflict-free path length needs to be set.

Fig. 7 Setting the temporary target

The second type of fundamental path planning algorithm cannot plan conflict-free paths within the length l by merely setting the conflict-free path length parameter, though it can plan paths for a series of targets. In this scenario, we set temporary targets to control the length of conflict-free paths. As shown in Fig. 7, the start point \(v^S_{i,k}\) is located at the lower left corner, and the goal point \(v^G_{i,k}\) is located at the upper right corner. Initially, the AGV is required to move following the connecting line between the QR codes in the grid topology map, thus allowing it to move only horizontally or vertically. Consequently, in the majority of cases, the shortest path between these two points is likely to be within the rectangle formed by the diagonal line connecting the points. We denote the set of points within the rectangle as \(\mathcal {V}^R\), which includes the green points in Fig. 7. Subsequently, the distance between the appropriate temporary target points and the start point ought to satisfy:

$$\begin{aligned} d_m \big (v_{i,k}^S,v\big )=l. \end{aligned}$$
(31)

We designate the set of points satisfying (31) as \(\mathcal {V}^D\), which corresponds to the points on the line in Fig. 7. Next, a temporary target must not be situated at a barrier point; the barrier point set \(\mathcal {V}^B\) corresponds to the black points in Fig. 7. Lastly, points that already serve as temporary targets cannot function as temporary targets for other tasks, because some fundamental path planning algorithms cannot plan paths for two tasks with the same targets. We denote the set of already selected points as \(\mathcal {V}^T\). A viable candidate for a temporary target satisfies the above four conditions, and we denote this candidate set as \(\mathcal {V}^C\):

$$\begin{aligned} \mathcal {V}^C=\mathcal {V}^R \cap \mathcal {V}^D-\mathcal {V}^B-\mathcal {V}^T. \end{aligned}$$
(32)

The candidates for temporary points within \(\mathcal {V}^C\) are represented by the green points in Fig. 7. We randomly select one of these points as the temporary target point. In this scenario, each task has a goal point and an endpoint, which function as two consecutive targets. Hence, according to the next target and the relationship between \(d_{i,k}\) and l, the following potential situations arise:

  • Case 1: The distance to the endpoint is less than or equal to l: \( d_{i,k} \le l\). In this case, we maintain the original target points (lines 7–10 in Algorithm 3).

  • Case 2: The distance to the next target surpasses the conflict-free path length l. Here, we consider two scenarios, hinging on whether the express package has arrived at its assigned sorting pane: \(\gamma \!=\!1\ \bigwedge \ d_m(v_{i,k}^S,v_{i,k}^E)\!>\!l\) and \(\gamma \!=\!0 \ \bigwedge \ d_m (v_{i,k}^S,v_{i,k}^G)\!>\!l\). Correspondingly, we employ the stated strategy between \(v_{i,k}^S\) and \(v_{i,k}^E\) or \(v_{i,k}^S\) and \(v_{i,k}^G\), respectively (lines 11–15 in Algorithm 3).

  • Case 3: The distance to the endpoint exceeds l and the express package has not yet reached its designated sorting pane: \(\gamma \!=\!0\ \bigwedge \ d_m(v_{i,k}^S,v_{i,k}^G)\!<\!l\!<\!d_{i,k}\). In this case, the temporary target must be situated between \(v_{i,k}^G\) and \(v_{i,k}^E\). Hence, we select \(v_{i,k}^G\) as the first temporary target, and \(\mathcal {V}^D\) needs to satisfy \(d_m (v_{i,k}^G,v) = l - d_m (v_{i,k}^S,v_{i,k}^G)\) (lines 16–19 in Algorithm 3).

The third type of fundamental path planning algorithm can neither control the scope of conflict resolution by merely setting the conflict-free path length parameter nor plan paths for a series of targets. For such a case, we employ a strategy similar to that of the second type, but only one temporary target can be selected (lines 22–23 in Algorithm 3), which is the key difference from the second type. Therefore, the temporary target is \(v_{i,k}^G\) when \(d_m (v_{i,k}^S,v_{i,k}^G) < l\).
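
For the latter two types, the candidate-set construction of (31) and (32) is the common core. A minimal Python sketch is given below; barriers and used_targets correspond to \(\mathcal {V}^B\) and \(\mathcal {V}^T\), and the fallback to the goal point when no candidate exists is our own convention rather than part of Algorithm 3.

```python
import random

def manhattan(a, b):
    """Manhattan distance d_m between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def candidate_targets(start, goal, l, barriers, used_targets):
    """V^C = (V^R ∩ V^D) - V^B - V^T, following (31) and (32)."""
    x_lo, x_hi = sorted((start[0], goal[0]))
    y_lo, y_hi = sorted((start[1], goal[1]))
    rect = {(x, y) for x in range(x_lo, x_hi + 1)          # V^R: rectangle spanned by start/goal
                   for y in range(y_lo, y_hi + 1)}
    ring = {v for v in rect if manhattan(start, v) == l}   # V^D: points at distance l, (31)
    return ring - set(barriers) - set(used_targets)

def pick_temporary_target(start, goal, l, barriers, used_targets):
    """Randomly select one candidate, as described for the second type."""
    candidates = candidate_targets(start, goal, l, barriers, used_targets)
    return random.choice(sorted(candidates)) if candidates else goal
```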

Performance evaluation

In this section, we first present the fitting models used in our methods, delve into their characteristic analysis, and subsequently evaluate the performance of the OTRDP method through extensive simulation. We select the Priority-Based Search (PBS) [21] as the fundamental path planning algorithm for the OTRDP method. PBS is a method that combines the CBS method with priority strategies. This method has the advantage of extremely fast solving speed. Hence, more time can be spent on optimizing paths under our OTRDP method. All our tests are performed using MATLAB on a desktop computer (Intel Core i7-6700, 32 GB memory).

SVM-based fitting models

We introduced the SVM-based fitting models in “OTRDP method” Section, but only their principles and procedures were discussed there. To illustrate more vividly how the fitting models adapt as the system parameters vary, and thus make the principles of the algorithms easier to follow, we present the model fitting process through simulation data graphs. In this section, we exclusively present the models for the \(22\times 22\) map, while acknowledging that the models for other maps demonstrate similar characteristics and behaviors.

Fig. 8 Computing time fitting model

The SVM-based computing time fitting model is shown in Fig. 8, and the following information can be inferred:

  • The computing time exhibits an exponential relationship with the number of steps \(T/\Delta T\) and the total number of AGVs, denoted as \(N^A\): it remains within an acceptable range at smaller scales; however, as \(T/\Delta T\) or \(N^A\) increases, the computing time rises exponentially. This is primarily because increasing the planning steps or the number of AGVs leads to more conflicts, and resolving these conflicts demands additional computation.

In our method, we typically select a case with a small estimated computing time \(T^P\), as it is better suited for real-time requirements in practical scenarios. While longer computing times tend to yield improved path optimality, the cost associated with enhancing optimality becomes excessive.

Fig. 9 Path optimality fitting model

The SVM-based path optimality fitting model for the same map is shown in Fig. 9, and the following information can be inferred:

  • It is evident that the optimality of the path deteriorates (i.e., the value of \(H^P\) increases) as \(N^A\) increases, as shown in Fig. 9a. This deterioration is primarily attributed to the conflicts arising from an increased number of AGVs.

  • The optimality values exhibit inconsistent variations under different \(N^A\) conditions as \(T/\Delta T\) increases, as shown in Fig. 9b. Specifically, an increase in \(T/\Delta T\) results in a decrease in \(H^P\) (i.e., improved path optimality) when \(N^A\) is small (\(N^A=50\) and \(N^A=100\)), aligning with the notion that longer conflict-free paths contribute to enhanced path optimality. Conversely, the optimality value \(H^P\) initially decreases and subsequently increases when the total number \(N^A\) of AGVs is large (\(N^A=150\) and \(N^A=200\)). This trend can be attributed to the waiting time induced by the extensive computing time. A large number of AGVs results in numerous conflicts, extending the time required to find a solution. If this solving time surpasses the predetermined replanning period, it causes the AGVs to stop and wait. Such waiting increases the number of steps for movement, consequently extending the path length.

Fig. 10 Relationships in (25) with different \(R^I\)

Fig. 11 \(f_{H-N}\) with different \(R^I\)

The relationships in (25) with \(R^I = 1\) and \(R^I = 2\) are shown in Fig. 10a and b, respectively, and the following information can be inferred:

  • When \(T/\Delta T\) is small, the variation in \(H^P\) is solely dependent on \(N^A\) and follows a linear relationship. This is due to the fact that \(H^P = N^A / (d_{mean}*\Delta T*R^I )\) when T is small (\(R^I*T<N^I\)). There are two factors for this phenomenon: First, \(\eta = R^I * T_{real}\) when \(R^I*T<N^I\); Second, both T and \(T_{real}\) are equivalent to \(T_{real}^P\) for prediction.

  • When \(T/\Delta T\) is large, the variation in \(H^P\) is dependent on both \(N^A\) and \(T/\Delta T\) (i.e., \(H^P = (N^A * T_{real}^P) / (d_{mean}*\Delta T*N^I )\)), because \(\eta = N^I\). In this scenario, an increase in both \(N^A\) and \(T/\Delta T\) leads to an increase in \(H^P\).

  • As \(R^I\) increases, the region of the linear relationship narrows. The underlying reason is that, with the escalation in the task generation rate, the AGV entry rate swiftly approaches its maximal capacity within the identical replanning period. As a result, at the beginning of each replanning period, an AGV is positioned at every entrance, epitomizing the peak entry rate into the sorting area.

The relationship \(f_{H-N}\) in (27) is shown in Fig. 11. By combining Figs. 9 and 10a, we obtain Fig. 11a, which shows \(f_{H-N}\) with \(R^I = 1\). Through this relationship, we can obtain the corresponding path optimality value \(H^P\) and the total number \(N^A\) of AGVs according to the number of planning steps \(T/\Delta T\). The \(f_{H-N}\) with \(R^I = 2\), shown in Fig. 11b, is obtained by the same method. From Fig. 11, we know that

  • With the increase in step count \(T/\Delta T\), the entry rate of AGVs into the sorting area slows down, subsequently reducing the number (\(N^A\)) of AGVs present within the sorting area. This reduction in AGV numbers concurrently results in fewer conflicts and improves the optimality of the paths.

  • The top of the curve does not match the trend of the bottom side. The reason is similar to the reason for the change in Fig. 10.

  • The top curve tends to become shorter with increasing \(R^I\), which can be seen from the comparison of Fig. 11a and b. The reason behind this phenomenon is the narrowing of the linear region, which is similar to Fig. 10.

Simulation experiments on the sorting map

Table 3 Simulation experiment parameters for the sorting map

The experiments focus on comparing our method with existing algorithms to highlight its advantages in terms of efficiency and stability. This comparison is essential for illustrating the contribution of the proposed method to the field, offering empirical evidence to support its theoretical improvements over prior work. Through these comparative experiments, we seek to demonstrate the practical relevance of our approach in solving complex pathfinding challenges, thereby reinforcing its significance for further research and application.

To emulate the real-world task generation process, tasks are generated with arrival times following a Poisson distribution and locations following a uniform distribution. Specifically, on average, \(R^I\) tasks are produced per unit time (i.e., the expectation of the Poisson distribution is \(R^I\)), with the generated tasks being randomly distributed across all entry points. Finally, each task set includes 1000 sequentially generated tasks for conducting dynamic path planning tests.
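
This generation process can be reproduced with a short script such as the following (illustrative Python using NumPy; the entrance, goal, and exit lists are placeholders for the actual map locations):

```python
import numpy as np

def generate_tasks(n_tasks, rate_in, entrances, goals, exits, seed=0):
    """Poisson arrivals at rate R^I with uniformly random start, goal, and exit points."""
    rng = np.random.default_rng(seed)
    # A Poisson process with rate R^I has exponential inter-arrival times with mean 1 / R^I.
    arrival_times = np.cumsum(rng.exponential(1.0 / rate_in, size=n_tasks))
    return [(float(t),
             entrances[rng.integers(len(entrances))],   # start point v^S
             goals[rng.integers(len(goals))],           # goal point v^G
             exits[rng.integers(len(exits))])           # exit point v^E
            for t in arrival_times]
```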

We select two state-of-the-art dynamic path planning methods (the RHCR [4] and the Diversified-path Database-driven Multi-robot Path Planning (DDM) [37] methods) as the comparison methods. The RHCR method is introduced previously. For a fair comparison, we also choose PBS as the fundamental path planning algorithm of RHCR. The DDM method is an efficient method based on ILP-based local conflict resolution. Neither the RHCR nor the DDM methods possess replanning period setting mechanisms. Hence, we select \(T/\Delta T=5\) and 10 from [4] as their replanning periods. We designate the corresponding number of steps with a suffix, for instance, RHCR-5 and RHCR-10.

Fig. 12 Metrics in the \(22\times 22\) map

Fig. 13 Metrics in the \(34\times 26\) map

Fig. 14 Metrics in the \(70\times 38\) map

To compare the effectiveness of the above three methods under various map sizes, we select three different map sizes, \(22\times 22\), \(34\times 26\), and \(70\times 38\), for the simulation. We set task generation rates ranging from 0.25 to 2.5 for the \(22\times 22\) and \(34\times 26\) map sizes. For the larger \(70\times 38\) map, due to the increased number of entrances, we choose task generation rates from 1 to 10. For each combination of task generation rate and map size, we execute 10 task groups and then average the outcomes. In addition, to prevent the method from falling into a local optimum, which leads to an inability to obtain results, we set a limit of 200 s for a single solution. In summary, the simulation parameters for the sorting map are presented in Table 3.

In the simulation results, we primarily compare three metrics (a brief computation sketch follows the list):

  • Success Rate is defined as the ratio of successful trials to the total number of trials.

  • Makespan, denoted \(T_{mak}\), is the total time required to complete all tasks in a task set.

  • Average Number of Steps is the average count of steps needed for an AGV to complete a task.
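A minimal sketch of how these three metrics could be computed from per-trial records is shown below; the record field names are assumptions, not part of our implementation.

```python
def compute_metrics(trials):
    """Each trial is a dict with keys 'solved' (bool), 'finish_times'
    (per-task completion steps), and 'path_lengths' (per-task step counts);
    these field names are illustrative."""
    solved = [t for t in trials if t["solved"]]
    success_rate = len(solved) / len(trials)
    # Makespan T_mak: completion time of the last task in a trial.
    makespans = [max(t["finish_times"]) for t in solved]
    # Average number of steps an AGV needs to complete one task.
    avg_steps = [sum(t["path_lengths"]) / len(t["path_lengths"]) for t in solved]
    return {
        "success_rate": success_rate,
        "mean_makespan": sum(makespans) / len(makespans) if makespans else None,
        "mean_avg_steps": sum(avg_steps) / len(avg_steps) if avg_steps else None,
    }
```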

The results for the \(22\times 22\), \(34\times 26\), and \(70\times 38\) maps are shown in Figs. 12, 13, and 14, respectively.

Success rate represents the stability of a method. From Figs. 12a, 13a and 14a, we observe that

  • The DDM method achieves a \(100\%\) success rate in only a fraction of the experiments, occasionally falling into a deadlock due to congestion. This is because the DDM method employs a local conflict resolution strategy rather than a global one. The local strategy resolves conflicts only within a \(3\times 2\) or \(3\times 3\) sub-graph, which greatly improves the solution speed. However, when multiple conflicts occur close together, the corresponding sub-graphs cluster and block the paths of the AGVs. This creates new conflicts that require new sub-graphs to resolve, and the process can ultimately end in a deadlock, preventing a feasible solution within the prescribed timeframe.

  • The success rates of OTRDP, RHCR-5, and RHCR-10 consistently reach \(100\%\), irrespective of the map size. This result stems from their adoption of the global planning method PBS as the underlying path planning algorithm, which largely avoids the deadlocks caused by local optimization. Moreover, these methods adopt the windowed path planning strategy, which plans only the next segment of the path and avoids solution failures caused by excessive computing time. Consequently, they maintain high success rates within a restricted timeframe.

Makespan represents the efficiency of the path planning method, and it is the main metric we need to optimize. Observations from Figs. 12b, 13b, and 14b reveal the following.

Impact of task generation rate on makespan:

  • When fewer tasks are generated (lower task generation rates \(R^I\)), the differences in makespan between the methods are not very pronounced. This is likely because fewer AGVs are active in the sorting area under these conditions.

  • As the task generation rate increases, the makespan decreases for all methods. This is because a higher \(R^I\) releases the fixed set of 1000 tasks over a shorter time span, so the later tasks start earlier, which moderately reduces the makespan.

Impact of map size on makespan:

  • As the map size increases, the advantage of the OTRDP method in reducing makespan becomes less pronounced. We calculated the ratio of task completion rates between OTRDP and RHCR-5 at the last points of the curves in the figures above. These ratios, listed in Table 4, show that the superiority of the OTRDP method diminishes with larger map sizes.

  • The reason is that less computing capacity remains available for path optimization as the map size increases. A larger map not only enlarges the solution space for path planning but also accommodates more AGVs, which significantly increases the required computing time. The computing system therefore has less spare capacity to further optimize paths. In the \(70\times 38\) map experiment, OTRDP still shows a makespan advantage, but its performance closely approaches that of RHCR-5, confirming our hypothesis regarding the use of computational resources.

Comparison of different methods:

  • Looking at how the efficiency of the different methods evolves, their makespan curves begin to diverge at different task generation rates but eventually stabilize. Among these methods, our proposed OTRDP method consistently achieves the lowest makespan after stabilization.

  • The DDM-10 and RHCR-10 curves are the first to diverge, because the number of tasks entering the AGV sorting area approaches saturation for these methods. Subsequently, DDM-5 begins to lag behind RHCR-5 and OTRDP. The superior global optimization capability of RHCR and OTRDP contributes to further reductions in makespan.

  • A critical factor here is how these methods plan and replan tasks. RHCR and DDM use fixed replanning periods, which can be less effective at different task generation rates. In contrast, the OTRDP method dynamically adjusts its replanning period to optimize the task completion rate, balancing path optimality against computing time under varying scenarios (see the sketch after this list). Additionally, OTRDP adjusts the conflict-free path length according to the number of AGVs, leading to higher efficiency, particularly at high task generation rates.
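As a rough illustration of this idea, and not the exact procedure of OTRDP, the sketch below selects the replanning period whose estimated computing time fits within the corresponding moving time while maximizing an estimated task completion rate; the estimator callables and the fallback rule are assumptions.

```python
def choose_replanning_steps(candidate_steps, est_completion_rate,
                            est_compute_time, step_duration):
    """Pick the number of planning steps T/dT so that the estimated computing
    time of one solving stage fits within the moving time of the window, and
    the estimated task completion rate is maximized. est_completion_rate and
    est_compute_time are hypothetical estimators fitted offline (e.g., from
    relationships such as f_{H-N})."""
    best, best_rate = None, float("-inf")
    for steps in candidate_steps:
        moving_time = steps * step_duration        # time AGVs spend executing one window
        if est_compute_time(steps) > moving_time:  # computing would outlast moving: skip
            continue
        rate = est_completion_rate(steps)
        if rate > best_rate:
            best, best_rate = steps, rate
    # Fall back to the smallest candidate window if no option is feasible.
    return best if best is not None else min(candidate_steps)
```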

Table 4 Ratios of task completion rates between OTRDP and RHCR-5

The average number of steps serves as a rough proxy for the energy consumed to complete a task; it is shown in Figs. 12c, 13c and 14c. Observations from these figures reveal the following:

  • The OTRDP and RHCR curves are close to the average task distance \(d_{mean}\) (\(22\times 22\): 25.83, \(34\times 26\): 32.93, \(70\times 38\): 53.83) and lie below the DDM curves. Global planning is the main reason they require fewer path steps than DDM. In particular, the DDM-5 curve rises well above the others, indicating a higher average number of steps for DDM. This increase is explained first by the lack of global path optimization and second by the way DDM's sub-graphs can obstruct the optimal paths of other AGVs. As the task generation rate increases, the waiting time escalates, leading to a rapid increase in the average number of steps. AGVs enter the sorting area more slowly under DDM-10 than under DDM-5, resulting in lighter congestion and a lower average number of steps.

  • There is a tradeoff between the makespan and the average number of steps for a dynamic path planning method. Among the RHCR variants, the RHCR-10 curve lies below that of RHCR-5 due to its slower AGV entry rate. However, the longer makespan of RHCR-10 compared with RHCR-5 suggests that shorter paths may come at the cost of a longer makespan. A similar phenomenon is observed for the DDM method.

  • The OTRDP method outperforms RHCR-5 in both the makespan and the average number of steps, although this advantage diminishes as the map size increases. Unlike the compromise between RHCR-5 and RHCR-10, OTRDP consistently achieves a shorter makespan than RHCR-5 across all rates and a lower average number of steps than RHCR-5 at certain rates. This is due to the utilization of redundant computational resources during idle time. In addition, because OTRDP mainly optimizes the task completion rate (i.e., the makespan), it tends to use a shorter replanning period so that more AGVs can enter the sorting area. However, path planning in the \(70\times 38\) map requires substantial computation, so when the focus is on minimizing the makespan, the optimization of the number of steps is constrained. As a result, the number of steps of the OTRDP method increases with the map size and eventually exceeds that of RHCR-5.

Simulation experiments on the public map

Fig. 15 Metrics in the random-64-64-10 map

To further validate the effectiveness of our method, we conduct simulation tests on a publicly available map from movingai.com. A randomly selected large map, random-64-64-10 [40], is used for this purpose. Since this map has no entrances, exits, or sorting panes, we generate tasks at random locations. The task generation rates range from 1 to 10, the same range used for the \(70\times 38\) sorting map. The results are shown in Fig. 15. The simulation parameters for the random map are presented in Table 5; since some parameters are identical to those of the sorting experiments, only the parameters that differ are listed.

Only the OTRDP and RHCR curves appear in Fig. 15, because DDM can only resolve conflicts in \(2\times 3\) obstacle-free sub-graphs and such sub-graphs cannot be found at many positions in this map. Hence, the paths planned by DDM end in congestion in this experiment. As shown in Fig. 15, we observe the following results:

  • Firstly, the success rates of both methods remain at \(100\%\) for all \(R^I\), consistent with the sorting experiments.

  • In addition, the makespan of OTRDP maintains an advantage over that of RHCR, consistent with the results on the sorting maps.

  • Finally, the average number of steps for OTRDP is initially the lowest; as \(R^I\) increases, the OTRDP curve rises and eventually lies between the two RHCR curves. This behavior is closer to the results on the smaller sorting maps (\(22\times 22\), \(34\times 26\)), where all three methods reported comparable step counts, than to the results on the larger \(70\times 38\) sorting map, where OTRDP registered a higher average step count than both RHCR curves. A possible reason is that the random-64-64-10 map has fewer obstacles (10%) than the sorting maps (approximately 25%), so less computing time is needed for conflict resolution. As a result, the OTRDP algorithm can allocate more spare computing power to construct longer conflict-free paths, thereby obtaining shorter paths.

Overall, the consistency of testing results on public maps with those on sorting maps further substantiates the efficacy of the OTRDP method.

Table 5 Simulation experiment parameters for the random map

Conclusion

This study presents the OTRDP method, a novel approach designed to address the complexities of dynamic path planning within AGV sorting systems. The main contribution of this research lies in the development of a new dynamic path planning framework predicated on accurate parameter estimation. This framework improves system efficiency through two pivotal strategies: (1) optimizing the replanning period through a thorough analysis of task completion rates, and (2) employing an SVM-based model to accurately determine conflict-free path lengths for each replanning period. These strategies greatly enhance operational efficiency. Subsequently, the conflict-free path length is converted into temporary executable targets, broadening the applicability of the OTRDP method. Empirical evidence from simulations confirms that the OTRDP method not only ensures high system efficiency with prudent use of computational resources but also maintains system stability and fosters energy conservation. The efficacy of OTRDP, particularly in large-scale logistics and manufacturing environments, highlights its potential to significantly boost operational throughput and cost-efficiency, showcasing its adaptability and resilience in dynamic settings.
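As a loose illustration of the second strategy, and not the paper's exact model, an SVM regressor could be fitted to map system state to a suitable conflict-free path length; the features, training values, and hyperparameters below are placeholders.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical training data: each row is (number of AGVs N^A, task rate R^I);
# the target is the conflict-free path length that fit the available computing
# time in past replanning periods. All values are illustrative only.
X = np.array([[10, 0.5], [20, 1.0], [30, 1.5], [40, 2.0], [50, 2.5]])
y = np.array([14.0, 12.0, 10.5, 9.0, 8.0])

model = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X, y)

# Predict a conflict-free path length for a new system state.
predicted_length = float(model.predict(np.array([[35, 1.8]]))[0])
```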

Several avenues of future work exist for the OTRDP method. These include refining the SVM approach or exploring alternative fitting methods to enhance estimation accuracy. Another promising direction is the development of a model transfer methodology, which would enable the swift application of the OTRDP method in new systems by leveraging existing fitting model data.