1 Introduction

Hydrogen fuel cells, recognized for their high power density, rapid start-up, environmental friendliness, and exceptional energy conversion efficiency, have attracted substantial global investments in research and industrial development [1]. Their potential extends to enhancing societal well-being and fostering economic stability at a national level.

For proton exchange membrane fuel cells (PEMFCs), effective water management emerges as a critical factor influencing normal operation and overall efficiency. Maintaining an optimal water balance is essential to prevent issues such as flooding and dehydration that can compromise PEMFC performance. Excessive water accumulating in the flow channels or at the cathode and anode risks flooding, while insufficient water dries the membrane and hinders proper operation [2]. These challenges underscore the pivotal role of water management in the performance of PEMFCs [3].

The conventional model-based approach to water management primarily concentrates on formulating mathematical models that simulate the internal water migration within fuel cells. In [4], a multiple-input multiple-output fuzzy controller is developed and demonstrated for real-time water and thermal management of fuel cell systems; the results show that the proposed fuzzy controller effectively increases the output power of the PEM fuel cell. In [5], a 3D multi-phase model based on the Eulerian-Eulerian approach is presented for water management in the PEMFC system. In [6], an active disturbance rejection control strategy is used to balance the humidity of the PEMFC.

To enhance dynamic water management, [7] introduces a physics-based PEMFC model that considers the localized impact of humidity on fuel cell stack performance. In [8], proportional-integral and active disturbance rejection control strategies are devised for air supply and coolant flow in PEMFCs. In [9], a fractional-order PID dynamic control strategy is introduced to balance the membrane humidity of the PEMFC.

Constructing physical models for water management in fuel cells poses considerable challenges, prompting the exploration of alternative approaches such as neural network algorithms. In [10], an artificial neural network is leveraged to emulate the power output of PEMFCs. Reference [11] introduces a dynamic neural network-based maximum power point tracking (MPPT) control methodology tailored for fuel cell applications. Furthermore, in [12], convolutional neural networks are employed to quantify liquid water content within PEM fuel cells.

The aforementioned studies primarily focus on the modeling of water management processes, which has posed significant challenges. It is evident that an inaccurate water management strategy can result in suboptimal performance, potentially compromising the overall efficiency and reliability of PEMFCs. Indeed, improper water management can lead to issues such as flooding or drying within PEMFCs.

To adapt to the dynamic and complex environments inherent in PEMFCs, model-free methods have been introduced to deal with the water management problem. An actor-critic-based model-free reinforcement learning approach is proposed for water management of the PEM fuel cell in [13]. The simulation results showcase the effectiveness of this approach in maintaining water balance within the fuel cell stack, thereby maximizing the stack voltage.

The advancement of reinforcement learning algorithms has significantly contributed to the enhancement of control strategies for PEMFCs. In [14], a multi-objective optimal fractional-order proportional-integral-derivative controller is proposed for PEMFCs; in addition, a novel large-scale deep reinforcement learning scheme is designed to ensure optimal and comprehensive control performance. In [15], an evolutionary curriculum imitation large-scale deep reinforcement learning algorithm is proposed to improve the robustness of the control strategy of PEMFCs.

Therefore, a model-free reinforcement learning method is proposed for water management in PEMFCs. First, to address the challenge of continuous action spaces, actor-critic-based reinforcement learning is employed. Then, the integration of deep neural networks into reinforcement learning is achieved through the utilization of experience replay buffers and four neural networks, leading to the formulation of the Deep Deterministic Policy Gradient (DDPG). Finally, a prioritized experience replay mechanism is incorporated, resulting in the development of the prioritized DDPG method for water management.

The proposed model-free water management system enables adjustments to the hydrogen gas inlet pressure, thereby regulating water content within the PEMFC based on observed stack current and voltage. This adjustment ensures the efficient operation of PEMFCs. The primary contribution lies in the introduction of the prioritized DDPG method, providing an effective solution to the complex water management challenges encountered in PEMFCs.

The structure of this paper is organized as follows: Sect. 2 gives the preliminary theory for reinforcement learning. The water management problem for PEMFCs is formulated in Sect. 3. Section 4 describes the proposed prioritized DDPG water management method. The simulation results are presented and analysed in Sect. 5. Finally, Sect. 6 concludes this paper.

2 Reinforcement learning

Reinforcement learning is a goal-driven learning method that acquires knowledge through interaction with the environment [16]. The learner is the agent, and everything it interacts with is the environment. The agent continuously updates its knowledge through trial-and-error learning while interacting with the environment.

At each discrete time step denoted as t, the agent selects its action, denoted as At, based on the observed state from the environment, referred to as St. The state St exhibits the Markov properties, indicating that future states depend only on the current state of the process and are independent of the past. The chosen action At leads to a transformation of the environment state St. In response to this change, the agent receives an immediate reward rt from the environment.

The core of reinforcement learning is the policy π(At|St), which represents the action selection based on the states. The objective of the agent is to optimize the cumulative reward over an entire episode. At each time step within an episode, the agent aims to choose the optimal action from the set of possible actions based on its policy.

To determine the optimal policy for an agent, there are two fundamental challenges in reinforcement learning [17]. The first challenge is how to evaluate the long-term rewards of a sequence of actions. Long-term rewards remain unknown until the end of the episode. Furthermore, in practical situations, environmental information is often partial or incomplete. Only experience is gained after engaging in trial-and-error learning with the environment, consisting of a sequence of states, actions, and rewards. Consequently, extracting valuable information from experience is critical for reinforcement learning.

The second challenge is the dilemma between exploration and exploitation. If the agent only focuses on maximizing its current immediate reward, it may achieve only short-term gains. However, if the agent only explores for potential future reward, it could miss out on maximizing the immediate reward. Hence, the balance between exploration and exploitation is also important for reinforcement learning.

3 Water management problem formulation

Proton exchange membrane fuel cells (PEMFCs) are a promising and environmentally friendly energy device that converts hydrogen gas and oxygen into electricity. A PEMFC consists of an anode, a cathode, and a proton exchange membrane (PEM) sandwiched between them. As shown in Fig. 1, hydrogen gas is introduced to the anode and dissociated into protons H+ and electrons e−. Simultaneously, oxygen is supplied to the cathode for the reduction reaction, which involves the acceptance of electrons. Protons pass through the PEM, while electrons are directed through an external circuit, creating an electric potential difference across the cell. At the cathode, protons and electrons combine with oxygen to produce water H2O, releasing energy in the process. This reaction results in the formation of water as the only byproduct, making PEMFCs an environmentally friendly energy source.

Fig. 1 Different water transfer processes in the electrode of a proton exchange membrane fuel cell

Water management in PEMFCs involves regulating the water content to maintain their optimal operation. This regulation is necessary to ensure the right moisture levels in different fuel cell components, including PEM, electrodes, and gas diffusion layers.

Firstly, it is crucial to maintain adequate moisture levels in the PEM and catalyst layers. This moisture facilitates proton movement and the chemical reactions between hydrogen and oxygen. Maintaining the correct humidity levels of the reactant gases, such as hydrogen and oxygen, is essential for achieving the desired moisture level within the cell. Dry gases can lead to PEM dehydration, while excessively humid gases can result in flooding.

Secondly, water is produced as a byproduct of the chemical reactions. If this water accumulates and is not removed effectively, it can block gas flow through the porous components, hindering the transport of reactants to the catalysts. This can result in decreased performance and potential cell damage.

Therefore, water management is vital because both excessive and insufficient water can harm the fuel cell's performance.

In this paper, the water management problem in PEMFCs is modeled as a Markov decision process (s, a, T, r, γ). The environmental state is represented as s = (I, V), comprising the stack current I and the stack voltage V of the PEMFC. Based on the observed environmental state, a corresponding action a can be taken to adjust the hydrogen gas inlet pressure, thereby altering the water content within the PEMFC. The state transition probability is 1 in this paper. The immediate reward that the agent receives after taking an action in a given state is denoted as r. The reward r is defined in terms of the squared difference between the target and the actual power output of the PEMFC, serving as a metric for tracking stability. The discount factor for future rewards is represented by γ.
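For concreteness, the sketch below frames this MDP as a minimal Python environment. The class name, the placeholder plant response, and the sign convention (negative squared deviation, so that maximizing the return drives the power toward its target) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class PEMFCWaterEnv:
    """Minimal sketch of the water-management MDP (s, a, T, r, gamma)."""

    def __init__(self, target_power=0.75, gamma=0.99):
        self.target_power = target_power   # target stack power in W (assumed)
        self.gamma = gamma                 # discount factor
        self.state = np.array([0.0, 0.0])  # observed state s = (I, V)

    def step(self, pressure_action):
        # In the real experiment the next (I, V) comes from the PEMFC test
        # bench; here a hypothetical sensor read-out stands in for it.
        I, V = self._read_stack_sensors(pressure_action)
        self.state = np.array([I, V])
        # Reward based on the squared deviation of the stack power from the
        # target; the negative sign is an assumption so that a larger reward
        # means better tracking.
        reward = -(self.target_power - I * V) ** 2
        return self.state, reward

    def _read_stack_sensors(self, pressure_action):
        # Hypothetical stand-in for the measurement interface of the test bench.
        return 1.0 + 0.01 * float(pressure_action), 0.7
```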

The policy π is the strategy the agent follows to maximize the cumulative reward for a given state, and the return is the sum of discounted rewards from t = 1 to t = T,

$$G(\pi ) = r_{1} + \gamma r_{2} + ... + \gamma^{T - 1} r_{T}$$
(1)

If discount factor γ = 0, the agent only focuses on immediate reward while disregarding future rewards. If discount factor γ = 1, the agent accords equal significance to both immediate and future rewards. In general, 0 < γ < 1.
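As a quick illustration of Eq. (1) and the role of γ, a minimal sketch (function name illustrative):

```python
def discounted_return(rewards, gamma=0.99):
    """Compute G(pi) = r_1 + gamma*r_2 + ... + gamma^(T-1)*r_T as in Eq. (1)."""
    G = 0.0
    for k, r in enumerate(rewards):   # k = 0 corresponds to r_1
        G += (gamma ** k) * r
    return G

# gamma = 0 keeps only the immediate reward; gamma close to 1 weighs
# future rewards almost as much as the immediate one.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.0))   # 1.0
print(discounted_return([1.0, 1.0, 1.0], gamma=0.99))  # ~2.97
```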

In the next section, a model-free reinforcement learning algorithm is introduced to address the Markov decision process constructed for water management in PEMFCs.

4 Water management strategy based on prioritized DDPG

4.1 Evaluating long-term rewards for optimal water management

Traditional Q-learning is introduced to address the two fundamental challenges described in Sect. 2. To solve the first problem, evaluating the long-term rewards of different actions, a state-action value function is introduced to assess how good actions are given the observed state s.

To evaluate the long-term rewards, the action-value function Qπ(s,a) is designed to estimate how good the action a is in the given state s. Qπ(s,a) is defined as the expected long-term return for taking action a in the given state s following policy π as follows,

$${\text{Q}}_{\pi } (s,a) = {\rm E}[G_{t} \left| {S_{t} = s,A_{t} = a} \right.]$$
(2)

Based on the Bellman equation, it can be decomposed into the immediate reward and the discounted value of the next state-action pair,

$${\text{Q}}_{\pi } (s,a) = {\rm E}[R + \gamma {\text{Q}}_{\pi } (s^{\prime},a^{\prime})]$$
(3)

For water management in PEMFCs, it is hard to model the entire chemical reaction. Therefore, the expected long-term reward is estimated from experience. At each time step, the agent observes the state, namely the stack current and stack voltage of the PEMFC. Based on the action-value function Q(s,a), it chooses the action for the hydrogen gas inlet pressure. After executing the action a, the state of the PEMFC changes to s′ and the agent receives an immediate reward r. The expected value function is as follows,

$${\text{Q}}_{E} (s,a) = R + \gamma \mathop {\max }\limits_{a} {\text{Q}}(s^{\prime},a)$$
(4)

The action-value function in Eq. 3 is updated toward the estimated return in Eq. 4 by the following Eq. 5,

$${\text{Q}}(s,a) \leftarrow {\text{Q}}(s,a) + \alpha \left( {R + \gamma \mathop {\max }\limits_{a} {\text{Q}}(s^{\prime},a){\text{ - Q}}(s,a)} \right)$$
(5)

where α ∈ (0, 1) is the learning rate, which determines how quickly the agent learns from new experiences. The action-value Q is updated step by step. After a number of episodes, the Q derived from experience approximates the actual value.
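A minimal tabular sketch of the update in Eq. (5); the dictionary-based table and the hyperparameter values are illustrative:

```python
from collections import defaultdict

Q = defaultdict(float)       # Q[(state, action)] -> estimated long-term value
alpha, gamma = 0.1, 0.99     # learning rate and discount factor (assumed values)

def q_update(s, a, r, s_next, actions):
    """One step of the tabular Q-learning update in Eq. (5)."""
    td_target = r + gamma * max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
```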

4.2 The trade-off between exploration and exploitation for water management strategy

To solve the problem of the trade-off between exploration and exploitation, the ε-greedy algorithm is used in this paper. With probability ε, the agent chooses an action at random. With probability 1 − ε, the agent chooses the action with the maximal action-value function. The mapping from state s to action a is as follows,

$$\pi \left( {a|s} \right) = \left\{ {\begin{array}{*{20}l} {\frac{\varepsilon }{m} + 1 - \varepsilon ,} \hfill & {if\;a = \mathop {\arg \max }\limits_{{a^{\prime} \in A}} Q(s,a^{\prime})} \hfill \\ {\frac{\varepsilon }{m},} \hfill & {otherwise} \hfill \\ \end{array} } \right.$$
(6)

where m is the number of all possible actions. The ε-greedy algorithm is a near-greedy algorithm. The greedy action is chosen most of the time, which is the process of exploitation. The exploitation process maximizes the immediate reward based on the learned experience. But every once in a while, the action is chosen randomly, which is the process of exploration. The exploration process is responsible for gathering more information about the water management problem.

The choice of the exploration rate ε significantly influences the balance between exploration and exploitation. A high ε encourages more exploration, which can help discover better actions but may result in lower short-term rewards. Conversely, a low ε prioritizes exploitation, potentially leading to faster convergence to locally optimal actions but risking the possibility of missing globally optimal solutions. Therefore, the trade-off between exploration and exploitation is addressed by the ε-greedy algorithm.
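A minimal sketch of the ε-greedy selection in Eq. (6), reusing the tabular Q from the sketch above (a discrete action set is assumed for illustration; the DDPG sections below handle the continuous case):

```python
import random

def epsilon_greedy(s, actions, Q, epsilon=0.1):
    """With probability epsilon explore; otherwise exploit the greedy action (Eq. 6)."""
    if random.random() < epsilon:
        return random.choice(actions)                 # exploration
    return max(actions, key=lambda a: Q[(s, a)])      # exploitation
```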

4.3 The deep deterministic policy gradient algorithm

Traditional Q-learning effectively handles the two fundamental challenges in reinforcement learning described earlier. However, it is difficult to apply directly to the water management problem in PEMFCs. The first difficulty is that the hydrogen gas inlet pressure action is continuous, while traditional Q-learning can only handle discrete actions. The second difficulty is how to introduce deep neural networks for better nonlinear fitting of the water management problem in PEMFCs.

The deep deterministic policy gradient (DDPG) algorithm is introduced in this paper to address the aforementioned water management problem in PEMFCs. DDPG is a reinforcement learning method based on the Actor-Critic framework [18]. The Actor is responsible for producing an action based on the state, and the Critic is used for evaluating the policy. Therefore, DDPG can handle the continuous hydrogen gas inlet pressure action in water management.

To integrate deep neural networks into reinforcement learning, it is crucial to ensure that the training data adheres to the principle of being independently and identically distributed (i.i.d.). However, the data generated in water management for PEMFCs exhibit high correlation. To mitigate this correlation within the reinforcement learning data, we introduce an experience replay buffer for training deep neural networks. The samples acquired at each time step during the interaction with the environment are stored in the replay buffer. During the training of deep neural networks for water management, small batches of samples are subsequently extracted from this set.
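A minimal uniform replay buffer illustrating this idea; the capacity and batch size mirror the values reported in Sect. 5, and the class is a sketch rather than the authors' implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer; uniform mini-batch sampling breaks temporal correlation."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)   # oldest samples are discarded first

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))  # store one interaction sample

    def sample(self, batch_size=128):
        return random.sample(self.buffer, batch_size)
```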

To ensure the stability of reinforcement learning training, four neural networks are established: the online Actor network πθ(s), the offline Actor network πθ′(s), the online Critic network qφ(s,a) and the offline Critic network qφ′(s,a). First, the parameters of the offline networks (θ′ and φ′) are fixed. Then, the online networks (θ and φ) are trained using the offline networks. After a certain period, the parameters of the online networks are transferred to the offline networks.

First, initialize the parameters of the four neural networks θ, θ′, φ, φ′, which constitute the two Actor networks and two Critic networks within the DDPG framework. At time step t, based on the observed stack current and stack voltage state st, select the corresponding hydrogen inlet pressure action at using the online Actor network πθ(s). After executing the action, obtain the next state st+1 and the immediate reward rt. Store the transition (st, at, rt, st+1) in the experience replay buffer. After a certain number of steps, randomly select M samples from the experience replay buffer to update the networks.
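Put together with the sketches above, one interaction step might look as follows; the Gaussian exploration noise and the actor module are assumptions, and `env` and `buffer` reuse the earlier illustrative classes:

```python
import numpy as np
import torch

def interaction_step(env, buffer, actor, noise_std=0.1):
    """One environment step: observe (I, V), act, store the transition."""
    s = env.state                                            # observed (I, V)
    with torch.no_grad():
        a = actor(torch.as_tensor(s, dtype=torch.float32)).numpy()
    a = a + np.random.normal(0.0, noise_std, size=a.shape)   # exploration noise (assumed)
    s_next, r = env.step(a)
    buffer.push(s, a, r, s_next)                             # store (s_t, a_t, r_t, s_t+1)
    return s_next, r
```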

For the jth sample (sj, aj, rj, sj+1) among the M samples, select an action aj+1 for the state sj+1 based on the offline Actor network πθ′(s), and calculate the target value yj for the state sj+1 and action aj+1 based on the offline Critic network qφ′(s,a).

$$y_{j} = r_{j} + \gamma q_{\varphi ^{\prime}} (s_{j + 1} ,\pi_{\theta ^{\prime}} (s_{j + 1} ))$$
(7)

For the mini-batch of water management samples, calculate the mean squared error loss function as follows,

$$Loss\left( \varphi \right) = \frac{1}{M}\sum\nolimits_{j} {\left( {y_{j} - q_{\varphi } (s_{j} ,a_{j} )} \right)}^{2}$$
(8)

Update the parameters of the online Critic network,

$$\varphi \leftarrow \varphi - \alpha_{c} \nabla_{\varphi } {\text{Loss}}\left( \varphi \right)$$
(9)

where αc is the learning rate for the online Critic network.

Compute the sampled policy gradient,

$$\nabla_{\theta } J\left( \theta \right) = \frac{1}{M}\sum\nolimits_{j} {\nabla_{a} Q_{\varphi } \left( {s,a} \right)\left| {_{{s = s_{j} ,a = \pi_{\theta } (s_{j} )}} } \right.} \nabla_{\theta } \pi_{\theta } (s)\left| {_{{s = s_{j} }} } \right.$$
(10)

Update the parameters of the online Actor network based on the policy gradient,

$$\theta \leftarrow \theta + \alpha_{a} \nabla J\left( \theta \right)$$
(11)

where αa is the learning rate for the online Actor network.

Once the online networks reach a certain update frequency, update the offline Actor and offline Critic networks,

$$\theta ^{\prime} \leftarrow \beta \theta + (1 - \beta )\theta ^{\prime}$$
(12)
$$\varphi ^{\prime} \leftarrow \beta \varphi + (1 - \beta )\varphi ^{\prime}$$
(13)

where β is the soft-update coefficient for the offline networks.
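The update equations (7)–(13) can be condensed into a short PyTorch sketch. Network architectures, tensor shapes, and the helper name are illustrative assumptions; prioritization is added in the next subsection.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, actor_off, critic, critic_off,
                actor_opt, critic_opt, gamma=0.99, beta=0.005):
    """One DDPG update over a mini-batch, following Eqs. (7)-(13)."""
    s, a, r, s_next = batch          # tensors; r is assumed to have shape (M, 1)

    # Eq. (7): target value from the offline (target) Actor and Critic.
    with torch.no_grad():
        y = r + gamma * critic_off(s_next, actor_off(s_next))

    # Eqs. (8)-(9): mean squared error loss and gradient step on the online Critic.
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Eqs. (10)-(11): deterministic policy gradient step on the online Actor.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Eqs. (12)-(13): soft update of the offline networks with coefficient beta.
    for offline, online in ((actor_off, actor), (critic_off, critic)):
        for p_off, p in zip(offline.parameters(), online.parameters()):
            p_off.data.mul_(1.0 - beta).add_(beta * p.data)
```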

4.4 The prioritized deep deterministic policy gradient algorithm

However, experience replay involves storing past experiences in a replay buffer and then sampling mini-batches of experiences from this buffer to train the deep neural networks. If the replay buffer contains many biased samples of experience (e.g., from a sub-optimal strategy or an over-representation of certain state-action pairs), it can lead to suboptimal learning.

In standard experience replay, all experiences are treated equally, and samples are drawn uniformly from the replay buffer. To increase sample efficiency and speed up learning, prioritized experience replay is introduced. In prioritized experience replay, each experience is assigned a priority value that reflects its potential significance for learning. The priority is determined by the magnitude of the temporal-difference error. Experiences with higher temporal-difference errors are given higher priority because they represent situations where the agent's predictions are far from the target values.

In traditional prioritized experience replay, priorities are set based on the loss function. Experiences with larger errors are accorded higher priority, which enhances their likelihood of being selected for replay during training. However, this prioritization method has certain limitations. It can lead to the omission of samples that already had low errors initially. Moreover, it is overly sensitive to noise, as the negative effects introduced by noisy samples gradually worsen during training. Relying only on the loss function for prioritization can result in a lack of diversity and insufficient robustness in water management strategies.

To enhance the efficacy of sampling within the replay buffer, we introduce a probabilistic sampling technique. This approach prioritizes samples according to their relative importance, so that samples with elevated priority are more likely to be chosen during the training of the water management policy, while maintaining a non-zero sampling probability for the lowest-priority samples. Concretely, the priority, denoted as Pi, for a given sample i is formally defined as follows,

$$P_{i} = (|\nabla Loss(s,a,r,s^{\prime})| + p)^{\mu }$$
(14)

where the constant p is introduced to ensure that all samples have nonzero sampling probabilities, and the exponent µ determines how strongly high-error samples are prioritized.
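A one-line sketch of the priority in Eq. (14); the values of p and µ are illustrative assumptions:

```python
def sample_priority(error, p=1e-3, mu=0.6):
    """Priority of Eq. (14): p keeps every sample selectable,
    mu controls how strongly large errors are favoured."""
    return (abs(error) + p) ** mu
```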

In addition, during the training process of actual water management strategies, sorting all samples within the experience replay pool by priority before sampling each time results in low efficiency. Therefore, in this section, we employ a binary tree structure called SumTree to store the samples, enabling fast storage and retrieval of prioritized water management samples.

As shown in Fig. 2, this binary tree stores the sampling probabilities corresponding to eight water management samples. Different sampling probability values P determine the length of the intervals occupied by the samples: the longer the interval, the higher the priority of the corresponding water management sample. Each parent node stores the sum of its two child nodes, so the value of the root node is the sum over all stored samples.

Fig. 2 SumTree for prioritized water management samples in DDPG

The steps for selecting samples from the replay buffer are as follows. First, sample a value Pk uniformly from the interval [0, P15]. Then, compare Pk with the left child P13: if Pk is smaller than P13, continue down from the left node with Pk; otherwise, continue from the right node with Pk − P13. The comparison continues with the nodes in the next layer until a leaf sample is reached. By utilizing the binary tree structure of the SumTree, sampling and priority updates during the training of the water management policy are performed with a complexity of O(log N). Furthermore, for mini-batch sampling, the range [0, P15] is uniformly divided into Nbatch sub-ranges and one value is drawn from each.
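The sketch below implements such a SumTree in plain Python to make the O(log N) traversal concrete; the array layout and method names are illustrative, and the capacity is assumed to be a power of two so the tree is complete.

```python
import numpy as np

class SumTree:
    """Binary tree whose parents store the sum of their children's priorities,
    giving priority-proportional sampling in O(log N)."""

    def __init__(self, capacity):
        self.capacity = capacity                 # number of leaves (power of two assumed)
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes followed by leaves
        self.data = [None] * capacity            # transitions stored at the leaves
        self.write = 0                           # next leaf position to overwrite

    def add(self, priority, transition):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:                         # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def sample(self, value):
        """Walk down from the root: go left if value fits in the left sum,
        otherwise subtract it and go right, as described for Fig. 2."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):      # stop when a leaf is reached
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return self.data[idx - self.capacity + 1], idx
```

For a mini-batch, the total priority stored at the root is divided into Nbatch equal sub-ranges and sample is called once with a value drawn uniformly from each sub-range.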

The random sampling in the traditional deep Q-learning experience replay mechanism is uniform, so some samples may never be replayed or may take a long time to be replayed. In this section, we design a prioritized sampling mechanism that assigns different sampling probabilities to different samples. If a sample has been drawn and corrected many times, its error becomes smaller and its priority decreases. Therefore, prioritized sampling is more inclined to draw new samples, which ensures the timeliness of sampling, improves sample utilization, and is better suited to solving the water management problem of PEMFCs.

In summary, the proposed water management algorithm based on prioritized DDPG for PEMFCs is summarized in Fig. 3.

Fig. 3 Water management scheme based on prioritized DDPG for PEMFCs

5 Experiment

The proposed prioritized DDPG-based PEMFC water management algorithm is verified in this section. As shown in Fig. 4, the experimental platform consists of a PEMFC system, a hydrogen supply system, an M9716 programmable DC electronic load, a self-developed hydrogen fuel cell detection system, and a computer with a USB-CAN driver; the platform operates under open-circuit voltage conditions. Figure 5 depicts the control interface of the hydrogen fuel cell system platform, which was designed using the Troowin FCCU software [19]. The software is used for collecting and displaying various data during the operation of the hydrogen fuel cell; the data acquisition interface is shown in Fig. 6.

Fig. 4 Hydrogen fuel cell water management experimental platform, including the computer, electronic load, detection system, hydrogen cylinder, and PEMFC

Fig. 5 The control and data reading interface of the hydrogen fuel cell system using the Troowin FCCU software

Fig. 6 Display of operational data for the hydrogen fuel cell using the Troowin FCCU software

In the dynamic operational context of external voltage fluctuations, the operation of PEMFCs involves systematic control of hydrogen inlet pressure and precise adjustments to the water injection level. The regulation of hydrogen inlet pressure can be achieved by modulating the magnitude of the inlet hydrogen flow, thereby enabling control over the reaction rate and adjusting the water injection level. This control is executed in response to the continuously observed state, specifically the stack current and stack voltage.

The primary objective of this control mechanism is to uphold the water balance within the hydrogen fuel cell, ensuring a consistently stable power output. For a more granular understanding, Fig. 7 provides detailed insights into the temporal changes in both the observed stack current and stack voltage. The two sub-figures serve as valuable indicators of the shifts within the operational environment and offer a comprehensive perspective on how the fuel cell system responds to external dynamics, contributing to a more robust comprehension of the system's behavior.

Fig. 7 The observed states in the hydrogen fuel cell for water management

The "Tianshou" platform is introduced for reinfocement learning. The "Tianshou" platform provides a fast-speed framework and pythonic API for building the deep reinforcement learning agent based on pure PyTorch. It is very fast thanks to numba jit function and vectorized numpy operation [20]. The proposed prioritized DDPG algorithm is implemented in Python 3.8. The experience replay buffer size D is set to 1000, the batch size M is set to 128, the reward discount factor is 0.99, the learning rate for the actor network is 0.0001, and the learning rate for the critic network is 0.001.

After numerous iterations of the learning process, the rewards exhibit a gradual convergence toward the optimal values, as illustrated in Fig. 8. This convergence serves as a testament to the successful implementation of intelligent water management. It highlights the algorithm's inherent capacity to dynamically adapt and optimize the water management strategy in response to the continuously evolving operational conditions within the hydrogen fuel cell system.

Fig. 8 Convergence process of prioritized DDPG for water management

The observed stability in reward convergence not only underscores the effectiveness of the proposed control mechanism but also attests to its resilience in addressing the inherent dynamism and uncertainties pervasive in the operational environment.

The adaptability of the algorithm, evidenced by the consistent convergence of rewards, underscores its proficiency in navigating diverse scenarios while maintaining a stable and optimal water balance. This adaptability assumes paramount importance in real-world applications, where external factors, such as voltage fluctuations and environmental changes, wield considerable influence over the performance of PEMFCs. Therefore, the demonstrated stability in reward convergence provides robust evidence that the intelligent water management strategy is well-equipped to cope with the dynamic variations in the operational landscape, ensuring steadfast and reliable performance across a spectrum of conditions.

Figure 9 shows the hydrogen inlet pressure action and the power output obtained by the proposed water management method in the PEMFC. The sustained stability of the output, averaging around 0.75 W, is a tangible manifestation of the prioritized DDPG's adeptness in swiftly learning and adapting to the dynamic external voltage conditions. This stability not only signifies the successful convergence of the reinforcement learning process but also underscores the robustness of the prioritized DDPG approach in upholding a consistent and optimal water balance.

Fig. 9 The power in the hydrogen fuel cell for water management

In the context of the prioritized DDPG water management strategy, as illustrated in Fig. 9, the dynamic adjustments of hydrogen inlet pressure values exhibit a close association with the observed state s. The incorporation of the prioritized DDPG algorithm introduces a refined prioritization mechanism within the experience replay, fostering heightened learning efficiency and accelerated convergence. This prioritization feature enables the model to selectively emphasize experiences with greater learning potential, thereby influencing the adaptation of hydrogen inlet pressure values.

The observed correlation between the prioritized DDPG's learning efficiency and the stability in the hydrogen fuel cell's power output reaffirms the method's efficacy in addressing the challenges posed by the dynamic operational environment. The prioritized DDPG not only optimizes water management but also demonstrates an enhanced capability to navigate through the intricacies of external voltage variations, thereby showcasing its potential for practical applications in real-world scenarios.

6 Conclusion

In response to the complex and dynamic external power voltage conditions in the working environment of hydrogen fuel cells, this paper presents a model-free water management method based on the prioritized DDPG reinforcement learning approach. This method operates within the Actor-Critic reinforcement learning framework, comprising two sets of neural networks within the Actor component. It incorporates a prioritized experience replay mechanism and uses observations of stack current and stack voltage to execute hydrogen inlet pressure actions. The Critic component also consists of two sets of neural networks to evaluate the quality of executed actions. Through continuous interaction with the environment and learning, the method adjusts the water content of the hydrogen fuel cell to achieve water balance.

The proposed prioritized DDPG-based hydrogen fuel cell water management methodology is implemented using a constructed experimental platform for hydrogen fuel cell water management and the “Tianshou” platform. Experimental results serve as empirical evidence, substantiating the effectiveness and validity of the proposed approach.

In addition, future research endeavors could concentrate on enhancing the prioritized experience replay mechanism to improve learning efficiency and expedite convergence rates. This may entail investigating alternative prioritization strategies or integrating adaptive mechanisms to dynamically adjust replay priorities according to the relevance of experiences.