Heterogeneity Aware Distributed Machine Learning at the Wireless Edge for Health IoT Applications: An EEG Data Case Study

A chapter in Distributed Machine Learning and Computing, part of the book series Big and Integrated Artificial Intelligence (BINARI, volume 2).

Abstract

In this book chapter, we design and develop a mobile edge learning (MEL) framework that enables multiple end user devices or “learners” to cooperatively train a machine learning (ML) model in a wireless edge environment. We will focus on designing and developing the heterogeneity aware synchronous (HA-Sync) approach with time constraints and extend the framework to consider dual time and energy constraints. The proposed MEL framework will include the commonly known federated learning (FL) as well as parallelized learning (PL). After discussing the system model and a brief convergence proof for both FL and PL, we will formulate the problem as a quadratically constrained integer linear program (QCILP), relax it to a QCLP, and propose analytical solutions based on Lagrangian analysis, Karush-Kuhn-Tucker (KKT) conditions, and partial fraction expansion. For the problem with dual time and energy constraints, we will propose solutions based on the suggest and improve (SAI) approach. Results based on achievable local updates, validation accuracy progression, and the optimization algorithm’s execution time will be used to demonstrate the superiority of the proposed HA-Sync compared to the heterogeneity unaware (HU) approaches. As an application focus, we will demonstrate real-time medical event prediction by showing the applicability of personalized MEL for epileptic seizure detection and prediction using EEG data.


Notes

  1. This section is based on two papers: “Adaptive Task Allocation for Mobile Edge Learning,” published in the proceedings of the IEEE WCNCW 2019 [23], and “Dynamic Task Allocation for Mobile Edge Learning,” published in IEEE Transactions on Mobile Computing.

References

  1. B. Jovanovic, Internet of things statistics for 2023 – taking things apart (2023). Online. https://dataprot.net/statistics/iot-statistics/

  2. B. McMahan, D. Ramage, Federated learning: collaborative machine learning without centralized training data (2017). Online. https://ai.googleblog.com/2017/04/federated-learning-collaborative.html

  3. Markets and Markets, Edge computing in healthcare market – revenue trends and growth drivers (2023). Online. https://www.marketsandmarkets.com/Market-Reports/edge-computing-in-healthcare-market-133588379.html

  4. Statista, Internet of Things – US – Statista market forecast (2023). Online. https://www.statista.com/outlook/tmo/internet-of-things/united-states

  5. Sciforce, Can edge analytics become a game changer? (2019). Online. https://medium.com/sciforce/can-edge-analytics-become-a-game-changer-9cc9395d2727

  6. S. Samarakoon, M. Bennis, W. Saad, M. Debbah, Federated learning for ultra-reliable low-latency V2V communications, in 2018 IEEE Global Communications Conference, GLOBECOM 2018 – Proceedings (Institute of Electrical and Electronics Engineers Inc., Dubai, UAE, 2018). Online. https://ieeexplore.ieee.org/document/8647927

  7. B. Hu, Y. Gao, L. Liu, H. Ma, Federated region-learning: an edge computing based framework for urban environment sensing, in 2018 IEEE Global Communications Conference, GLOBECOM 2018 – Proceedings. (Institute of Electrical and Electronics Engineers Inc., Dubai, UAE, 2018). Online. https://ieeexplore.ieee.org/document/8647649

  8. J. Jeon, J. Kim, J. Huh, H. Kim, S. Cho, Overview of distributed federated learning: research issues, challenges, and biomedical applications, in 2019 International Conference on Information and Communication Technology Convergence (ICTC) (IEEE, Jeju Island, South Korea, 2019), pp. 1426–1427. Online. https://ieeexplore.ieee.org/document/8939954/

  9. N. Rieke, J. Hancox, W. Li, F. Milletarì, H.R. Roth, S. Albarqouni, S. Bakas, M.N. Galtier, B.A. Landman, K. Maier-Hein, S. Ourselin, M. Sheller, R.M. Summers, A. Trask, D. Xu, M. Baust, M.J. Cardoso, The future of digital health with federated learning. NPJ Digital Med. 3(1), 1–7 (2020). Online. http://dx.doi.org/10.1038/s41746-020-00323-1

  10. W. Yang, B. Lim, N.C. Luong, D.T. Hoang, Federated learning in mobile edge networks: a comprehensive survey. IEEE Commun. Surv. Tutorials (Early Access), 1–33 (2020). Online. https://ieeexplore.ieee.org/document/9060868

  11. L. Bottou, O. Bousquet, The tradeoffs of large scale learning, in Advances in Neural Information Processing Systems, ed. by J.C. Platt, D. Koller, Y. Singer, S. Roweis. NIPS Foundation (http://books.nips.cc), vol. 20, (2008), pp. 161–168. Online. http://leon.bottou.org/papers/bottou-bousquet-2008

  12. S. Wang, T. Tuor, T. Salonidis, K.K. Leung, C. Makaya, T. He, K. Chan, Adaptive federated learning in resource constrained edge computing systems. IEEE J. Sel. Areas Commun. Early Access, 1–1 (2019). Online. https://ieeexplore.ieee.org/document/8664630/

  13. S. Teerapittayanon, B. McDanel, H.T. Kung, Distributed deep neural networks over the cloud, the edge and end devices, in Proceedings – International Conference on Distributed Computing Systems (2017), pp. 328–339

  14. J. Dean, G.S. Corrado, R. Monga, K. Chen, M. Devin, Q.V. Le, M.Z. Mao, M.A. Ranzato, A. Senior, P. Tucker, K. Yang, A.Y. Ng, Large scale distributed deep networks, in Advances in Neural Information Processing Systems, vol. 25 (2012), pp. 1223–1231. Online. https://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks

  15. S. Wang, T. Tuor, T. Salonidis, K.K. Leung, C. Makaya, T. He, K. Chan, When edge meets learning: adaptive control for resource-constrained distributed machine learning, in INFOCOM (2018). Online. https://ieeexplore.ieee.org/document/8486403

  16. U.Y. Mohammad, S. Sorour, Multi-objective resource optimization for hierarchical mobile edge computing, in 2018 IEEE Global Communications Conference: Mobile and Wireless Networks (Globecom2018 MWN), Abu Dhabi, United Arab Emirates (2018), pp. 1–6. Online. https://ieeexplore.ieee.org/document/8648109

  17. H.H. Yang, Z. Liu, T.Q.S. Quek, H.V. Poor, Scheduling policies for federated learning in wireless networks. IEEE Trans. Commun. 68(1), 317–333 (2019). Online. https://ieeexplore.ieee.org/document/8851249

  18. Z. Yang, M. Chen, W. Saad, C.S. Hong, M. Shikh-Bahaei, Energy efficient federated learning over wireless communication networks. IEEE Trans. Wirel. Commun. 20(3), 1935–1949 (2020). Online. https://ieeexplore.ieee.org/document/9264742

  19. M. Chen, Z. Yang, W. Saad, C. Yin, H.V. Poor, S. Cui, A joint learning and communications framework for federated learning over wireless networks. IEEE Trans. Wirel. Commun. 20(1), 269–283 (2021). Online. https://ieeexplore.ieee.org/document/9210812, https://arxiv.org/abs/1909.07972

  20. T. Tuor, S. Wang, T. Salonidis, B.J. Ko, K.K. Leung, Demo abstract: distributed machine learning at resource-limited edge nodes, in INFOCOM 2018 – IEEE Conference on Computer Communications Workshops (2018), pp. 1–2


  21. D. Conway-Jones, T. Tuor, S. Wang, K.K. Leung, Demonstration of federated learning in a resource-constrained networked environment, in 2019 IEEE International Conference on Smart Computing (SMARTCOMP) (2019). Online. https://ieeexplore.ieee.org/abstract/document/8784064

  22. U. Mohammad, S. Sorour, Adaptive task allocation for asynchronous federated and parallelized mobile edge learning, arXiv preprint arXiv:1905.01656 (2020). Online. https://arxiv.org/abs/1905.01656

  23. U. Mohammad, S. Sorour, Adaptive task allocation for mobile edge learning, in 2019 IEEE Wireless Communications and Networking Conference Workshop (WCNCW) (IEEE, 2019), pp. 1–6. Online. https://ieeexplore.ieee.org/document/8902527/

  24. A.D. Pia, S.S. Dey, M. Molinaro, Mixed-integer quadratic programming is in NP. Math. Program. 162(1), 225–240 (2017)


  25. J. Currie, D.I. Wilson, OPTI: lowering the barrier between open source optimizers and the industrial MATLAB user, in Foundations of Computer-Aided Process Operations, ed. by N. Sahinidis, J. Pinto (Savannah, Georgia, USA, 2012)


  26. J. Park, S. Boyd, General heuristics for nonconvex quadratically constrained quadratic programming, arXiv e-prints (2017). Online. http://arxiv.org/abs/1703.07870

  27. M. Chen, H.V. Poor, W. Saad, S. Cui, Convergence time minimization of federated learning over wireless networks, in IEEE International Conference on Communications. (Institute of Electrical and Electronics Engineers Inc., 2020), pp. 1–6. Online. https://ieeexplore.ieee.org/document/9148815

  28. M. Chen, H.V. Poor, W. Saad, S. Cui, Convergence time optimization for federated learning over wireless networks. IEEE Trans Wirel Commun. 20(4), 2457–2471 (2021). Online. https://ieeexplore.ieee.org/document/9148815

  29. A. Abutuleb, S. Sorour, H.S. Hassanein, Joint task and resource allocation for mobile edge learning, in 2020 IEEE Global Communications Conference, GLOBECOM 2020 – Proceedings. (Institute of Electrical and Electronics Engineers Inc., 2020), pp. 1–6. Online. https://ieeexplore.ieee.org/document/9322399

  30. S. Cebula, A. Ahmad, J.M. Graham, C.V. Hinds, L.A. Wahsheh, A.T. Williams, S.J. DeLoatch, Empirical channel model for 2.4 GHz IEEE 802.11 WLAN, in Proceedings of the 2011 International Conference on Wireless Networks (2011)


  31. S. Munder, D.M. Gavrila, An experimental study on pedestrian classification. IEEE Trans. Pattern Anal. Mach. Intell. 28(11), 1863–1868 (2006). Online. https://ieeexplore.ieee.org/document/1704841

  32. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). Online. https://ieeexplore.ieee.org/document/726791

  33. B. Shillingford, What is the time complexity of backpropagation algorithm for training artificial neural networks? – Quora (2016). Online. https://www.quora.com/What-is-the-time-complexity-of-backpropagation-algorithm-for-training-artificial-neural-networks

  34. F. Uhlig, The DQR algorithm, basic theory, convergence, and conditional stability. Numer. Math. 76(4), 515–553 (1997)


  35. A. Nemirovski, Interior point polynomial time methods in convex programming. Lect. Notes 42(16), 3215–3224 (2004). Online. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.160.6909&rep=rep1&type=pdf


  36. Y.A. Qadri, A. Nauman, Y.B. Zikria, A.V. Vasilakos, S.W. Kim, The future of healthcare internet of things: a survey of emerging technologies. IEEE Commun. Surv. Tutorials 22(2), 1121–1167 (2020). Online. https://ieeexplore.ieee.org/document/8993839

  37. J.M. Raja, C. Elsakr, S. Roman, B. Cave, I. Pour-Ghaz, A. Nanda, M. Maturana, R.N. Khouzam, Apple watch, wearables, and heart rhythm: where do we stand? Ann. Transl. Med. 7(17), 417–417 (2019). Online. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6787392/

  38. S. Kiranyaz, T. Ince, M. Gabbouj, Real-time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Trans. Biomed. Eng. 63(3), 664–675 (2016)


  39. O. Choudhury, A. Gkoulalas-Divanis, T. Salonidis, I. Sylla, Y. Park, G. Hsu, A. Das, Anonymizing data for privacy-preserving federated learning, arXiv preprint arXiv:2002.09096 (2020). Online. https://arxiv.org/abs/2002.09096

Acknowledgements

This material is based upon work supported by the National Science Foundation (NSF) under Grant No. TI-2213951. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Author information

Correspondence to Fahad Saeed.

Appendices

Appendix 1

The global model's loss is denoted by \(F(\mathbf {w})\). Let us assume that this function has the following three properties:

  1. \(F(\mathbf {w})\) is convex.

  2. \(F(\mathbf {w})\) is \(\rho \)-Lipschitz: \(\lVert F(\mathbf {w})-F(\bar {\mathbf {w}}) \rVert \leq \rho \lVert \mathbf {w}-\bar {\mathbf {w}} \rVert \)

  3. \(F(\mathbf {w})\) is \(\beta \)-smooth: \(\lVert \nabla F(\mathbf {w})-\nabla F(\bar {\mathbf {w}}) \rVert \leq \beta \lVert \mathbf {w}-\bar {\mathbf {w}} \rVert \)

Each learner \(k \in \mathcal {K}\) in HA-Sync performs a total of L updates. Then, the difference between the loss at update L and the loss of the optimal global model, denoted by \({\mathbf {w}}^*\), is bounded by the expression below, where the constants A, B, and C are defined in the sequel:

$$\displaystyle \begin{aligned} F(\mathbf{w}[L])-F({\mathbf{w}}^*) \leq \dfrac{1}{G\tau \left[A+B(1-C) \right]} \end{aligned} $$
(3.43)

The learning rate is given by \(\eta \), and we define a control parameter \(\phi = 1-\frac {\eta \beta }{2}\). The local losses are bounded by the parameter \(\epsilon \), and the function \(h(\tau ) = \frac {\delta }{\beta }[(\eta \beta + 1)^\tau -1]-\eta \delta \tau \). For more details on \(\delta \) and \(\epsilon \), the reader is referred to [12]. In general, \(\eta \) is selected such that \(0<\eta \beta < 1\), \(\eta \phi - \frac {\rho h(\tau )}{\tau \epsilon ^2} \geq 0\), and \((\eta \beta + 1)^\tau \geq \eta \beta \tau +1\). Consider the case where MEL is not optimized and each global cycle allows each learner \(k \in \mathcal {K}\) to perform the same integer number of local updates \(\tau \). Then, in a fixed number of global updates G, MEL will allow for a total of \(L = G\tau \) updates. Let us define the constants \(A = \eta \phi + \frac {\rho \delta }{\epsilon ^2}\), \(B = \frac {\rho \delta }{\beta \epsilon ^2}\), and \(C = \eta \beta +1\). Based on these definitions and assumptions, we can write the upper bound on the loss as follows:

$$\displaystyle \begin{aligned} F(\mathbf{w}[L])-F({\mathbf{w}}^*) \leq \dfrac{1}{G\tau \left[A+B(1-C) \right]} {} \end{aligned} $$
(3.44)

Observe that \(A+B(1-C) \geq 0\) and that the number of global updates G is fixed. Hence, the bound on the loss converges to zero as \(\tau \rightarrow \infty \).
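To make the behavior of this bound concrete, the following minimal Python sketch evaluates the constants A, B, and C, verifies the conditions on \(\eta \) stated above, and tabulates the bound in (3.44) as \(\tau \) grows; the numerical values of \(\eta \), \(\beta \), \(\rho \), \(\delta \), \(\epsilon \), and G are illustrative assumptions, not values from the chapter.

```python
# Illustrative constants (assumed values, not taken from the chapter)
eta, beta, rho, delta, eps, G = 0.01, 1.0, 1.0, 0.5, 1.0, 50

phi = 1 - eta * beta / 2          # control parameter phi = 1 - eta*beta/2
A = eta * phi + rho * delta / eps**2
B = rho * delta / (beta * eps**2)
C = eta * beta + 1

def h(tau):
    # h(tau) = (delta/beta) * [(eta*beta + 1)^tau - 1] - eta*delta*tau
    return (delta / beta) * ((eta * beta + 1) ** tau - 1) - eta * delta * tau

def loss_bound(tau):
    # Upper bound (3.44) on F(w[L]) - F(w*), with L = G * tau
    return 1.0 / (G * tau * (A + B * (1 - C)))

for tau in [1, 5, 10, 50, 100]:
    # Conditions on the learning rate eta stated in the text
    assert 0 < eta * beta < 1
    assert (eta * beta + 1) ** tau >= eta * beta * tau + 1
    assert eta * phi - rho * h(tau) / (tau * eps**2) >= 0
    print(tau, loss_bound(tau))   # the bound shrinks as tau grows
```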

Appendix 2

Let us write the KKT optimality conditions for (3.25) as shown in (3.45)–(3.51). Condition (3.45) ensures that the dataset size of any learner \(k \in \mathcal {K}\) satisfies (3.27), and (3.47) ensures that the bound in (3.27) holds with equality for any learner \(k \in \mathcal {K}\) with \(\lambda _k^* > 0\). This is significant because strong duality holds for some feasible \(\tau ^*\) when, strictly speaking, \(\lambda _k^* > 0 ~\forall ~k \in \mathcal {K}\), in which case the upper bound is the optimal solution.

$$\displaystyle \begin{gathered} C_k^2 \tau^* d_k^* + C_k^1 d_k^* + C_k^0 - T \leq 0, \quad k \in \mathcal{K} {} \end{gathered} $$
(3.45)
$$\displaystyle \begin{gathered} \alpha_0^*, \alpha_k^*, ~ \text{and} ~ \lambda_k^* \geq 0 \quad k \in \mathcal{K} \end{gathered} $$
(3.46)
$$\displaystyle \begin{gathered} \lambda_k^*\left(C_k^2 \tau^* d_k^* + C_k^1 d_k^* + C_k^0 - T\right) = 0, \quad k \in \mathcal{K} {} \end{gathered} $$
(3.47)
$$\displaystyle \begin{gathered} -\alpha_0^* \tau^* = 0 {} \end{gathered} $$
(3.48)
$$\displaystyle \begin{gathered} -\alpha_k^* d_k^* = 0 \quad k \in \mathcal{K} {} \end{gathered} $$
(3.49)
$$\displaystyle \begin{gathered} \sum_{k = 1}^{K}d_k^* - d = 0 {} \end{gathered} $$
(3.50)
$$\displaystyle \begin{aligned} -\nabla\tau^* + \sum_{k = 1}^{K} \lambda_k^*\nabla\left(C_k^2 \tau^* d_k^* + C_k^1 d_k^* + C_k^0 - T\right) + \\ \nu^*\nabla\left(\sum_{k = 1}^{K}d_k^* - d\right) - \alpha_0^*\nabla\tau^* - \nabla\left(\sum_{k = 1}^{K}\alpha_k^*d_k^*\right) = 0 {} \end{aligned} $$
(3.51)

We can then rewrite the bound on \(d_k^*\) in (3.27) as an equality and substitute it back into (3.50) to obtain the following relation:

$$\displaystyle \begin{aligned} d= \sum_{k=1}^{K} d_k^* = \sum_{k=1}^{K} \left[ \dfrac{T-C_k^0}{\tau^* C_k^2+C_k^1} \right] = \sum_{k=1}^{K} \left[ \dfrac{a_k}{\tau^* + b_k} \right] {} \end{aligned} $$
(3.52)

The rightmost expression has the form of a partial fraction expansion of a rational polynomial function of \(\tau ^*\), where \(a_k, b_k \in \mathbb {R}^{++}\). Therefore, we can expand (3.52) to the form shown in (3.53).

$$\displaystyle \begin{aligned} \dfrac{a_1}{\tau^* + b_1} + \dfrac{a_2}{\tau^* + b_2} + \dots + \dfrac{a_k}{\tau^* + b_k} + \dots + \dfrac{a_K}{\tau^* + b_K} = \\ \dfrac{1}{(\tau^* + b_1)(\tau^* + b_2)\dots(\tau^* + b_k)\dots(\tau^* + b_K)} \times \\ \Bigg[ a_1(\tau^* + b_2)(\tau^* + b_3)\dots(\tau^* + b_k)\dots(\tau^* + b_K) + ~ \\ a_2(\tau^* + b_1)(\tau^* + b_3)\dots(\tau^* + b_k)\dots(\tau^* + b_K) + \ldots + \\ a_k(\tau^* + b_1)(\tau^* + b_2)\dots(\tau^* + b_{k-1})(\tau^* + b_{k+1})\dots(\tau^* + b_K) \\ + \dots + a_K(\tau^* + b_1)(\tau^* + b_2)\dots(\tau^* + b_k)\dots(\tau^* + b_{K-1}) \Bigg] {} \end{aligned} $$
(3.53)

Finally, the expanded form can be simplified into a rational function of \(\tau ^*\), equal to the total dataset size d, as shown in (3.54). Note that the degrees of the numerator and denominator are \(K-1\) and K, respectively. The poles of the system are located at \(-b_k\), and since \(b_k \geq 0\), the system is stable; moreover, \(\tau ^* = -b_k\) is not a feasible solution because it is eliminated by the \(\tau \geq 0\) constraint. Therefore, we can rewrite (3.54) as shown in (3.28). By solving this polynomial, we obtain a set of candidate solutions for \(\tau ^*\), one of which is feasible. Because the problem is non-convex, this feasible solution \(\tau ^*\) constitutes an upper bound on the solution of the relaxed problem.

$$\displaystyle \begin{aligned} d = \dfrac{\sum_{k=1}^{K} a_k \prod_{\substack{l=1 \\ l\neq k}}^{K} \left(\tau^*+b_l\right)}{\prod_{k=1}^{K} \left(\tau^*+b_k\right)} {} \end{aligned} $$
(3.54)
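For illustration, this root-finding step is easy to carry out numerically. The sketch below forms \(a_k = (T-C_k^0)/C_k^2\) and \(b_k = C_k^1/C_k^2\), builds the degree-K polynomial \(d\prod _k(\tau ^*+b_k) - \sum _k a_k \prod _{l\neq k}(\tau ^*+b_l) = 0\) implied by (3.54), and keeps the real positive root; the constants T, \(C_k^0\), \(C_k^1\), \(C_k^2\), and d in the example are assumed, not taken from the chapter.

```python
import numpy as np

def optimal_tau_and_dk(T, C0, C1, C2, d):
    """Solve (3.52)-(3.54) for tau* and the per-learner allocations d_k*."""
    C0, C1, C2 = map(np.asarray, (C0, C1, C2))
    a = (T - C0) / C2                  # a_k in (3.52)
    b = C1 / C2                        # b_k in (3.52)
    denom = np.poly(-b)                # coefficients of prod_k (tau + b_k)
    numer = np.zeros(1)
    for k in range(len(a)):            # sum_k a_k * prod_{l != k} (tau + b_l)
        numer = np.polyadd(numer, a[k] * np.poly(-np.delete(b, k)))
    roots = np.roots(np.polysub(d * denom, numer))   # d*D(tau) - N(tau) = 0
    real = roots[np.abs(roots.imag) < 1e-9].real
    tau_star = real[real > 0].max()    # tau* = -b_k is excluded by tau >= 0
    return tau_star, a / (tau_star + b)

# Example with assumed constants for K = 3 learners
tau_star, d_star = optimal_tau_and_dk(T=10.0, C0=[1.0, 2.0, 1.5],
                                      C1=[0.1, 0.2, 0.15],
                                      C2=[0.01, 0.025, 0.02], d=100)
print(tau_star, d_star, d_star.sum())  # the allocations d_k* sum back to d
```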

As a last step, to ensure that the solution set is feasible, note that according to (3.48) and (3.49), \(\alpha _0^*\) and \(\alpha _k^* ~ \forall ~ k\) must be equal to 0. Expanding the stationarity condition in (3.51), the following two relations can be derived (representing \(K+1\) equations):

$$\displaystyle \begin{aligned} \lambda_k^*C_k^2\tau^* + \lambda_k^*C_k^1 + \nu^* = \alpha_k^*, ~ k \in \mathcal{K} {} \end{aligned} $$
(3.55)
$$\displaystyle \begin{aligned} -1+\sum_{k=1}^{K} \lambda_k^*C_k^2d_k^* = \alpha_0^* {} \end{aligned} $$
(3.56)

By setting \(\alpha _0^* = 0\) and \(\alpha _k^* = 0\) for \(k \in \mathcal {K}\), we can write \(\lambda _k^*\) in terms of \(\nu ^*\) as shown in (3.57) and substitute the resulting expression into (3.56) to find \(\nu ^*\) using the values of \(d_k^*\) and \(\tau ^*\) obtained from (3.27) and (3.28), respectively.

$$\displaystyle \begin{aligned} \lambda_k^* = -\dfrac{\nu^*}{C_k^2\tau^* + C_k^1}, ~ k \in \mathcal{K} {} \end{aligned} $$
(3.57)
$$\displaystyle \begin{aligned} \nu^* = -\dfrac{1}{\sum_{k=1}^{K} \frac{C_k^2d_k^*}{C_k^2\tau^* + C_k^1}} {} \end{aligned} $$
(3.58)

The values of \(\lambda _k^*\) for \(k \in \mathcal {K}\) can be obtained by back-substituting \(\nu ^*\) into (3.57). Observe that as long as there exists a \(\tau ^* > 0\), \(\nu ^*\) will be negative, and hence \(\lambda _k^*\) for \(k \in \mathcal {K}\) will be strictly greater than zero. In other words, as long as there exists a \(\tau ^* > 0\) in the feasible set such that \(d_k^* > 0\), there will exist a set of \(\lambda _k^* > 0\) for \(k \in \mathcal {K}\). This fact can be used to verify the feasibility of the solution, and it is also helpful for choosing the optimal \(\tau ^*\) when multiple values of \(\tau \) greater than zero exist. Extensive simulations presented in Sect. 3.5 demonstrated that there was no optimality gap between the analytical upper bounds and the numerical solution.
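As a quick numerical check of this feasibility argument, the snippet below recovers \(\nu ^*\) from (3.58) and \(\lambda _k^*\) from (3.57) and verifies their signs, reusing the assumed constants and the \((\tau ^*, d_k^*)\) pair produced by the previous sketch:

```python
import numpy as np

def dual_variables(C1, C2, tau_star, d_star):
    """Recover nu* via (3.58) and lambda_k* via (3.57)."""
    C1, C2 = np.asarray(C1), np.asarray(C2)
    nu = -1.0 / np.sum(C2 * d_star / (C2 * tau_star + C1))  # (3.58)
    lam = -nu / (C2 * tau_star + C1)                        # (3.57)
    return nu, lam

# Assumed constants and the (tau*, d_k*) values from the previous sketch
nu, lam = dual_variables(C1=[0.1, 0.2, 0.15], C2=[0.01, 0.025, 0.02],
                         tau_star=7.55, d_star=np.array([51.3, 20.6, 28.2]))
assert nu < 0 and np.all(lam > 0)  # lambda_k* > 0: the solution is feasible
```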

Appendix 3

Recall that the optimization variables are given by \(\mathbf {x} = \left [\tau ~ d_1 ~ d_2 ~ \ldots ~ d_k ~ \ldots ~ d_K\right ]^T\). Then, the relaxed problem in (3.25) can be rewritten in the standard form of a QCQP as follows:

$$\displaystyle \begin{aligned} \min_{\mathbf{x}} \qquad & {\mathbf{f}}^T \mathbf{x} {} \end{aligned} $$
(3.59)
$$\displaystyle \begin{aligned} \text{s.t. }\qquad & {\mathbf{x}}^T {\mathbf{P}}_k \mathbf{x} +{\mathbf{p}}_k^T \mathbf{x} + p_k^0 \leq 0, \quad \forall k \in \mathcal{K} {} \end{aligned} $$
(3.59a)
$$\displaystyle \begin{aligned} & {\mathbf{x}}^T {\mathbf{Q}}_k \mathbf{x} +{\mathbf{q}}_k^T \mathbf{x} + q_k^0 \leq 0, \quad \forall k \in \mathcal{K} {} \end{aligned} $$
(3.59b)
$$\displaystyle \begin{aligned} & {\mathbf{x}}^T \mathbf{A} \mathbf{x} +{\mathbf{a}}^T \mathbf{x} + a_0 \leq 0 {} \end{aligned} $$
(3.59c)
$$\displaystyle \begin{aligned} & {\mathbf{x}}^T \bar{\mathbf{A}} \mathbf{x} +\bar{\mathbf{a}}^T \mathbf{x} + \bar{a}_0 \leq 0 {} \end{aligned} $$
(3.59d)
$$\displaystyle \begin{aligned} & {\mathbf{x}}^T \mathbf{U} \mathbf{x} +{\mathbf{u}}^T \mathbf{x} + u_0 \leq 0 {} \end{aligned} $$
(3.59e)
$$\displaystyle \begin{aligned} & {\mathbf{x}}^T {\mathbf{V}}_k \mathbf{x} +{\mathbf{v}}_k^T \mathbf{x} + v_k^0 \leq 0, \quad \forall k \in \mathcal{K} {} \end{aligned} $$
(3.59f)

The time and energy constraints are defined by (3.59a) and (3.59b), respectively. Constraints (3.59c) and (3.59d) are two inequality constraints used to simplify the equality constraint of total dataset size allocation. The nonnegativity constraints on \(\tau \) and \(d_k\) are given in (3.59e) and (3.59f), respectively. The objective and each constraint have three terms each: a quadratic term, a linear term, and a constant term. The constant terms \(p_k^0 = C_k^0-T\) and \(q_k^0 = G_k^0-E_k^0 ~\forall ~ k \in \mathcal {K}\) are associated with the time and energy constraints, respectively. The remaining constant terms are \(a_0 = -d\), \(\bar {a}_0 = d\), and \(v_k^0 = d_l ~ \forall ~ k\), whereas \(u_0 = 0\) and \(f_0 = 0\). In contrast, the coefficients associated with the linear terms in the objective (\(\mathbf {f}\)) and constraints (\({\mathbf {p}}_k\), \({\mathbf {q}}_k\), \(\mathbf {a}\), \(\bar {\mathbf {a}}\), \(\mathbf {u}\), and \({\mathbf {v}}_k\)) can be represented by the following set of vectors:

$$\displaystyle \begin{aligned} &\quad \mathbf{f} = \left[-1 ~ 0 ~ 0 ~ \ldots ~ 0 ~ \ldots ~ 0\right]^T \\ &\quad {\mathbf{p}}_k = \left[0 ~ 0 ~ 0 ~ \ldots ~ C_k^1 ~ \ldots ~ 0\right]^T, \forall ~ k\\ &\quad {\mathbf{q}}_k = \left[0 ~ 0 ~ 0 ~ \ldots ~ G_k^1 ~ \ldots ~ 0\right]^T, \forall ~ k\\ &\quad \mathbf{a} = \left[0 ~ 1 ~ 1 ~ \ldots ~ 1 ~ \ldots ~ 1\right]^T\\ &\quad \bar{\mathbf{a}} = \left[0 ~ -1 ~ -1 ~ \ldots ~ -1 ~ \ldots ~ -1\right]^T\\ &\quad \mathbf{u} = \left[-1 ~ 0 ~ 0 ~ \ldots ~ 0 ~ \ldots ~ 0\right]^T\\ &\quad {\mathbf{v}}_k = \left[0 ~ 0 ~ 0 ~ \ldots ~ -1 ~ \ldots ~ 0\right]^T, \forall ~ k \end{aligned} $$
(3.60)

In general, the coefficients associated with the quadratic terms are \((K+1) \times (K+1)\) matrices. Because this is a QCLP, the quadratic term in the objective is \({\mathbf {0}}_{(K+1) \times (K+1)}\), a \((K+1) \times (K+1)\) zero matrix. The coefficients associated with the time and energy constraints, \({\mathbf {P}}_k\) and \({\mathbf {Q}}_k\), respectively, can be described as follows:

$$\displaystyle \begin{aligned} {\mathbf{P}}_k(i,j) = \begin{cases} 0.5C_k^2, \mbox{ if} & \begin{matrix} i = 1 ~ \& ~ j = k+1 \\ i = k+1 ~ \& ~ j = 1 \end{matrix}\\ 0, & \mbox{otherwise} \end{cases} {} \end{aligned} $$
(3.61)
$$\displaystyle \begin{aligned} {\mathbf{Q}}_k(i,j) = \begin{cases} 0.5G_k^2, \mbox{ if} & \begin{matrix} i = 1 ~ \& ~ j = k+1 \\ i = k+1 ~ \& ~ j = 1 \end{matrix}\\ 0, & \mbox{otherwise} \end{cases} {} \end{aligned} $$
(3.62)
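Their sparse structure makes these coefficient matrices straightforward to assemble. Below is a minimal sketch, using 0-based indexing so that entry \((1, k+1)\) of (3.61) becomes \((0, k)\); the values chosen for \(C_k^2\) and \(G_k^2\) are assumptions for illustration only.

```python
import numpy as np

def quad_coupling(coeff_k, k, K):
    """Matrix with 0.5*coeff_k at entries (1, k+1) and (k+1, 1) of (3.61)/(3.62),
    written here in 0-based indexing as (0, k) and (k, 0)."""
    M = np.zeros((K + 1, K + 1))
    M[0, k] = M[k, 0] = 0.5 * coeff_k
    return M

K = 3
C2 = [0.01, 0.025, 0.02]   # assumed time-constraint coefficients C_k^2
G2 = [0.03, 0.04, 0.02]    # assumed energy-constraint coefficients G_k^2
P = [quad_coupling(C2[k - 1], k, K) for k in range(1, K + 1)]   # (3.61)
Q = [quad_coupling(G2[k - 1], k, K) for k in range(1, K + 1)]   # (3.62)

# Sanity check: x^T P_k x recovers the bilinear term C_k^2 * tau * d_k
x = np.array([2.0, 5.0, 6.0, 7.0])   # x = [tau, d_1, d_2, d_3]
assert np.isclose(x @ P[0] @ x, C2[0] * x[0] * x[1])
```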

The quadratic coefficients of the remaining constraints \(\mathbf {A}\), \(\bar {\mathbf {A}}\), \(\mathbf {U}\) and \({\mathbf {V}}_k\) are all \({\mathbf {0}}_{(K+1) \times (K+1)}\). We can now define the functions \({\mathbf {F}}^2(\boldsymbol {\Gamma })\), \({\mathbf {f}}^1(\boldsymbol {\Gamma })\) and \(f_0(\boldsymbol {\Gamma })\) as [26]:

$$\displaystyle \begin{aligned} {\mathbf{F}}^2(\boldsymbol{\Gamma}) = \sum_{k = 1}^{K} \lambda_k {\mathbf{P}}_k + \gamma_k {\mathbf{Q}}_k {} \end{aligned} $$
(3.63)
$$\displaystyle \begin{aligned} {\mathbf{f}}^1(\boldsymbol{\Gamma}) = \sum_{k = 1}^{K}\left( \lambda_k {\mathbf{p}}_k + \gamma_k {\mathbf{q}}_k + \nu_k {\mathbf{v}}_k\right) + \alpha \mathbf{a} + \bar{\alpha} \bar{\mathbf{a}} + \omega \mathbf{u} {} \end{aligned} $$
(3.64)
$$\displaystyle \begin{aligned} f_0(\boldsymbol{\Gamma}) = \sum_{k = 1}^{K}\left( \lambda_k p_k^0 + \gamma_k q_k^0 + \nu_k v_k^0 \right) + \alpha a_0 + \bar{\alpha} \bar{a}_0 {} \end{aligned} $$
(3.65)
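In code, these three functions reduce to weighted sums over the per-learner coefficients. A sketch of the assembly follows, under the stacking conventions noted in the docstring; the multipliers \(\lambda _k\), \(\gamma _k\), \(\nu _k\), \(\alpha \), \(\bar {\alpha }\), and \(\omega \) that make up \(\boldsymbol {\Gamma }\) are passed explicitly.

```python
import numpy as np

def lagrangian_terms(lam, gam, nu, alpha, alpha_bar, omega,
                     P, Q, p, q, v, a, a_bar, u, p0, q0, v0, a0, a_bar0):
    """Assemble F^2(Gamma), f^1(Gamma), and f_0(Gamma) of (3.63)-(3.65).
    P and Q are stacked as (K, K+1, K+1) arrays; p, q, v as (K, K+1) arrays;
    lam, gam, nu, p0, q0, v0 as length-K vectors; a, a_bar, u are the
    (K+1)-vectors of (3.60); alpha, alpha_bar, omega, a0, a_bar0 are scalars."""
    F2 = np.tensordot(lam, P, axes=1) + np.tensordot(gam, Q, axes=1)       # (3.63)
    f1 = (lam @ p + gam @ q + nu @ v
          + alpha * a + alpha_bar * a_bar + omega * u)                     # (3.64)
    f0 = lam @ p0 + gam @ q0 + nu @ v0 + alpha * a0 + alpha_bar * a_bar0   # (3.65)
    return F2, f1, f0
```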

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mohammad, U., Saeed, F. (2024). Heterogeneity Aware Distributed Machine Learning at the Wireless Edge for Health IoT Applications: An EEG Data Case Study. In: Amini, M.H. (eds) Distributed Machine Learning and Computing. Big and Integrated Artificial Intelligence, vol 2. Springer, Cham. https://doi.org/10.1007/978-3-031-57567-9_3

  • DOI: https://doi.org/10.1007/978-3-031-57567-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-57566-2

  • Online ISBN: 978-3-031-57567-9
