Federated Learning over Harmonized Data Silos

Stripelis, Dimitris; Ambite, José Luis

doi:10.1007/978-3-031-36938-4_3

Part of the book series: Studies in Computational Intelligence (SCI, volume 1106)

Included in the following conference series:

International Workshop on Health Intelligence

Abstract

Federated Learning is a distributed machine learning approach that enables geographically distributed data silos to collaboratively learn a joint machine learning model without sharing their data. Most existing work operates on unstructured data, such as images or text, or on structured data that is assumed to be consistent across the different silos. However, silos often have different schemata, data formats, data values, and access patterns. The field of data integration has developed many methods to address these challenges, including techniques for data exchange and query rewriting using declarative schema mappings, and for entity linkage. We propose an architectural vision for an end-to-end Federated Learning and Integration system that incorporates the critical steps of data harmonization and data imputation, to spur further research at the intersection of data management information systems and machine learning.
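The harmonize-then-train pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' system: the two silos, their column names, the `MAPPINGS` dictionary (a simplified stand-in for declarative schema mappings), local mean imputation, and the linear model are all hypothetical choices made for this example. Training follows the sample-count-weighted parameter averaging of FedAvg (McMahan et al., 2017).

```python
import numpy as np

# Hypothetical common (mediated) schema shared by all silos.
FEATURES = ["age", "weight_kg"]

# Hypothetical per-silo attribute mappings: each common attribute is computed
# from a local record. A simplified stand-in for declarative schema mappings.
LB_TO_KG = 0.45359237
MAPPINGS = {
    "silo_a": {
        "age": lambda r: r["age_years"],
        "weight_kg": lambda r: r["weight_kg"],
        "y": lambda r: r["outcome"],
    },
    "silo_b": {
        "age": lambda r: r["AGE"],
        # Silo B stores weight in pounds; convert to the common unit (kg).
        "weight_kg": lambda r: None if r["WT_LB"] is None else r["WT_LB"] * LB_TO_KG,
        "y": lambda r: r["Y"],
    },
}

# Toy local data, invented for illustration; None marks a missing value.
SILO_A = [
    {"age_years": 50, "weight_kg": 70.0, "outcome": 1.0},
    {"age_years": 60, "weight_kg": None, "outcome": 2.0},
    {"age_years": 40, "weight_kg": 80.0, "outcome": 0.5},
]
SILO_B = [
    {"AGE": 55, "WT_LB": 176.0, "Y": 1.5},
    {"AGE": 45, "WT_LB": 154.0, "Y": 0.8},
]


def harmonize(records, mapping):
    """Rewrite local records into the common schema."""
    return [{attr: f(r) for attr, f in mapping.items()} for r in records]


def to_arrays(rows):
    """Build (X, y) locally; impute missing features with the silo's own column means."""
    X = np.array([[np.nan if row[a] is None else row[a] for a in FEATURES] for row in rows],
                 dtype=float)
    y = np.array([row["y"] for row in rows], dtype=float)
    X = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    return np.hstack([np.ones((len(X), 1)), X]), y  # prepend a bias column


def local_update(w, X, y, lr=1e-5, epochs=5):
    """Run a few full-batch gradient-descent epochs on mean squared error."""
    w = w.copy()
    for _ in range(epochs):
        w -= lr * (2.0 / len(y)) * X.T @ (X @ w - y)
    return w


def fedavg_round(w_global, silos):
    """One round: each silo trains locally; the server averages weighted by sample count."""
    updates = [(local_update(w_global, X, y), len(y)) for X, y in silos]
    total = sum(n for _, n in updates)
    return sum(n * w for w, n in updates) / total


# Harmonization and imputation run inside each silo; only parameters are shared.
silos = [to_arrays(harmonize(SILO_A, MAPPINGS["silo_a"])),
         to_arrays(harmonize(SILO_B, MAPPINGS["silo_b"]))]
w = np.zeros(1 + len(FEATURES))
for _ in range(10):
    w = fedavg_round(w, silos)
print(w)
```

Mean imputation is deliberately the simplest possible choice here; because it uses only each silo's own statistics, no raw values cross silo boundaries, at the cost of ignoring information held by other silos.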

Acknowledgements

This research was supported in part by the Defense Advanced Research Projects Agency (DARPA) under contract HR00112090104, and in part by the National Institutes of Health (NIH) under grant R01DA053028.

Author information

Authors and Affiliations

Information Sciences Institute, University of Southern California, Los Angeles, CA, 90007, USA
Dimitris Stripelis & José Luis Ambite

Corresponding author

Correspondence to Dimitris Stripelis.

Editor information

Editors and Affiliations

Center for Biomedical Informatics, Department of Pediatrics, College of Medicine, The University of Tennessee Health Science Center–Oak-Ridge National Lab (UTHSC-ORNL) Center for Biomedical Informatics, Memphis, TN, USA
Arash Shaban-Nejad
School of Nursing, University of Minnesota, Minneapolis, MN, USA
Martin Michalowski
Altos Labs - Bay Area Institute of Science, Redwood City, CA, USA
Simone Bianco

About this chapter

Cite this chapter

Stripelis, D., Ambite, J.L. (2023). Federated Learning over Harmonized Data Silos. In: Shaban-Nejad, A., Michalowski, M., Bianco, S. (eds) Artificial Intelligence for Personalized Medicine. W3PHAI 2023. Studies in Computational Intelligence, vol 1106. Springer, Cham. https://doi.org/10.1007/978-3-031-36938-4_3

DOI: https://doi.org/10.1007/978-3-031-36938-4_3
Published: 02 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36937-7
Online ISBN: 978-3-031-36938-4
eBook Packages: Intelligent Technologies and Robotics (R0)
