Name: Data Management in Machine Learning Systems
ISBN: 978-3-031-01869-5

Overview

Authors:

Matthias Boehm,
Arun Kumar,
Jun Yang

Matthias Boehm
1. Graz University of Technology, Austria
Arun Kumar
1. University of California, San Diego, USA
Jun Yang
1. Duke University, USA

Part of the book series: Synthesis Lectures on Data Management (SLDM)

About this book

Large-scale data analytics using machine learning (ML) underpins many modern data-driven applications. ML systems provide means of specifying and executing these ML workloads in an efficient and scalable manner. Data management is at the heart of many ML systems due to data-driven application characteristics, data-centric workload characteristics, and system architectures inspired by classical data management techniques.

In this book, we follow this data-centric view of ML systems and aim to provide a comprehensive overview of data management in ML systems for the end-to-end data science or ML lifecycle. We review multiple interconnected lines of work: (1) ML support in database (DB) systems, (2) DB-inspired ML systems, and (3) ML lifecycle systems. Covered topics include: in-database analytics via query generation and user-defined functions, factorized and statistical-relational learning; optimizing compilers for ML workloads; execution strategies and hardware accelerators; data access methods such as compression, partitioning, and indexing; resource elasticity and cloud markets; as well as systems for data preparation for ML, model selection, model management, model debugging, and model serving. Given the rapidly evolving field, we strive for a balance between an up-to-date survey of ML systems, an overview of the underlying concepts and techniques, and pointers to open research questions. Hence, this book might serve as a starting point for both systems researchers and developers.

Table of contents (9 chapters)

Front Matter
Pages i-xv
Introduction
- Matthias Boehm, Arun Kumar, Jun Yang
Pages 1-6
ML Through Database Queries and UDFs
- Matthias Boehm, Arun Kumar, Jun Yang
Pages 7-19
Multi-Table ML and Deep Systems Integration
- Matthias Boehm, Arun Kumar, Jun Yang
Pages 21-32
Rewrites and Optimization
- Matthias Boehm, Arun Kumar, Jun Yang
Pages 33-52
Execution Strategies
- Matthias Boehm, Arun Kumar, Jun Yang
Pages 53-71
Data Access Methods
- Matthias Boehm, Arun Kumar, Jun Yang
Pages 73-83
Resource Heterogeneity and Elasticity
- Matthias Boehm, Arun Kumar, Jun Yang
Pages 85-99
Systems for ML Lifecycle Tasks
- Matthias Boehm, Arun Kumar, Jun Yang
Pages 101-121
Conclusions
- Matthias Boehm, Arun Kumar, Jun Yang
Pages 123-125
Back Matter
Pages 127-157

About the authors

Matthias Boehm is a professor at Graz University of Technology, Austria, where he holds a BMVIT-endowed chair for data management. Prior to joining TU Graz in 2018, he was a research staff member at IBM Research - Almaden, CA, USA, with a focus on compilation and runtime techniques for declarative, large-scale machine learning. He received his Ph.D. from Dresden University of Technology, Germany, in 2011 with a dissertation on cost-based optimization of integration flows. His previous research also includes systems support for time series forecasting as well as in-memory indexing and query processing. Matthias is a recipient of the 2016 VLDB Best Paper Award and a 2016 SIGMOD Research Highlight Award.

Arun Kumar is an Assistant Professor at the University of California, San Diego. He received his Ph.D. from the University of Wisconsin-Madison in 2016. His research interests lie at the intersection of data management, systems, and ML, with a focus on making ML-based data analytics easier, faster, cheaper, and more scalable. Ideas from his work have been adopted by many companies, including EMC, Oracle, Cloudera, Facebook, and Microsoft. He is a recipient of the Best Paper Award at SIGMOD 2014, the 2016 CS dissertation research award from UW-Madison, a 2016 Google Faculty Research Award, and a 2018 Hellman Fellowship.

Jun Yang is a Professor of Computer Science at Duke University, where he has been teaching since receiving his Ph.D. from Stanford University in 2001. He is broadly interested in databases and data-intensive systems. He is a recipient of the NSF CAREER Award, IBM Faculty Award, HP Labs Innovation Research Award, and Google Faculty Research Award. He also received the David and Janet Vaughan Brooks Teaching Award at Duke. His current research interests lie in making data analysis easier and more scalable for scientists, statisticians, and journalists.

Bibliographic Information

Book Title: Data Management in Machine Learning Systems
Authors: Matthias Boehm, Arun Kumar, Jun Yang
Series Title: Synthesis Lectures on Data Management
DOI: https://doi.org/10.1007/978-3-031-01869-5
Publisher: Springer Cham
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 8
Copyright Information: Springer Nature Switzerland AG 2019
Softcover ISBN: 978-3-031-00741-5 (published 25 February 2019)
eBook ISBN: 978-3-031-01869-5 (published 31 May 2022)
Series ISSN: 2153-5418
Series E-ISSN: 2153-5426
Edition Number: 1
Number of Pages: XV, 157
Topics: Information Systems and Communication Service, Data Structures and Information Theory
