Progressive Data Analysis: Roadmap and Research Agenda

Jean-Daniel Fekete, Danyel Fisher, and Michael Sedlmair, Progressive Data Analysis: Roadmap and Research Agenda, Eurographics Association, ISBN: 978-3-03868-270-7, DOI: 10.2312/pda.20242707, Nov. 2024

Publisher's Page
PDF Version (free)

Table of Contents

  • Introduction
  • Concepts and Definitions
  • Data Management
  • Data Structures and Algorithms
  • Visualization
  • Uncertainty and Quality
  • Human Aspects
  • Machine Learning
  • Evaluation
  • Challenges and Research Agenda
  • Conclusion


Abstract

We live in an era where data is abundant and growing rapidly; databases storing big data sprawl past memory, reach computation limits, and become increasingly distributed. Engineers are designing new hardware and software systems, with new storage management and predictive computation, to sustain this growth. Yet, while good for data at scale, these infrastructures do not support exploratory data analysis (EDA) effectively. EDA allows analysts to make sense of data with little or no known model. It is essential in many application domains, from network security and fraud detection to epidemiology and preventive medicine. Data exploration is done through an iterative loop where analysts interact with data through computations that return results, usually shown with visualizations, which in turn are interacted with by the analyst again. EDA calls for highly responsive systems: at 500 ms, users change their querying behavior; past five or ten seconds, users abandon tasks or lose attention. To address this problem, a new computation paradigm has emerged in the last decade: Progressive Data Analysis.

This book is an introduction to this new paradigm. It explains the main scientific and technical benefits of performing complex data analysis progressively on large data. It also introduces the challenges that it raises to become fully usable. These important issues concern research fields that are traditionally separate in computer science: databases, scientific computing, machine learning, visualization, statistics, and human-computer interaction. The book ends with a research agenda to help the scientific community converge on key research questions.


Authors

Marco Angelini, Sapienza Università di Roma
Michaël Aupetit, Qatar Computing Research Institute
Sriram Karthik Badam, Apple Inc.
Carsten Binnig, Technical University of Darmstadt & DFKI
Jean-Daniel Fekete, Inria & Université Paris-Saclay
Danyel Fisher
Barbara Hammer, CITEC, Bielefeld University
Jaemin Jo, Sungkyunkwan University
Nicola Pezzotti, Philips Cardiologs, TU/e, AI4MR
Gaëlle Richer, Inria & Université Paris-Saclay
Florin Rusu, University of California Merced
Giuseppe Santucci, Sapienza Università di Roma
Hans-Jörg Schulz, Aarhus University
Michael Sedlmair, University of Stuttgart
Hendrik Strobelt, IBM Research, MIT-IBM Watson AI Lab
Cagatay Turkay, University of Warwick
Anna Vilanova, TU/e
Chris Weaver, University of Oklahoma

Bibtex

@book{PDABook,
  TITLE = {Progressive Data Analysis: Roadmap and Research Agenda},
  AUTHOR = {Fekete, Jean-Daniel and Fisher, Danyel and Sedlmair, Michael},
  PUBLISHER = {Eurographics},
  PAGES = {231},
  YEAR = {2024},
  MONTH = Nov,
  DOI = {10.2312/pda.20242707},
  ISBN = {978-3-03868-270-7},
  PDF = {https://diglib.eg.org/bitstreams/f57f92d0-df37-4569-9ca1-c15980c541a2/download},
  URL = {https://diglib.eg.org/handle/10.2312/3607057}
}

@string{FORPDA = { for Progressive Data Analysis}}

@incollection{PDABook:2,
  TITLE = {Concepts and Definitions} # FORPDA,
  AUTHOR = {Giuseppe Santucci and Micha{\"e}l Aupetit},
  BOOKTITLE = {Progressive Data Analysis: Roadmap and Research Agenda},
  EDITOR = {Fekete, Jean-Daniel and Fisher, Danyel and Sedlmair, Michael},
  PUBLISHER = {Eurographics},
  PAGES = {16--32},
  YEAR = {2024},
  MONTH = Nov,
  DOI = {10.2312/pda.20242707},
  ISBN = {978-3-03868-270-7},
  PDF = {https://diglib.eg.org/bitstreams/f57f92d0-df37-4569-9ca1-c15980c541a2/download},
  URL = {https://diglib.eg.org/handle/10.2312/3607057}
}

@incollection{PDABook:3,
  TITLE = {Data Management} # FORPDA,
  AUTHOR = {Florin Rusu and Carsten Binnig and Chris Weaver},
  BOOKTITLE = {Progressive Data Analysis: Roadmap and Research Agenda},
  EDITOR = {Fekete, Jean-Daniel and Fisher, Danyel and Sedlmair, Michael},
  PUBLISHER = {Eurographics},
  PAGES = {33--48},
  YEAR = {2024},
  MONTH = Nov,
  DOI = {10.2312/pda.20242707},
  ISBN = {978-3-03868-270-7},
  PDF = {https://diglib.eg.org/bitstreams/f57f92d0-df37-4569-9ca1-c15980c541a2/download},
  URL = {https://diglib.eg.org/handle/10.2312/3607057}
}

@incollection{PDABook:4,
  TITLE = {Data Structures and Algorithms} # FORPDA,
  AUTHOR = {Jean-Daniel Fekete},
  BOOKTITLE = {Progressive Data Analysis: Roadmap and Research Agenda},
  EDITOR = {Fekete, Jean-Daniel and Fisher, Danyel and Sedlmair, Michael},
  PUBLISHER = {Eurographics},
  PAGES = {48--68},
  YEAR = {2024},
  MONTH = Nov,
  DOI = {10.2312/pda.20242707},
  ISBN = {978-3-03868-270-7},
  PDF = {https://diglib.eg.org/bitstreams/f57f92d0-df37-4569-9ca1-c15980c541a2/download},
  URL = {https://diglib.eg.org/handle/10.2312/3607057}
}

@incollection{PDABook:5,
  TITLE = {Visualization} # FORPDA,
  AUTHOR = {Marco Angelini and Jaemin Jo},
  BOOKTITLE = {Progressive Data Analysis: Roadmap and Research Agenda},
  EDITOR = {Fekete, Jean-Daniel and Fisher, Danyel and Sedlmair, Michael},
  PUBLISHER = {Eurographics},
  PAGES = {70--91},
  YEAR = {2024},
  MONTH = Nov,
  DOI = {10.2312/pda.20242707},
  ISBN = {978-3-03868-270-7},
  PDF = {https://diglib.eg.org/bitstreams/f57f92d0-df37-4569-9ca1-c15980c541a2/download},
  URL = {https://diglib.eg.org/handle/10.2312/3607057}
}

@incollection{PDABook:6,
  TITLE = {Uncertainty and Quality} # FORPDA,
  AUTHOR = {Anna Vilanova and Marco Angelini and Sriram Karthik Badam and Jean-Daniel Fekete},
  BOOKTITLE = {Progressive Data Analysis: Roadmap and Research Agenda},
  EDITOR = {Fekete, Jean-Daniel and Fisher, Danyel and Sedlmair, Michael},
  PUBLISHER = {Eurographics},
  PAGES = {92--107},
  YEAR = {2024},
  MONTH = Nov,
  DOI = {10.2312/pda.20242707},
  ISBN = {978-3-03868-270-7},
  PDF = {https://diglib.eg.org/bitstreams/f57f92d0-df37-4569-9ca1-c15980c541a2/download},
  URL = {https://diglib.eg.org/handle/10.2312/3607057}
}

@incollection{PDABook:7,
  TITLE = {Human Aspects} # FORPDA,
  AUTHOR = {Hans-J{\"o}rg Schulz and Micha{\"e}l Aupetit and Danyel Fisher},
  BOOKTITLE = {Progressive Data Analysis: Roadmap and Research Agenda},
  EDITOR = {Fekete, Jean-Daniel and Fisher, Danyel and Sedlmair, Michael},
  PUBLISHER = {Eurographics},
  PAGES = {108--131},
  YEAR = {2024},
  MONTH = Nov,
  DOI = {10.2312/pda.20242707},
  ISBN = {978-3-03868-270-7},
  PDF = {https://diglib.eg.org/bitstreams/f57f92d0-df37-4569-9ca1-c15980c541a2/download},
  URL = {https://diglib.eg.org/handle/10.2312/3607057}
}

@incollection{PDABook:8,
  TITLE = {Progressive Methods in Machine Learning},
  AUTHOR = {Cagatay Turkay and Nicola Pezzotti and Hendrik Strobelt and Barbara Hammer},
  BOOKTITLE = {Progressive Data Analysis: Roadmap and Research Agenda},
  EDITOR = {Fekete, Jean-Daniel and Fisher, Danyel and Sedlmair, Michael},
  PUBLISHER = {Eurographics},
  PAGES = {132--148},
  YEAR = {2024},
  MONTH = Nov,
  DOI = {10.2312/pda.20242707},
  ISBN = {978-3-03868-270-7},
  PDF = {https://diglib.eg.org/bitstreams/f57f92d0-df37-4569-9ca1-c15980c541a2/download},
  URL = {https://diglib.eg.org/handle/10.2312/3607057}
}

@incollection{PDABook:9,
  TITLE = {Evaluation} # FORPDA,
  AUTHOR = {Ga{\"e}lle Richer and Jean-Daniel Fekete and Michael Sedlmair},
  BOOKTITLE = {Progressive Data Analysis: Roadmap and Research Agenda},
  EDITOR = {Fekete, Jean-Daniel and Fisher, Danyel and Sedlmair, Michael},
  PUBLISHER = {Eurographics},
  PAGES = {149--170},
  YEAR = {2024},
  MONTH = Nov,
  DOI = {10.2312/pda.20242707},
  ISBN = {978-3-03868-270-7},
  PDF = {https://diglib.eg.org/bitstreams/f57f92d0-df37-4569-9ca1-c15980c541a2/download},
  URL = {https://diglib.eg.org/handle/10.2312/3607057}
}