Progressive Data Analysis: Roadmap and Research Agenda
Jean-Daniel Fekete, Danyel Fisher, and Michael Sedlmair, Eurographics Association, ISBN: 978-3-03868-270-7, DOI: 10.2312/pda.20242707, 2024
We live in an era where data is abundant and growing rapidly; databases storing big data sprawl past memory, reach computation limits, and become increasingly distributed. Engineers are designing new hardware and software systems, with new storage management and predictive computation, to sustain this growth. Yet, while these infrastructures work well for data at scale, they do not support exploratory data analysis (EDA) effectively. EDA allows analysts to make sense of data with little or no known model. It is essential in many application domains, from network security and fraud detection to epidemiology and preventive medicine. Data exploration proceeds through an iterative loop: analysts interact with data through computations that return results, usually shown as visualizations, which the analysts then interact with again. EDA calls for highly responsive systems: at a latency of 500 ms, users already change their querying behavior; past five or ten seconds, they abandon tasks or lose attention. To address this problem, a new computation paradigm has emerged in the last decade: <em>Progressive Data Analysis</em>.
This book is an introduction to this new paradigm. It explains the main scientific and technical benefits of performing complex data analysis progressively on large data. It also introduces the challenges that must be addressed before the paradigm becomes fully usable. These issues span research fields that are traditionally separate in computer science: databases, scientific computing, machine learning, visualization, statistics, and human-computer interaction. The book ends with a research agenda to help the scientific community converge on key research questions.