The rise of AI has made manifest the power of data in the modern world. Although versions of artificial intelligence (AI) have long been part of the technological landscape, the recent emergence of ChatGPT was a culturally transcendent moment on a global scale. For the first time in history, an average person could interact with an AI agent, which resulted in collective amazement and alarm around the world. Data, although long known to be a powerful tool for understanding social, biological, physical, and engineering systems, had now powered an algorithm capable of mimicking human intelligence and interaction. The data behind the model have empowered this AI to reach an unprecedented milestone in its capabilities and performance.
How Data Happened by Wiggins and Jones is centered on the history of data itself, which has often been overlooked in favor of algorithms and computational power. The act of collecting, analyzing, and making predictions with data has a long history which is exceptionally well detailed in the book. It is a comprehensive and masterful telling of how early and principled efforts in data collection led to transformations in the fields in which it was collected. Wiggins and Jones also highlight how data has historically been used by statistically-minded people to not only significantly advance a field and aid humanity, but to also advocate, advance, and manipulate political positions, social narratives and prejudicial ideologies. Starting as early as the 18th century, when statistics was first introduced as its own field of study, up to the modern day, the authors detail many of the leading figures who helped shape what now constitutes modern data science. Wiggins and Jones demonstrate the progression of critical ideas around data built from one generation to the next, showing that modern data science, and our modern AI, resulted from a long and continuous set of mathematical developments which were realized and empowered by algorithms acting on data.
From Aristotle to Newton to the James Webb telescope, observational data has always been used by scientists and engineers to develop and apply theories of the natural world. However, Wiggins and Jones begin by discussing the systematic collection of data for understanding social systems and people. Unlike the natural world, where carefully controlled experiments could be replicated and reproduced, those studying social systems are not typically afforded such opportunities. Instead, data collection offered a new way of modeling systems through consideration of averages and variability. The field of statistics naturally arose from such considerations and fundamentally frames its results in terms of such concepts. The ideas of correlations and causality were first considered in social systems within this emerging mathematical framework1.
Many early statistical methods have significantly improved over subsequent generations. However, the early versions of such methods, and especially their erroneous interpretations, were foundational in nefarious efforts by some to justify the superiority of groups of people based upon race, gender and nationality2. The early chapters in Part One of How Data Happened detail how data can be used for advancing an agenda, with data being capable of politicization with devastating effects3. COVID-19 is a salient example of data being wielded in a politically manipulative manner by those advocating for a range of social policies. Wiggins and Jones show that leveraging social data to push an agenda began the moment we started measuring social systems by collecting data. How Data Happened is an enlightening account that can help us understand how seemingly politically neutral mathematical methods can empower modern day controversies in pandemics and climate change, for instance.
Part Two of How Data Happened focuses on the historical developments beginning in the mid-20th century, and the legacy of algorithms and computing that arose from wartime efforts. And it was shortly after this postwar effort that Alan Turing in 1950 posed one of the most influential questions in the history of computing: “Can a machine think?”4. The first computers became available to a select group of scientists and engineers who could process data at scales previously unimaginable. This paved the way for the first waves of AI via expert systems. By the 1970s, the wider availability of computers led to the first call for data science as its own discipline by John Tukey of Princeton University5.
Data and computing began to merge and transform the possibilities of what could be achieved in many areas of application in science, engineering, and social systems. The field of computer science was born around this time as well, and the first departments started to appear at universities around the world. Wiggins and Jones detail the development of machine learning and AI algorithms, and how these algorithms are used for extracting patterns in data at scale. Computers and optimization algorithms are now at the focal point of helping machines learn and “think” as Turing would suggest they are capable of. Wiggins and Jones focus on the data that enabled this, which contrasts with the greater focus given to the algorithms and AI applications empowered by the data. How Data Happened is a refreshing perspective that highlights the foundational aspects of how data, in and of itself, drives technological advances.
Part Three of How Data Happened reaches the modern era, where dominant technology companies have leveraged data into trillion-dollar sectors. The growth exhibited by these sectors would be infeasible without the personal data that users voluntarily surrendered. The collection of data at global scales has led to a new paradigm in the understanding, characterization, and manipulation of social systems. With Parts One and Two detailing how data was used to manipulate political positions, social narratives, and prejudicial ideologies, Wiggins and Jones bring into sharp focus the tremendous need for ethical considerations in data collection, management, and usage. The reader may surmise from this book that history repeatedly shows how bad actors with an agenda can use data to manipulate a narrative. Such bad actors can now skew facts and figures with global repercussions. The work highlights the ongoing tension between companies, governments, and individuals around data, which is the foundational resource for all AI algorithms. Wiggins and Jones consider the use of data through a historical lens and provide important examples of how data has been used in the past and present for both good and bad.
In my view, How Data Happened is an important contribution and I highly recommend it. The work is not about ethics; rather, it clearly tells a detailed history of data itself. With such a history, the reader is empowered to be more objective and educated about the uses and prospects of data in the modern world. The reader is armed with ample examples of the effects of greater data availability for social systems. Ethical considerations are often better framed around historical evidence than through an abstraction.
How Data Happened is also a timely read, given how ChatGPT has re-ignited the global conversation about the dangers of AI. The book highlights that perhaps the true danger of AI lies not in algorithms, but in those who may choose to manipulate algorithms to serve an agenda.
Despite the many wonderful contributions to society that have emerged from data science, Wiggins and Jones show one example after another of how those in power have been behind the nefarious uses of data. In Part Three, the authors seem to suggest that accountability, in the form of data ethics and governmental guidelines, is critical for us to consider. AI has emerged as one of the most remarkable human tools ever developed. Ensuring that such tools remain safe is critical: the goal of helping humanity writ large in medicine, biology, science, and engineering should remain the central aim of data science. Indeed, AI is critical for the next generation of technologies that are now emerging across every discipline. However, the efforts towards organizing accountability and ethical standards across disciplines appear delayed, which opens the door for influential narratives to take root if we are not vigilant. How Data Happened offers an exceptional opportunity to learn from the history of data so that we might avoid mistakes committed by our predecessors. For that fact alone, this book is worth reading.