Control is the hallmark of scientific experimentation. If an experiment is deemed to be lacking in control, it is unlikely to gain traction in the scientific community; arguably, an uncontrolled intervention is not even a genuine experiment. Today, scientific articles routinely mention controls, and handbooks and instruction manuals on methods in the life sciences call for controlled experiments. Evaluating the appropriateness of controls is a core element of successful peer review.

But despite its centrality to modern scientific inquiry, many foundational and historical questions about experimental control remain open. Experimental practice has been studied for decades, but only a few analyses of scientific control practices in experimentation exist,Footnote 1 with almost nothing written on controlled experimentation in the longue durée.Footnote 2 We know little about changing expectations for well-controlled experiments or about different kinds of control, experimenters’ interpretations of control, or reasons given for applying controls. There is not even consensus about whether experimental control is an ancient, early modern, or Enlightenment concept, or whether it is a more recent feature of scientific inquiry.Footnote 3 This is, in part, because the concepts “control,” “control experiment,” and “controlled experiment” are polysemous, like “replication” or “significance.” In addition, methodological concepts for experimental practice have until recently received comparatively little scholarly attention.

“Control” has been studied mostly as a broader cultural phenomenon in the Western world. Cultural histories of control focus on ideologies and technologies for governing people, procedures, or systems of machines (Levin 2000b; Derksen 2017). Historical studies of control and science have shown how cultural currents, for better or worse, transformed scientific practices into more rigorous endeavors. Historians of science have noted the increasing importance in science of a quantifying spirit (Frängsmyr et al. 1990) and the values of precision (Wise 1995). They have examined the influence on science of tools such as statistics (Porter 1995; Gigerenzer et al. 1989) and surveillance devices (Foucault 1975, 1979), as well as bureaucratic procedures such as record-keeping, double bookkeeping, and accounting. These authors have argued that institutional changes in science, such as the rise of the university and urban research laboratories, have helped to standardize scientific practice and make it more exact (Tuchman 1993; Dierig 2006). Eighteenth-century sciences of state promoted record-keeping, accounting, and statistical assessments of experimental data (Seppel and Tribe 2017). Nineteenth- and twentieth-century physics and engineering helped to create automated feedback control mechanisms (Bennett 1993), intertwined control and communication systems (Wiener 1948), and “networks of power” (Hughes 1983). They also brought about catastrophic failure of control, as in failed aerospace missions, plane crashes, and collapsing bridges (Schlager 1995). Industrial and technological advancements allowed researchers to engineer the development of living organisms and human heredity (Pauly 1987; Paul 1995), to standardize living things as model organisms for experiments (Rader 2004), and to measure human performance (Rabinbach 1990). The twentieth-century nexus of military, industry, and information technologies enabled wide-ranging control over data and information flow (Galison 2010; Franklin 2015).

Of course, broader socio-political and cultural developments such as industrialization, the institutionalization of university research laboratories, and the expansion of bureaucracies and state administration are impactful. These developments change how practices of research, recording, and record-keeping are organized, as so many authors have demonstrated. But they do not fully determine experimental designs or experimenters’ views on what is considered good and well-controlled or deficient and poorly controlled experimental practice.

This volume shifts the focus from broader socio-political and cultural contexts of control onto practitioners’ methodological strategies of inquiry and experimental design. While acknowledging that broader cultural forces do affect control practices, we contend that these forces only partially shape experimental design and strategy. We identify additional social dimensions of experimental control. On the one hand, identifying experimental conditions, confounders, and solutions to technical problems in experimental design takes time, and unfolds through the activities of multiple individuals or groups. On the other hand, whether an experiment counts as “sufficiently” or even “fully” controlled is not entirely decided by the experimenters themselves, nor can the question be settled by comparing actual experimentation with an abstract standard of the ideal controlled experiment.Footnote 4 The adequacy of control critically depends on the social interactions and negotiations among experimenters and their various interlocutors; as such, the issue is open to revisiting, revision, and renegotiation.

To capture the complicated and multilayered history of experimental control, it is useful to distinguish control strategies, control practices, and methodological ideas about experimental control. Control strategies are general designs and plans to follow in an experiment, like the comparison of an intervention target with a control. Control practices are the concrete actions by which experimenters implement control strategies in particular contexts. These contexts comprise all the resources available to the experimenters, including materials, tools, techniques, local expertise, and institutional opportunities. Methodological ideas are the broader notions of how to study nature and everything in it. They are contained in accounts of control strategies and practices, as the practitioners themselves give them.Footnote 5

Contributions to this volume deal with the details of experimental control practices, as well as with the expectations and perceived obstacles for experimental designs. The chapters are also sensitive to long-term developments of control strategies and methodological ideas. We provide a set of focused studies on control practices, strategies, and ideas that, together, cover a period of more than 300 years, with glimpses back to antiquity and forward to the late twentieth century. We contend that the long-term perspective is productive for understanding experimental methodologies and experimental control in particular.Footnote 6 The chapters offer several examples of how control practices using those strategies and ideas are shaped by local contexts—material-technical, conceptual, and social. Together, they illustrate that control strategies and methodological ideas often remain stable for a long time and change only gradually.

To study controlled experimentation from a historical perspective, we must distinguish at least two notions of control. The first is a broad sense of control as “managing,” “restraining,” or “keeping everything stable except the target system to be intervened upon.” This notion primarily but not exclusively concerns the experiment’s material side: the objects, the setting and environment, and the tools, as well as the guided manipulation or intentional intervention in an otherwise stable situation to see what will happen.Footnote 7

In an uncontrolled situation, experimenters cannot determine the changes resulting from their interventions. To extract information from unwieldy experimental situations, they must standardize instruments and experimental targets and hold fixed the experimental background conditions. They ought also to be free of preconceived opinions and other sources of influence. Experimenters seek to make the experimental setting and background as stable and rigorous as possible because effects, both expected and novel, appear most distinctly against a stable background.Footnote 8 Generally, then, we can consider any aspect of experimental practice from the perspective of control; a key question is how experimenters identify what must be controlled in concrete contexts and how they achieve that control.

There is also a narrower notion of control, referring to comparative experimental designs.Footnote 9 It primarily but not exclusively concerns the experiment’s epistemic side, or the conditions required for the experiment to generate knowledge. Modern scientists typically associate the term “control experiment” with a particular experimental strategy or design, namely the comparison to a control case. An experimental intervention is compared with a baseline; the target system of the intervention is compared with a similar target system that, unlike the experimental object, was not intervened on (the “control mouse,” say, which did not receive treatment). This strategy encapsulates the requirements for an experiment to be informative about cause-effect relations.Footnote 10

In the narrow sense, comparison to a baseline is needed to find out whether it really was the manipulation of this particular variable that made a difference to the experimental outcome.Footnote 11 Of course, the more similar the experimental situations are, the more informative the comparisons will be. Making informative comparisons thus requires control practices in the broader sense explained above, to ensure that the two experimental settings are stable, save the intervention.

We should avoid confusing the emergence of terms such as “control experiment” and “experimental control” in the scientific literature with the emergence of explicit discussions about control practices and strategies. The terms “control experiment,” “controlled experiment,” and “experimental control” are recent terms. Google Ngram shows a steep increase for “control experiment” in the last decades of the nineteenth century in English, French, and German-language scientific literature. Of course, Ngram is not a rigorous tracker for word usage, but based on its data, we can safely assume that control practices were common long before the term spread in scientific writing.Footnote 12 As our volume demonstrates, discussions about stable experiments antedate the appearance of the term “control” in this literature. Concerns about the adequate management of experimental settings were voiced as soon as experimentation became widespread. Robert Boyle, for one, published two famous essays on “unsucceeding” experiments, where he discussed the obstacles posed by impure chemicals, the variability of body parts in different corpses, and other issues threatening experimental success (Boyle 1999a, b).

The history of experimental control, then, encompasses four distinct yet related strands. The first is the historical development of control practices to stabilize and standardize experimental conditions. The second is the emergence and career of the comparative design in experimentation, understood as a way of generating and securing knowledge of cause-effect relations. The third involves the unfolding, both in philosophy of science and in the sciences themselves, of methodological discussions on control practices and designs in experimental practice. The fourth is the history of the term “(experimental) control.”

This volume concerns itself most with the first three strands. We do not systematically explore the history of the term “control;”Footnote 13 in fact, several contributions discuss research from before the late nineteenth century. However, precisely because control practices and strategies predate the term “control” in scientific literature, we keep terminological questions in mind as we analyze past experimental reports and methodological discussions. We pay careful attention to the terms past practitioners did use, whatever they were, to describe, explain, and defend control practices and strategies.

The contributions here examine how control practices and comparative designs developed, and include past accounts of critiques and defenses for these practices. Control is a multifaceted and elusive concept, and our volume reflects this. We have not attempted to reduce our discussion to a single definition of “control.” Although this introduction provides some points of orientation for analyzing control practices and strategies, each contributor further explains the concept for specific experimental contexts. The chapters range over different fields, from botany and vision studies, ecology and plant physiology, human physiology and psychology to animal behavior and experimental physics. They cover a period from the early seventeenth to the twentieth century. They examine experiments with complex and sometimes unwieldy objects and elusive phenomena. Chapters deal with studies on learning and judgment; color blindness in animals; auditory perceptions of tones, pitch, and vowel sounds; irregular movements; psychic forces; unobservable elements; and the best “photogenic climate” for promoting photosynthesis. Experiments on such objects and phenomena are hard to design, stabilize, and carry out, and they are often controversial. For this reason, they showcase questions and reflections on control in science particularly well.

The very practice of creating and maintaining a stable experimental situation is old, arguably as old as experimental intervention itself. Over time, experimenters learn what must be managed and tracked in experimental contexts; they seek to localize the phenomena of interest as well as the elements of the experimental setting in order to make interventions more exact. Gradually they develop new tools to do this. Precision instruments, elaborate recording devices, and other technologies available in the last century or two can assist with these tasks. The history of research laboratories can be written as the history of efforts to create highly controlled research environments. Nineteenth-century physicists worked at night or retreated to the lab basement to escape city noise, vibrations from trams, and exuberant students (Hoffmann 2001). Today’s scientists turn to specialized construction companies when they need “clean rooms” for research.Footnote 14 All-metal or all-plastic labs are built for research into the impacts of micro-plastics on materials and tissues or on radiation, respectively. Particle physicists dive to recover radiation-free lead from ancient shipwrecks to prevent contaminating their measurements.

Such materials and technologies often make it easier to keep an experimental situation stable and to track interesting changes.Footnote 15 At the same time, however, closer analysis of actual episodes shows that advancements in instrumentation, impressive as they may appear in hindsight, do not guarantee improved control. In fact, obtaining control often becomes more difficult, not least because researchers must learn the instruments’ proper functioning. “The more finely a method of investigation operates, the more complicated the devices used must be,” as Carl Stumpf noted (1926, 8).Footnote 16

Moreover, the history of control is a history of efforts, and efforts can fail. Implementing control strategies often fails, as even the experimenters themselves sometimes admit. Our volume illustrates how difficult it can be to manage an experimental setting, how resourceful some experimenters were in their management, and how they sometimes failed to achieve it despite intense effort. Claudia Cristalli’s researchers of psychic phenomena walk the line between controlling the psychic powers of the “percipients” in their experiments, and preventing them from sensing any phantasms at all. Christoph Hoffmann’s study of color blindness in fish shows how experimenters dealt with the tricky problem of controlling animals’ behavior. Experimenters found two solutions, both difficult to implement and neither completely satisfying. One option was to train the fish, a much more challenging task than training, say, a dog or rat. The other was to design the experimental setting in such a way that the “normal” behavior of the fish was taken into account when the behavior of interest was elicited. But what is the “normal” behavior of fish? And how can it be accommodated in the unnatural environment of a laboratory fish tank?

Other contributions illustrate how experimenters approached the creation and monitoring of an experimental setting. They discuss the multifaceted nature of the associated problems and the obstacles the experimenters had to overcome when attempting to stabilize unwieldy things, such as the irregular movements of microscopic parts, the germination, sprouting, and growth of plants, and auditory perceptions. The contributions describe the solutions they found to these problems. Experimenters tried their best to identify the smallest details of the experimental settings deemed relevant, and sometimes invented remarkably elaborate contraptions to keep them stable.

Caterina Schürch depicts the curious machines with which eighteenth-century plant physiologists tried to electrify plants and seeds with precise doses of electricity. Kärin Nickelsen shows how the nineteenth-century plant physiologist Julius Wiesner designed an artificial environment for his plants: double-walled glass jars, with the space between the walls filled with a solution of iodine in carbon disulphide. Because this liquid layer absorbed all visible light but let heat rays pass, Wiesner could examine the impact of those rays on plant growth. Julia Kursell describes the giant arrangement of tubes Carl Stumpf erected to compare how his experimental subjects perceived natural and machine-generated vowels. She notes that, according to Stumpf, the increased finesse of experimental tasks required ever more complex experimental devices. Cristalli shows how Faraday, attempting to stop participants in table-turning experiments from making involuntary movements, designed a device consisting of a stack of cardboard sheets, arranged like a voltaic pile, with pellets of wax in between. The device would be placed between the hands of the séance participants and the tabletop. The sheets were arranged and marked in such a way that their displacement would indicate hand movements prior to the table’s movement.

These devices often astonish with their ingenuity, but the point is that they are the material realizations of what experimenters recognized as the relevant conditions and potential confounders for their experiments. They are therefore purpose-dependent, as Kursell notes; at the same time, they both constitute and constrain the generation of experimental knowledge. Cristalli’s, Schürch’s, Nickelsen’s and Evan Arnet’s chapters demonstrate this constraint: over time, views about what factors to manipulate, keep fixed, or monitor in controlled experiments might change considerably, even within a single research tradition. While Faraday built tools to control his subjects’ involuntary movements, his American colleague and erstwhile admirer Robert Hare turned to designing machines that would prevent voluntary movements in psychic experiments—in other words, to prevent fraud.

Schürch’s account illustrates a most dramatic change of focus. After decades of carefully controlled experimentation, which supported the view that electrification promotes plant growth, Jan Ingen-Housz showed, using the same control strategies, that it was not electricity but differences in light intensities that affected the plants. He thus re-oriented the entire research program of plant growth, rendering previously “well-controlled” experiments uncontrolled.

Similarly, in maze research on animal learning, later investigators critiqued their predecessors for stabilizing (“controlling for”) the very phenomenon they should have studied, as Arnet’s work illustrates. Nickelsen shows how control practices in photosynthesis research changed fundamentally as the experiments moved from the laboratory to the field. As she observes, the changes were not just practical (measuring natural light is harder than measuring laboratory light) but also conceptual. What mattered was no longer just “daylight,” but a complex set of factors consisting of the specific light individual plant parts received, intensity fluctuations during the day and the season, and so forth. Klodian Coko charts another kind of reorientation in his study of research on Brownian movement. Using the strategy of comparative experimentation, nineteenth-century researchers tried to establish what could and could not be the cause of Brownian movement. Later in the century, Brownian movement itself became evidence for a new kinetic-molecular theory of matter, which changed the understanding of rigor and experimentation.

Several chapters also direct attention to the fact that many experimenters were explicitly concerned with developing coping strategies for “limited beings” (Wimsatt 2007) in sub-optimal situations. Researchers faced challenges not only because background factors were difficult or too numerous to monitor, but also because those factors were not immediately observable. Remarkably, the physicist Lord Rayleigh devoted several of his public-facing remarks to the theme of “deficient rigor.” As Vasiliki Christopoulou and Theodore Arabatzis point out, for Rayleigh, the pursuit of absolute (“mathematical”) rigor could even be detrimental to progress in physics. It was in this situation that experimenters insisted on using two or more different experimental techniques to check whether they converged on the same outcomes, as detailed in the contributions by Christopoulou and Arabatzis and by Coko.

Notably, experimenters developed strategies to guard against entirely unknown influences on their experiments. The notion that natural phenomena in an experiment might occur and not occur in unforeseeable ways is centuries old. The metaphysical interpretation of this notion has changed dramatically over time (Hacking 1984, 1990), but there was wide and long-standing agreement about how to address it: namely, through multiple repetitions of experimental trials. Both the early seventeenth-century experimenter Scheiner and the late nineteenth-century experimenter Rayleigh gave the idea of multiple repetitions an important role in rigorous experimentation, if for different reasons.

In an early essay on medical experience, the ancient physician and anatomist Galen discussed the possibility that what is seen only once in a patient may not be a regular occurrence, and thus may not be worthy of acceptance and belief. Galen suggested this point in the middle of his attempt to demonstrate that medical practice is not just logos, but also experience.Footnote 17 As part of the argument, Galen alluded to the instability of memory and also noted that medicines work sometimes but not always (Galen 1944). In clinical medicine, at least, one single drug test might not produce reliable results, because “some things are frequent and some are rare” (Galen 1944, 113). It must therefore be repeated several times, and even then, it may not tell us what is usually the case.Footnote 18 Ibn Sīnā (Avicenna) expressed a similar idea in a proposal for rules of drug testing, albeit with a positive spin. He wrote that “the effect of the drug should be the same in all cases or, at least, in most. If that is not the case, the effect is then accidental, because things that occur naturally are always or mostly consistent” (Nasser et al. 2009, 80).

In the early modern period, we encounter this idea frequently, now also in discussions about experimentation beyond drug testing in clinical medicine. Repeating experimental trials several times, indeed “very many times,” became an imperative for rigorous experimentation—in this way, unknown or contingent and accidental influences on experiments could be avoided.Footnote 19 In later centuries it was to become a hallmark of rigorous experimentation that a trial be done more than once or on large samples.Footnote 20 However, as Schürch’s chapter shows, the appropriate number of repetitions remained contested.

Scholars looking for the “first” control experiment in the history of scientific inquiry typically assume, in most cases tacitly, the narrower notion of “control” as comparative trial. They have found quite early examples of comparative designs in experimental practice. These examples often come from medicine, where it is both vitally and commercially important to discover the efficacy of certain drugs and treatments. The reputation of a practitioner depended on the treatments’ success.

For example, historian of statistics Stephen Stigler finds an instance of comparative experimentation in the Old Testament, in the Book of Daniel (around 164 BCE). Servants on a vegetarian diet are compared with children who eat “the king’s meat”: “And at the end of ten days their countenances appeared fairer and fatter in flesh than all the children which did eat the portion of the king’s meat” (Daniel 1:5–16).Footnote 21

A passage by Athenaeus (200 CE) describes how some convicted criminals had been thrown among asps and survived. It turned out that they had been given lemons prior to their punishment. The next day a piece of lemon was given to one convict but not to another. The one who ate the lemon survived the bites, the other died instantly.Footnote 22 The pseudo-Galenic treatise on theriac describes a trial with a similar design, whereby two birds would be poisoned and only one given an antidote (Leigh 2013). The trial tests the efficacy of medicines: if both animals died, the tested antidote was recognized to be ineffectual. That experiment was again reported in the Middle Ages, notably by Bernard Gordon (McVaugh 2009).

Another famous ancient example is the legend of Pythagoras. As the story goes, he observed that most combinations of blacksmiths’ hammers generated a harmonious sound when striking anvils at the same time, while some did not. Pythagoras discovered that harmonious sounds were produced by those hammers whose masses were simple ratios of each other, while other hammers made dissonant noises when struck simultaneously. Notably, Ptolemy later criticized the Pythagorean experiment because, to him, it lacked control (Zhmud 2012, 307).

The Pythagorean case is interesting. It clearly has a comparative component, inspecting the sound of hammers whose masses were simple ratios of each other and that of other hammers. But in the historiography of science it does not serve as an example of an early “control experiment.” In fact, the ancient texts have too little information to determine whether it was consciously performed as an experiment compared with a control, whether Pythagoras simply varied the setup, or whether he arrived at his conclusions by observing different blacksmiths at work.

Conscious and explicit implementation of comparative designs appears to become more common in seventeenth- and eighteenth-century experimental practice. In his studies on the generation of insects, Francesco Redi famously compared samples of organic materials—“a snake, some fish, some eels of the Arno, and a slice of milk-fed veal in four large, wide-mouthed flasks” (Redi 1909, 33)—kept in open and closed containers. The samples were periodically inspected for traces of life. No life developed in closed containers, which Redi took as evidence against the spontaneous generation of maggots from putrefying flesh. Here, the comparative design demonstrates a cause-effect relation through the comparison with a “control.” Redi showed that maggots in open containers were generated by flies’ eggs.Footnote 23

The case of spontaneous generation research illustrates particularly well why it is useful to distinguish between comparative design strategies and a broader notion of control as management of the experimental setting. Redi’s experimental research was not decisive, and after him many other experimenters investigated spontaneous generation. They all contested each other’s experiments and many argued that their opponents had not properly maintained the experimental settings; they also argued that they themselves really had taken the necessary precautions to do so. John T. Needham, for instance, claimed that he could demonstrate the spontaneous generation of animalcules in infusions. He told his readers that he had “neglected no Precaution, even as far as to heat violently in hot Ashes the Body of the Phial; that if any thing existed, even in that little Portion of Air which filled up the Neck, it might be destroy’d, and lose its productive Faculty” (Needham 1748, 638). Notably, he did not report a comparison with a vial that had not been heated in fire. It may have been superfluous to him, because it was obvious that animalcules would appear in it, as so often had been observed. The debates continued throughout the nineteenth century. Experimental designs and interpretations for possible contaminants varied, but the comparative strategy generally remained the strategy of choice.Footnote 24 As Schürch’s contribution shows, in the decades around 1800, experimenters across Western Europe advocated comparative experimental designs.

Reports of comparative trials can be found in many fields, from agriculture to clinical medicine.Footnote 25 A notable but little-studied example is steeping experiments (Pastorino 2022). A comparative experiment by Francis Bacon served as a template for many subsequent experiments on the effects on plant growth of steeping seeds in various fluids.

Our volume illustrates comparative trial designs in plant physiology, physics, animal behavior studies, and psychology. The episodes exemplify both the conscientious application of these strategies and the obstacles experimenters faced as they attempted to realize well-controlled comparative trials.

The earliest pre-modern reports of experimental trials and comparative designs contain little express discussion on control practices and strategies. There are exceptions, of course, especially in medical contexts. We already noted Galen’s writings, and we know that medieval scholars such as Ibn Sīnā developed rules for drug testing (Crombie 1952). Mostly, however, comparative designs were simply described and rarely justified; there was little explicit concern with managing the details of experimental settings. When ancient and medieval authors noted the drug test on two birds, they surely meant to show a test to support the drug’s efficacy, but the argument for the comparative approach often remained implicit. In modern scientific writing, by contrast, we sometimes find detailed discussions and justifications of experimental designs: in controversies about experimental results, in debates about the status of heterodox scientific fields such as research on psychic phenomena, and in situations of uncertainty.

In this volume, Tawrin Baker’s chapter on Scheiner and Christopoulou and Arabatzis’s chapter on Rayleigh epitomize both the scarcity and the abundance of practitioners’ discourse on their control practices and strategies. Scheiner demonstrated to his readers how experimentation could serve as a legitimate check on a theory of vision. He did not expound or defend methodological ideas in detail, although he did focus attention on the process of experimentation. Words and pictures conveyed the experimental setups. Scheiner instructed his readers to make certain experiences and experiments; he discussed the implications for the theory of vision. However, as Baker notes, several issues remained open, such as how often an experiment should be repeated or how one ought to deal with discrepancies. Christopoulou and Arabatzis’s chapter on Rayleigh shows that late-nineteenth-century scientists wrote not only about the details of their experiments but also about experimental control. Experimenters drew attention to how they had re-designed instruments to make their measurements more precise and how they had employed additional instruments to check the quality of their measurements. They often insisted on using two measurement methods to guard against error.

We still know little about the unfolding of methodological discussions in the centuries after Scheiner’s appeal to a variety of experiences and experiments and Boyle’s musings on unwieldy, “uncontrolled” experimental settings and about the practices appropriate for managing and extracting knowledge from these settings. Little is known about the emergence of explicit methodologies for comparative trials. According to some scholars, notably Edwin Boring, it was not until the mid-nineteenth century that we find such explicit methodologies. Boring associated the first methodology of comparative experimental designs with a philosophical text, John Stuart Mill’s System of Logic (Boring 1954). While the contributions to our volume do not tell a comprehensive history of methodological accounts of experimental control, they do suggest that it would be misleading to identify Mill as the sole originator and principal representative of these accounts.Footnote 26 As Schürch’s, Coko’s, and Nickelsen’s chapters demonstrate, Mill was one of several early-nineteenth-century commentators on science who urged investigators to keep background conditions constant across trials, to “analyze” the background into different experimental conditions, and to compare the effects of interventions in one setting to another setting left untouched. But a broader history of these developments would still be desirable.

Our volume also shows that reflections about and justifications of control strategies predate modern philosophies of science. From Schürch’s study of late-eighteenth-century plant physiology we learn that, prior to Mill, practitioners not only called for rigorous and properly managed interventions, but also did much more: they reflected on control practices as validation procedures and debated their relative merits, practicality, and limitations. They observed that, to be instructive, comparisons must be made on sufficiently similar experimental subjects in similar situations. At times they disagreed about whether they or their colleagues had done enough to control their experiments. They criticized each other for not making comparative trials, for not controlling the right thing, or for not repeating a trial often enough.

The content of these debates and reflections tells us something about the experimenters’ own understanding of methodological issues concerning control, rigor, reliability, certainty, and failure in experimentation. Christopoulou and Arabatzis’s and Coko’s chapters illustrate this. As many contributors show, satisfactory control of an experiment is, in the end, an intersubjective, iterative achievement. Schürch and Christopoulou and Arabatzis note that experimenters such as Ingen-Housz and Rayleigh called upon others to check the results they themselves had obtained and to contribute additional experiments.Footnote 27 Cristalli charts the decades-long negotiations and re-negotiations among physicists, chemists, and psychologists on experimental practices deemed adequate to study psychic phenomena. The experimenters understood that their projects’ success depended on “controlling” their interlocutors as well.Footnote 28

This volume does not aim to replace earlier systematic discussions in history and philosophy of science on these issues, such as those on epistemological strategies of experimentation (Allan Franklin), tests for error (Deborah Mayo), representing and intervening (Ian Hacking), and how experiments end (Peter Galison). Our volume complements them. In fact, our discussions overlap with these approaches as we trace the history of controls while keeping epistemological strategies of experimentation in mind. We do contend that re-directing attention to control practices, control strategies, and practitioners’ accounts thereof illuminates new aspects of the history of experimental practices.

Control strategies and practices can be viewed as long-term and short-term methodological commitments, along the lines suggested by Peter Galison (1987). Arnet’s contribution to this volume uses this approach. Material and conceptual organizations of experiments vary, as do the identification of target systems, conditions, and confounders. The tools for stabilizing them change as well and are often (but by no means always!) local, context-specific, and relatively short-lived. Modern technologies allow for creative and sometimes intricate solutions to the problems of stabilization, standardization, and tracking. Yet the strategies have long been in place.

Control strategies are persistent. Even in the most complicated settings and with the most elusive phenomena, experimenters try to implement established control strategies as best they can, as shown in Schürch’s study of plant electrification, Coko’s discussion of experiments on Brownian movement, Cristalli’s study of psychic experiments, Kursell’s work on elusive auditory judgments, and Nickelsen’s discussion of plant physiology. Experimenters look for experimental conditions and confounding factors; they vary them to weigh their influence on experimental processes; they probe for error (Mayo 1996); they make their interventions less “fat-handed” (Woodward 2008); they compare situations meant to be similar and assess robustness, presupposing the no-miracle argument (Hacking 1985). At the same time, they develop specific, contextual implementations for these strategies, and they do not always agree on whether a particular implementation is effective.

In doing all this, experimenters face both technical and conceptual challenges. It may take a long time to harness experimental conditions, identify potential confounders, and find suitable techniques for doing so. Solutions to control problems will typically remain less than ideal. Hoffmann’s contribution demonstrates this fragility in control procedures. In debates about spontaneous generation, it took centuries to refine the tools to prevent contaminations from reaching the materials under investigation, and every new tool generated new issues for further exploration. Along the way, the understanding changed regarding the causes, conditions, and potential modifying factors and confounders. New technical challenges arose as a result.

Several chapters show that the implementation of control strategies may generate entirely new technical and conceptual problems for the experimenter, or even produce “surplus findings,” as Kursell writes.Footnote 29 Nickelsen, for instance, tracks changes in both the conceptualization and the logistics of managing background conditions for experiments on the influence of light on plant growth. Christopoulou and Arabatzis suggest that disturbances in physics experiments could become research topics in their own right. Arnet’s work also brings into relief the problematic implications of an over-emphasis on rigor and control. Early mazes were designed as simple systems of tracks in order to minimize environmental cues. But for a more complete understanding of animal learning, later researchers re-introduced precisely those same environmental features. The early mazes embodied a regime of control that stripped animals of certain sensory and environmental cues. Those mazes, however, excluded exactly those features that later researchers thought essential to advanced rodent learning.Footnote 30

Finally, several chapters suggest that it is fruitful to think of experiments as “controls of inferences,” because this perspective also brings out relevant methodological issues and their historical development. As Baker demonstrates, for early modern experimenters coming to grips with their Aristotelian heritage, the role of experiments in scientific inquiry was a crucial issue. In hindsight, studying how they managed this issue can also tell us something about Aristotle’s own ideas on the role of experimentation in empirical inquiry. For eighteenth- and nineteenth-century inquirers, then, the question is not so much whether but how, exactly, experimentation and experimentally generated knowledge can help us to understand nature. Steinle, Coko, Nickelsen, Kursell, and Hoffmann show how intricate the question can be as experiments target unobservable phenomena. As these experiments involve increasingly complicated instruments, hypotheses, assumptions, chains of inferences, and interpretations, the challenges for experimenters increase accordingly.

We place practitioners’ methodologies, experimental designs, strategies of inquiry, and practices of implementation at the center of our analyses. We thereby draw new trajectories and connections in the history of experimental inquiry. We identify lines of experimentation that sometimes turned into models of rigorous experimental design and at other times were criticized. Bacon’s steeping experiments with plant seeds, as analyzed by Pastorino, exemplify a specific kind of comparative experimentation. It would be applied again and again throughout the eighteenth century, not just in plant science but also in other scientific fields. Pythagoras’ hammer experiments too were repeated, or at least repeatedly reported, by several scholars prior to Galileo and Mersenne. In this case, the design was not a model but a point of critique for later scholars.

Our studies on control practices and on their discussion and justification have revealed other lineages and cross-fertilizations—among physics and psychology, physiology, botany and ethology, chemistry, medicine, agriculture, and philosophy. Control practices and strategies are contextual, in that the context determines what is controlled and how to achieve control. But control strategies and at times even control practices are not discipline-specific. The same strategies travel across disciplines, from physics to medicine and physiology to chemistry and back again. Several chapters suggest that the same methodological ideas and control strategies are advocated across national boundaries (see especially Schürch and Coko). Control strategies such as comparative designs and multiple repetitions are relatively stable across historical periods. But they may be justified in different ways at different times and may cease to be justified at all.

With our work, we hope to stimulate broader discussions about the longer-term history of rigorous experimentation: what are the strategies involved in it? And how do debates concerning well-designed experiments unfold in different fields and periods? By our effort we seek to clarify the roles of experimental strategies and methodologies as driving forces for scientific change, and as tools for determining what it means to do—or not to do—good science.

***

This volume (and its companion, a collection of essays on analysis and synthesis) originated in a Sawyer Seminar at Indiana University Bloomington titled “Rigor: Control, Analysis and Synthesis in Historical and Systematic Perspectives,” which was funded by the Andrew W. Mellon Foundation. Mellon Sawyer Seminars are temporary research centers, gathering together faculty, postdoctoral fellows, and graduate students for in-depth study of a scholarly subject in reading groups, seminars, and workshops. As part of our activities, we organized two international conferences. They brought together scholars in history, philosophy, and social studies of science who examine historical and contemporary dimensions of rigor in experimental practice. The contributors to this volume participated in the second of the Sawyer conferences (March 2022) and reconvened a few months later for an authors’ workshop, at which the draft chapters for this volume were intensely discussed.

Several institutions and individuals helped to make our work possible. We gratefully acknowledge the Mellon Foundation’s generous financial support, and especially the Foundation’s flexibility as we dealt with the challenges of pursuing collaborative scholarship during a pandemic. We are grateful to Director of Foundation Relations Cory Rutz at Indiana University’s Office of the Vice President for Research, for his prompt and efficient assistance in administering the grant. The authors’ workshop took place at the IU Europe Gateway (Berlin) and was funded by a combined grant from the IU College of Arts and Sciences and the College Arts and Humanities Institute. We very much appreciate this support. We are indebted to Jed Buchwald for including our work in the Archimedes series, and to Chris Wilby for his efforts in moving the publication along. A big thank you to our department manager Dana Berg (Department of History and Philosophy of Science and Medicine at IU), office assistant Maggie Herms (IU HPSC), and Andrea Adam Moore (IU Europe Gateway), all of whom helped to organize our conferences and workshops. Finally, we warmly thank the many participants at the two conferences and at the various other Sawyer events for their valuable input, comments, questions, and critique.