"For those who believe, no proof is necessary. For those who do not believe, no proof is possible." ~Stuart Chase
Common words cause even more misunderstanding of science on the evidentiary side of the ledger than they do on the predictive side. The very meaning of evidence differs between science and law. In science any outward sign (observation or measurement) can be evidence. In law, evidence is something that furnishes proof.
Lawyers seek evidence in the form of testimony, material objects, documents, recorded images, or circumstances to establish facts. Evidence varies in quality from circumstantial to direct and is marshaled to prove facts in one of two senses. In civil law, proof equates with the preponderance of evidence. In criminal law, proof equates with evidence beyond a reasonable doubt. Neither kind of legal proof is absolute or necessarily enduring. Legal proof of guilt or innocence can be overturned by later evidence. Facts in the eyes of the law are states of things, events, or the existence of things. For example, the boulder in my backyard can be cold (a state), be the site of chipmunk burrowing (an event), or just be there (an “existence”). Facts are weighed by the quality and quantity of their evidence. A fact refuted by abundant, strong evidence loses its status as fact: the rock was warm rather than cold, the burrow was made by a gopher rather than a chipmunk, or the boulder may have succumbed to gravity and left my steep backyard. A well-supported fact, however, remains a fact and can contribute to proof as legally defined. Except in dystopia, each fact has only one true state, whose accuracy of resolution can vary with the evidence. According to Black’s Law Dictionary online, facts have “an actual and absolute reality, as distinguished from mere supposition or opinion — a truth, as distinguished from fiction or error.” (Note that these definitions were temporarily suspended in the United States from 2016 to 2020.) Lawyers focus on facts and work toward proof of the facts that support their theories.
A slight tangent is the meaning of proof with respect to distilled spirits, which has its origins in tax law. The 16th-century British criterion for whether distilled spirits had enough alcohol to warrant taxation in the higher tax bracket relied on whether a pellet of gunpowder dropped into the burning liquid would itself burn. Burning was considered proof of sufficient alcohol content. The test was not particularly precise because it was sensitive to extraneous variables such as temperature and the granulation of the powder. England moved on to specific gravity, which was also sensitive to temperature, and the English, French, and U.S. definitions of proof in matters of distilled spirits soon diverged.
Scientists focus on data (including measurements and observations) and work toward strong tests of their theories, with the strongest tests often leading to rejection of hypotheses and narrowing of the applicability of the theories from which those rejected hypotheses were derived. When a better theory is offered, data can force replacement of one whose scope has grown narrower than the new vision. I have never heard a scientist talk, however, about facts in a scientific context. Our focus instead is on the empirical evidence (data) in a context provided by the hypothesis.
Unsettlingly, “empirical” is used in two different contexts in science. Scientific evidence (observation or measurement) is empirical. The second context is that of an empirical relationship — a statistical regularity noted in the data that was not predicted from theory. Cope’s rule (that body size increases in animal lineages as they evolve) originated as an empirical relationship, although several theoretical mechanisms have since been suggested to explain it. Height and arm span are roughly equal for most humans. This relationship is an empirical correlation; arm span does not cause height, nor does height cause arm span, and the correlation was found in the data rather than through a theoretical prediction. Empirical relationships provide predictions only in the limited sense that, for example, I can predict that the next people measured will have arm spans roughly equal to their heights.
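An empirical association like the one between height and arm span is usually quantified as a Pearson correlation coefficient. The sketch below uses made-up illustrative measurements (not real anthropometric data) to show the computation:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical height/arm-span pairs in centimeters.
heights = [160, 165, 170, 175, 180, 185]
arm_spans = [159, 166, 169, 176, 181, 184]
r = pearson(heights, arm_spans)  # close to 1: a strong empirical association
```

A value of r near 1 quantifies the regularity but, as the text notes, says nothing about causation and is not derived from theory.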
Scientists carry two definitions of “error.” One is that of a mistake, which scientists make as often as anyone else. I can make this kind of error when I incorrectly read or transcribe an instrument reading as 88.8 volts when it was actually 33.3 volts. As another example, it is easy to misidentify a species when there are many that are similar and small, and when few taxonomists have worked on the group. These kinds of errors are in principle avoidable. They are clearly undesirable, albeit understandable, within science and without.
The other kind of error involves the reliability and resolution of any measures or observations — their accuracy and precision. Biased measures are inaccurate: they deliver values that are systematically too high or too low in the case of measurements that can be ordered, or they tend to favor one or more choices unfairly in the case of results that can be categorized but not located along a continuous scale. Repeated measures giving identical values are precise. Precision is most often quantified as the variability or confidence limits of repeated measures, high precision corresponding with low variability and narrow limits. A common such measure is the standard error of the mean, which is the standard deviation of repeated measures divided by the square root of the number of observations used to estimate the mean in question. Bias has components in both the first and second kinds of error, but some degree of imprecision is unavoidable. Imprecision is not a mistake in the sense of the first kind of error; there is no pejorative connotation associated with the standard error of the mean. High precision does not automatically provide comfort: it is entirely possible to be precisely wrong, to hit the nail precisely on the thumbnail. The second kind of error is useful information that should be quantified whenever possible and put to good use.
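The standard error of the mean can be made concrete in a few lines. This is a minimal sketch (the function name and the repeated voltage readings are hypothetical): the sample standard deviation divided by the square root of the number of observations.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation
    divided by the square root of the number of observations."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical repeated readings of the same voltage.
readings = [33.1, 33.4, 33.2, 33.5, 33.3]
sem = standard_error(readings)
```

Because the divisor grows with the square root of the sample size, collecting more repeated measures narrows the standard error and sharpens the resolution of the estimated mean.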
Precision plays large roles in scientific hypothesis testing. At issue is whether the results of experiments or observations have sufficient precision to resolve differences predicted by one or more competing hypotheses. Sample size often plays a key role in resolution, as indicated by the calculation of standard error. A frequent strategy of dishonest statistical consultants is to run a test of competing hypotheses, say on a control group of nonsmokers against a group of smokers and to assert a finding of “no difference.” Part of the dishonesty is to omit part of the description: “no statistically significant difference,” i.e., no difference that can be resolved. That finding can usually be assured by choosing a control group with high variability and using small sample sizes. In effect such consultants assert that absence of evidence is evidence of absence, which should be recognized as clear nonsense.
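The “no statistically significant difference” trick can be illustrated with a small Monte Carlo sketch. The function and all the numbers below are mine, and the criterion of twice the standard error of the difference is only a rough stand-in for a 0.05-level t-test: with few subjects and high variability, a genuinely real difference is rarely resolved.

```python
import math
import random
import statistics

def detects_difference(n, true_diff, sd, trials=2000, seed=1):
    """Fraction of simulated experiments in which a real difference of
    `true_diff` between two groups of size n is flagged, using
    |difference of means| > 2 * SE(difference) as a rough 0.05-level test."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sd) for _ in range(n)]
        b = [rng.gauss(true_diff, sd) for _ in range(n)]
        se_diff = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        if abs(statistics.mean(b) - statistics.mean(a)) > 2 * se_diff:
            hits += 1
    return hits / trials

# Same true effect in both scenarios; only sample size and variability change.
low_power = detects_difference(n=5, true_diff=1.0, sd=3.0)
high_power = detects_difference(n=100, true_diff=1.0, sd=1.0)
```

The dishonest design (small n, noisy control group) almost always reports “no difference” even though the difference is real, while the well-powered design almost always resolves it.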
Two kinds of mistakes are serious impediments to scientific progress. One is the rejection of a true hypothesis (called Type I or alpha error in statistical jargon), and the other is acceptance of a false hypothesis (Type II or beta error). Neither is desirable, but the weight assigned to each varies with the application. Rejection of the true efficacy of a life-saving experimental drug has dire consequences. Most hypothesis tests do not have life-or-death consequences, and the most frequent approach to hypothesis testing aims to restrict, but cannot exclude, rejection of a true hypothesis. Experimental and measurement designs are chosen to keep alpha error at or below some threshold, most frequently 0.05, or one chance in 20. Scientists recognize that they may make alpha and beta errors, in part because of limited precision in their tests. When more evidence and more hypotheses later point backward to suggest such errors, revisitation is the normal remedy.
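The 0.05 threshold itself can be checked by simulation. In this sketch (my own function and numbers, again using twice the standard error of the difference as a rough 0.05-level criterion), the two groups are drawn from identical distributions, so every rejection is a false one; the long-run fraction of rejections approximates the alpha error rate.

```python
import math
import random
import statistics

def false_rejection_rate(n, sd, trials=4000, seed=2):
    """With no real difference between the two groups, count how often
    |difference of means| > 2 * SE(difference).  The long-run fraction
    of such false rejections approximates the Type I (alpha) error rate."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sd) for _ in range(n)]
        b = [rng.gauss(0.0, sd) for _ in range(n)]
        se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        if abs(statistics.mean(b) - statistics.mean(a)) > 2 * se:
            rejections += 1
    return rejections / trials

rate = false_rejection_rate(n=30, sd=1.0)  # hovers near 0.05
```

No design drives this rate to zero; choosing the threshold only fixes how often a true hypothesis will be rejected by chance.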
Absolute proof exists in mathematics if and only if one is willing to accept the axioms from which the proof stems. Mathematical proofs are “if-then” statements: if the axioms are true, then the rest is true by logic alone, without need for empirical test. The “then” part of the statement is thus no more and no less than a restatement of the “if” part. Mathematical proofs follow this model, called a tautology. Proof of this sort is possible because the universe of mathematics can be completely known, having been entirely created by mathematicians. Complete knowledge of natural processes is neither an achievable goal nor one attempted by science: when in a science paper I read that “X is not completely understood,” I want to scream. Scientists do not prove hypotheses or theories except in the archaic sense of testing them. And if proof is impossible, then so is disproof. Results of real tests are prone to false acceptance or false rejection of hypotheses. Repeated observations supporting hypotheses and the theory from which they are drawn eventually reduce interest in attempts to overthrow the theory, relegating it to accepted understanding. In no sense is it proven, as new results or new theory may expose the accepted theory to new scrutiny and overturn its broader applications, just as relativity supplanted Newton’s laws of motion.
I cannot recall having a science-focused discussion about beliefs with another scientist or overhearing any such discussion between scientists. Belief is a word that scientists use outside their profession as much as others do outside theirs, but it is rarely used in science. I am not infrequently asked by non-scientist friends and acquaintances whether I believe in global warming, as if that were a scientific question. I believe based on well supported theory and manifold evidence that man-made global warming is ongoing, is already severe and has serious consequences from changing the crops that can be grown in a given region to changing the size of the culverts that are needed to control flooding of fields and roads. My answer as to whether I believe in global warming is that what I believe is irrelevant. What is relevant is the consistent theory and evidence about global warming. Global warming has been relegated to the status of accepted understanding by scientists even if not by non-scientists.
Svante Arrhenius, while developing theory in 1896 to explain the ice ages, was early to suggest the involvement of carbon dioxide concentration in the atmosphere. The greenhouse mechanism whereby carbon dioxide causes global warming (by trapping heat produced from sunlight, as glass and plastic greenhouses do) and how much carbon dioxide causes how much warming are much more accurately and precisely estimated now. Charles Keeling, a chemist at Scripps Institution of Oceanography, started a time series of atmospheric carbon dioxide measurements in 1958, wisely choosing as his collection site a mountaintop in the middle of the Pacific Ocean to approximate mean global, or at least mean northern-hemisphere, concentrations. It shows an inexorable rise in atmospheric carbon dioxide concentration that is well explained by the burning of fossil fuels. Its consequences for global mean temperatures, as well as for proxies such as ice-in and ice-out dates for lakes all over the world, are unmistakable evidence of a fossil-fuel-burning effect on top of natural cycles and variability. I know many scientists, and global warming is a frequent discussion topic among us. The data are clear, as is the mechanism. Again, the scientists I know treat global warming by fossil-fuel burning as accepted, unproblematic scientific explanation.
The word belief rarely if ever arises in discussions about science among scientists or in their scientific writing. The reason “belief” and “believe” are rare in scientific writing is that their dictionary definitions span a wide range of denotations and connotations, including trust in some person or thing and conviction of the truth of some statement or the reality of some phenomenon or being, yet the words admit the full range of evidentiary support, from none to ironclad. In contrast to popular science writing, where metaphor is more than welcome, the primary literature of science demands precision in word choice: description of the object of study itself, not of some object like the one under study (but different).
Asimov in his Foundation trilogy introduced a regime in which theory was literally judged by the weight of the scientific writings in its favor versus those against (hopefully on paper stock of the same weight per unit area). That is not how evidence is evaluated by scientists. Some evidence is strong, and some weak. It is hard if not impossible to refute seafloor spreading in light of the geologic formations and evolutionary lineages of plants and animals that match between continents once juxtaposed but now widely separated by oceans. Voluminous arguments against continental drift have fallen.
Last but far from least are words like doubt, uncertainty, and skepticism. In the vernacular, each carries some negative connotation. Scientists, however, are comfortable with the states identified by all three words. Richard Feynman described science as a process to avoid “fooling yourself.” Some of the best scientists I know perpetually ask themselves, “How else could these results be explained?”, other than by the theory and hypothesis used to develop the test. Self-skepticism keeps you honest. Where the most doubt and uncertainty lie is where the scientific action is. If alternative interpretations are not possible, then experiments and observations have little value, and attention elsewhere will be more profitable. William Byers, in his book about how mathematicians think, points out that the action (new math) in mathematics over its long history has always been where one finds, or better yet creates, some ambiguity in interpretation. In science the action is where two or more competing hypotheses explain existing data but new experiments and observations can be envisioned to distinguish between or among them. All hail doubt and uncertainty, not as goals but as symptoms of where scientific progress can be made, just as flocking seagulls lead fishermen to where the fish are. Statistics are servants of science, quantifying uncertainty and allowing steady advance by rational choice among competing hypotheses.
Byers, W. 2007. How Mathematicians Think: Using Ambiguity, Contradiction, and Paradox to Create Mathematics, Princeton University Press, Princeton.
Jensen, W.B. 2004. The Origin of Alcohol Proof. Journal of Chemical Education 81: 1258.