Bayes' Theorem: Adjusting Investment Decision Weights Using Bayes' Theorem

Qilehui · Jun 3 23:35

Source: Qile Club

Written by British science writer Tom Chivers, *Bayes' Theorem* reveals the essence of Bayes' theorem: it is not only a major innovation in probability theory but also a scientific tool that helps us think clearly and make decisions under uncertainty.

Bayes' theorem enables us to continuously update our understanding, dynamically adjust decisions, quantify uncertainty, overcome reliance on anecdotal experience, distinguish between what can and cannot be changed, maintain internal stability, and accept inherent uncertainty.

The book published here as *Bayes' Theorem* was originally titled *Everything Is Predictable*, yet all predictions share a common problem: the uncertainty of outcomes. Bayes' theorem is, in fact, a concise equation that estimates the probability of an event based on available information: P(A|B) = P(B|A) × P(A) / P(B). Specifically, it is a statement about conditional probability. In the formula, the vertical bar '|' is shorthand for 'given that' or 'under the condition that,' and P(A|B) denotes 'the probability of event A occurring given that event B has already occurred.'
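The conditional-probability reading can be checked with a few lines of Python. This is a minimal sketch, not something taken from the book: the function name `posterior` and the numbers (a 1% prior, a 90% chance of seeing the evidence if the hypothesis is true, a 5% chance if it is false) are hypothetical values chosen only for illustration.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) from P(H), P(E | H) and P(E | not H) using Bayes' theorem."""
    # Total probability of the evidence: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    p_evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    # Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E)
    return p_e_given_h * prior / p_evidence


# Hypothetical inputs: a rare hypothesis and fairly (not perfectly) reliable evidence.
p = posterior(prior=0.01, p_e_given_h=0.90, p_e_given_not_h=0.05)
print(f"P(H | E) = {p:.3f}")  # prints P(H | E) = 0.154
```

Even with evidence that is right 90% of the time, the posterior stays near 15% because the prior was so low. That gap between P(E|H) and P(H|E) is what the vertical bar notation is meant to keep visible.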

Bayes' theorem was proposed by Thomas Bayes, an 18th-century British Presbyterian minister. During his lifetime, he wrote a theological treatise and a book analyzing Newton’s calculus. After his death, his friend Richard Price discovered some of his manuscripts and unfinished notes. Price edited and published them in the *Philosophical Transactions of the Royal Society*, which contained what is now known as Bayes' theorem.

Bayes' theorem expresses a principle of probability theory. It not only helps us identify fallacies in reasoning but also reveals something deeper. Often, the crux lies in the 'inverse'—namely, how likely a coincidence truly is. This is essentially an 'inverse probability' problem: probability theory concerns what might happen under given conditions, rather than what has already occurred.

Bayes' theorem represents ideal decision-making: the extent to which a decision adheres to Bayes’ theorem determines how correct that decision is. Humans appear to function as Bayesian machines—our brains and perception seem to operate through a process of 'predicting the world → forming prior probabilities → acquiring new data through senses → updating predictions.' Our conscious experience of the world appears to be shaped by optimal prior probabilities.

Rare events do indeed occur.

Modern probability theory originated in the 17th century from correspondence between Blaise Pascal and Pierre de Fermat. They discussed the 'problem of points'—how to fairly divide stakes when a game of chance is interrupted. By calculating each player’s probability of winning, they established the concept of expected value, laying the foundation for the development of probability theory. Later, Swiss mathematician Jacob Bernoulli propelled probability theory into a new phase.
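The problem of points can be worked through directly. The sketch below assumes that every remaining round is a fair 50/50 contest and that the stakes are split in proportion to each player's chance of winning; the function name `share_of_stakes` is a hypothetical label, not historical terminology.

```python
from fractions import Fraction
from math import comb


def share_of_stakes(a_needs: int, b_needs: int) -> Fraction:
    """Probability that player A wins, given how many more wins each player needs.

    Assumes each remaining round is won by either player with probability 1/2.
    """
    # The game is certainly decided within this many further rounds.
    remaining = a_needs + b_needs - 1
    # A wins overall exactly when A takes at least a_needs of those rounds.
    favourable = sum(comb(remaining, k) for k in range(a_needs, remaining + 1))
    return Fraction(favourable, 2 ** remaining)


print(share_of_stakes(1, 2))  # prints 3/4
print(share_of_stakes(2, 3))  # prints 11/16
```

A player who needs one more win against an opponent who needs two should therefore receive three quarters of the pot. That fraction is the expected value of the player's claim on the stakes.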

Bernoulli mathematically demonstrated that the more times a coin is tossed, the closer the observed frequency of heads comes to the 'true' probability. Inferring the probability of individual events based on knowledge of the whole constitutes 'probabilistic inference,' whereas estimating characteristics of the whole based on sample survey results constitutes 'statistical inference.'
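Bernoulli's result can be illustrated, though of course not proved, by simulation. The sketch below assumes a fair coin and a fixed random seed so that runs are repeatable; `heads_frequency` is a hypothetical helper name.

```python
import random


def heads_frequency(n_tosses: int, p_heads: float = 0.5, seed: int = 0) -> float:
    """Simulate n_tosses coin flips and return the observed share of heads."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses


# The observed frequency tends to settle near 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))
```

The small samples can land far from 0.5, while the largest one typically sits within a fraction of a percentage point of it.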

Bernoulli realized that we can never be absolutely certain our conclusions reflect the true answer; we can only approach it as closely as possible—with different conclusions carrying varying degrees of confidence. Probability exists not only in games and gambling; humans constantly engage with probability in everyday life—for instance, assessing the likelihood of someone being a murderer in a homicide case or evaluating whether a document has been forged.

Although we can never know anything with absolute certainty, we do recognize that different events carry different probabilities. However, Aubrey Clayton argues in *Bernoulli’s Fallacy* that Bernoulli discussed 'sampling probability' rather than 'inferential probability' and failed to distinguish between the two.

Abraham de Moivre, a French Protestant, advanced Bernoulli’s theory one step further. His focus, however, was not on the magnitude of the numbers but on the shape of the curve: the larger the number N of coin tosses, the clearer the curve becomes. Rather than laboriously computing the probability of obtaining six heads in 100 coin tosses term by term, it is more efficient to analyze the mathematical expression of the curve and then use that expression to compute the probability of a given outcome. This curve is the famous “normal distribution curve,” also known as the “bell curve.” Bernoulli had only discovered that larger sample sizes yield more accurate results; de Moivre went further by quantifying the theory.
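The shortcut de Moivre found can be compared with the exact calculation. The sketch below is a modern restatement rather than his original derivation: it evaluates the exact binomial probability and the bell-curve density with the same mean and variance. The outcomes checked (50 and 60 heads in 100 tosses) are illustrative choices near the centre of the curve, where the approximation works best; far out in the tails it is much less accurate in relative terms.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Exact probability of k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_approx(k: int, n: int, p: float = 0.5) -> float:
    """Normal density with mean n*p and variance n*p*(1-p), evaluated at k."""
    mean = n * p
    variance = n * p * (1 - p)
    return exp(-((k - mean) ** 2) / (2 * variance)) / sqrt(2 * pi * variance)


for k in (50, 60):
    print(k, binomial_pmf(k, 100), normal_approx(k, 100))
```

For both outcomes the two numbers differ by less than one part in a thousand of total probability, which is why the curve's formula can stand in for the term-by-term calculation.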

A contemporary of Bayes, Thomas Simpson published a paper in 1755 analyzing errors in astronomical observations. He argued that we should use the mean of all observed values rather than the so-called “Aristotelian mean”—the sum of the maximum and minimum values divided by two. Using a special case of the law of large numbers, he successfully proved his point. Simpson was concerned with “how to calculate the probability that a hypothesis is true given observed outcomes,” rather than “how to calculate the probability of a specific outcome given a pre-established hypothesis.” This approach is known as “analytical inference.” This groundbreaking attempt finally elevated statistics beyond a mere “mathematical game (or gambling strategy) for seasoned casino players” and transformed it into a broadly applicable tool for reasoning.

Bayes began contemplating statistical inference—or what was then called “inverse probability”—while reviewing Simpson’s paper. In Bayes’s view, probability was merely “a way of describing the unknowable aspect of the world.” In other words, probability is subjective—a representation of people’s best guesses about the unknown and about truth. If one million red balls have all landed on the left side, according to Bayesian theory, the next red ball could still appear on the right side, with a probability of 1/1,000,002. Every new piece of information brings us closer to “absolute certainty,” yet we can never truly attain “100% certainty.”
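The figure of 1/1,000,002 can be reproduced under one assumption: that before any ball is thrown, every possible left/right proportion is considered equally likely (a uniform prior). Under that assumption the probability that the next ball lands on the right is (right + 1) / (total + 2), a result usually called the rule of succession and credited to Laplace. The function name below is hypothetical.

```python
from fractions import Fraction


def prob_next_on_right(n_left: int, n_right: int = 0) -> Fraction:
    """Probability that the next ball lands on the right, assuming a uniform prior.

    Rule of succession: (successes + 1) / (trials + 2).
    """
    return Fraction(n_right + 1, n_left + n_right + 2)


print(prob_next_on_right(1_000_000))  # prints 1/1000002
print(prob_next_on_right(0))          # prints 1/2 (no data yet, so an even guess)
```

The probability shrinks with every additional ball on the left but never reaches zero, which is the sense in which certainty is approached without being attained.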

After Bayes’s death, Richard Price, who compiled and edited his papers, became arguably the world’s “first Bayesian.” He sought to rescue God from the arguments of David Hume. Hume contended that no amount of evidence could justify belief in miracles—events that violate natural laws. Hume’s argument itself was probabilistic: throughout our lives, we almost never witness violations of natural laws, yet we encounter countless lies.

However, Price believed that rare events do indeed occur. Even if you have observed the sun rising one million times and witnessed tides ebbing and flowing one million times, you still cannot be absolutely certain these phenomena will continue indefinitely. Similarly, even if we have never seen a dead person resurrect in our entire lives, we cannot be 100% certain it will never happen.

By synthesizing the findings of Bernoulli, de Moivre, and Simpson, one can conclude that, under conditions where measurement errors are purely random and contain no systematic bias, repeated observations of an event will tend to cluster around the true value. The true value—the actual, objective value of a measured quantity under specified conditions—is approached asymptotically through extensive observation. Bayes further demonstrated that if we form a prior estimate of this true value and assign it a prior probability reflecting its plausibility, we can combine this prior with observational data to make inferences—constructing a reasonable hypothesis about what has occurred.

The individual who truly applied probability theory and statistics to the social sciences was the Belgian mathematician Adolphe Quetelet. His principal contribution to statistics was the concept of the “average man.” He established distinct numerical scales for various human characteristics and found that data such as height, weight, strength, and even tendencies toward behaviors like suicide generally follow a normal distribution. These traits result from the cumulative effect of numerous minor influences.

These influences typically do not all push in the same direction—some are positive, others negative—resulting in characteristics like height, weight, and alcohol consumption clustering around the population mean and forming a normal distribution. However, Quetelet failed to recognize that many datasets do not conform to a normal distribution and erroneously assumed that all data could be fit within this framework.

Bayes’ Theorem: From Outcomes to Hypotheses

Although Bayesians have made significant contributions to the development of probability and statistics, practicing statisticians and scientists rarely use Bayes’ theorem in their daily work, as most belong to the so-called frequentist school. The frequentist approach does precisely the opposite of the Bayesian method.

Bayes' theorem enables us to move from observed outcomes to hypotheses—specifically, how to calculate the probability that a given hypothesis is true based on observed results. In contrast, frequentists move from hypotheses to outcomes—calculating the probability of observing a particular result given a pre-established hypothesis.

The 'prior probability' in Bayesian terms refers to probabilities derived from past experience and analysis. This is fundamentally a philosophical issue: our judgments are inherently subjective. 'Priors' do not describe the world itself, but rather our own state of knowledge and ignorance.

Bayesian theory seems to suggest that whether something is true or false depends on how strongly we initially believed in it. Thus, probability ultimately becomes subjective and personal, rather than real and objective. The rise of the frequentist school appears to stem precisely from an aversion to such subjectivity.

Two prominent statisticians emerged from the frequentist school: Karl Pearson and Ronald Fisher. Pearson developed the chi-squared test, which helps mathematicians determine whether a data sample follows a normal distribution or some other distribution.

Pearson also coined the term 'standard deviation.' Fisher was a leading figure in 20th-century statistics. He created and refined numerous statistical tools, many of which remain in use today. He developed various mathematical models for analysis of variance (ANOVA), introduced the concept of 'statistical significance,' and invented the 'method of maximum likelihood' to help determine which hypothesized data distribution best explains observed research data.

'Likelihood' is a term coined by Fisher. Maximum likelihood estimation involves using known experimental data to assess which hypothesis is most likely to have produced the observed results. For example, the hypothesis that 'the coin has been tampered with and lands heads 80% of the time' is more likely to yield the outcome '8 heads in 10 tosses' than the hypothesis that 'the coin is fair.' The likelihood ratio between these two hypotheses is approximately 7. Maximum likelihood estimation merely helps compare which hypothesis is more likely to generate the observed experimental outcome; it does not tell us which hypothesis is more likely to be true.
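The likelihood ratio of roughly 7 in the coin example can be verified directly. This sketch treats the tosses as independent and uses the binomial formula; the function name `binomial_likelihood` is a hypothetical label.

```python
from math import comb


def binomial_likelihood(p_heads: float, heads: int, tosses: int) -> float:
    """Probability of observing `heads` heads in `tosses` tosses if P(heads) = p_heads."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads) ** (tosses - heads)


biased = binomial_likelihood(0.8, heads=8, tosses=10)  # "lands heads 80% of the time"
fair = binomial_likelihood(0.5, heads=8, tosses=10)    # "the coin is fair"
ratio = biased / fair
print(round(ratio, 2))  # prints 6.87
```

The ratio says the data are about seven times more probable under the biased-coin hypothesis. It says nothing about how plausible either hypothesis was beforehand, which is the gap a prior is meant to fill.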

However, Pearson pointed out that maximum likelihood estimation itself falls within the Bayesian framework—it assumes equal prior probabilities for all hypotheses—and then provided a proof demonstrating that, under this assumption, maximum likelihood estimation is flawed. Nevertheless, Fisher deeply disliked Bayesianism, and Pearson also rejected associating probability with subjectivity. The core disagreement centered on the proposition: 'If we do not know which outcome is most likely, we should assume all outcomes are equally probable.'

John Stuart Mill once criticized Bayes' theorem. In 1843, he wrote: 'Knowing only that one of two events must occur, yet being unable to determine which one, is insufficient grounds to claim both events are equally probable. We must rely on empirical evidence to demonstrate that the two events occur with equal frequency.' Mill considered the notion that 'probability is merely a description of our own ignorance' to be utterly foolish.

In his view, probability reflects the true state of the world—namely, the frequency with which events occur. Empirical evidence shows that if a coin is tossed sufficiently many times, the number of heads and tails will be approximately equal; and the more times the coin is tossed, the closer the counts of heads and tails will approach equality.

If one were to summarize the disagreement between Bayesians and frequentists in a single sentence, it would be this: Bayesians regard probability as subjective—a measure of human uncertainty about the world—whereas frequentists view probability as objective—a description of how often a particular outcome occurs over many repeated trials.

For a considerable period, the Bayesian school remained at a disadvantage, while the frequentist school—represented by Ronald Fisher and Karl Pearson—gradually became the standard doctrine among scientists and statisticians. Although interest in Bayesian theory dwindled, it did not disappear entirely. On certain problems, it remained the only viable statistical approach—a point even Fisher himself acknowledged.

Harold Jeffreys, a geophysicist at the University of Cambridge, was a pivotal figure in early Bayesian scientific thought. He famously stated, 'Bayes’ theorem is to probability theory what the Pythagorean theorem is to geometry.' In 1926, Jeffreys discovered that the Earth’s core is liquid—the upper mantle primarily consists of silicate-based rocks, whereas the core is composed mainly of iron and nickel.

He attempted to use the arrival times of seismic waves recorded at various seismographic stations to determine both the earthquake’s epicenter and the properties of the materials through which the waves traveled. However, earthquakes are relatively rare events, and even when data were obtained, they were often contaminated with substantial noise, rendering the entire process highly uncertain. Consequently, only preliminary conclusions could be drawn initially, which were then incrementally updated and refined as new information became available. This iterative process relied not on 'uncertainty' per se, but rather on 'degrees of belief'—in other words, it was inherently Bayesian.

Each time new information became available, Jeffreys updated his prior degree of belief in a given hypothesis: 'Every scientific advance begins with complete ignorance; as evidence accumulates, an increasingly compelling hypothesis emerges until its degree of belief reaches an acceptable level. It is precisely this element of scientific uncertainty that makes science most fascinating.' He maintained that uncertainty permeates all phenomena—even scientific laws are not exempt—and that all forms of uncertainty can be expressed probabilistically. The Bayesian school eventually regained momentum, largely due to the enduring influence of Jeffreys’ methodological approaches, which persisted almost like folk remedies passed down through generations.

Bayes’ theorem teaches us how to act.

When Bayes’ theorem is applied to investing, we observe a close alignment between Warren Buffett’s portfolio strategy and Bayesian principles. Buffett’s investment approach emphasizes concentrated investing—selecting a small number of companies with strong earnings power and clear economic moats. This strategy resonates deeply with the Bayesian concept of dynamically adjusting judgments based on prior information and new evidence.

Bayes’ theorem is a method of probabilistic reasoning that allows one to update the estimated probability of an event occurring in light of new evidence or information. In Buffett’s investment decisions, the prior probability can be viewed as an assessment grounded in a company’s historical performance and financial data, while new evidence encompasses recent developments in the company’s operations, market feedback, and management decisions.

At the heart of Buffett’s investment methodology is the idea that if a company has already demonstrated a strong competitive advantage and consistent profitability (a high prior probability), then any new information about its future prospects (the likelihood) can be used to refine the investment judgment, yielding a more confident decision (the posterior probability). This approach aligns precisely with the core tenet of Bayes’ theorem—updating probability estimates in response to new evidence. Through this method, Buffett constructs a portfolio anchored in businesses characterized by high predictability and sustainability, which exemplifies the power of applying Bayes’ theorem to investment decision-making.

In the book *The Warren Buffett Portfolio*, Robert Hagstrom notes that Bayes’ theorem teaches us a logical framework for understanding why, among many possibilities, only one outcome ultimately occurs. Conceptually, this involves a straightforward procedure: we first assign a probability to each possible outcome based on the evidence at hand, and as additional evidence emerges, we adjust these probabilities to reflect the new information.

A prime example is Buffett’s reasoning process regarding Coca-Cola. Buffett had been familiar with Coca-Cola since childhood, actively buying bottles from stores and reselling them for profit. Right up to the moment he purchased Coca-Cola stock, the company had never left his field of vision.

At the time, Coca-Cola suffered from business diversification that led to fragmentation, an excessive burden of inefficient assets weighing down its core operations, and strong growth by competitors eroding its market share, resulting in persistently weak financial performance and stock price. However, given Coca-Cola’s extensive operational history and nearly a century of accumulated performance data—illustrated clearly in frequency distribution charts—its brand strength and intrinsic value remained intact despite poor management. This constitutes the first step of the reasoning process.

Buffett observed that Coca-Cola’s new management team was undertaking initiatives to enhance corporate value—actions that increased the probability of appreciation—such as divesting underperforming businesses and proactively exiting non-core operations. Furthermore, proceeds from these divestitures were being reinvested into the company’s core, more profitable businesses. This second step in the reasoning process convinced Buffett that the company’s performance and financial condition would soon improve under the new management’s stewardship. Additionally, while enhancing operational efficiency, Coca-Cola’s management actively repurchased shares, thereby further increasing the firm’s economic value. This represents the third step of the reasoning process.

In this three-step analysis, each stage incrementally increased the probability of profitability from investing in Coca-Cola. The information Buffett used did not emerge all at once but rather unfolded progressively. Therefore, in accordance with Bayes’ theorem, each new piece of information raised the degree of certainty, simultaneously reducing investment risk and enhancing expected returns.
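The step-by-step structure of this reasoning can be sketched as repeated Bayesian updating in odds form, where posterior odds equal prior odds multiplied by a likelihood ratio. Everything numeric here is invented for illustration: the starting belief of 0.5 and the three likelihood ratios are hypothetical stand-ins and are not estimates attributed to Buffett or Hagstrom.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Return the posterior probability after evidence with the given likelihood ratio.

    likelihood_ratio = P(evidence | thesis true) / P(evidence | thesis false).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical evidence, in the order it arrives; each ratio is an assumed value.
evidence = [
    ("long operating history and brand strength", 2.0),
    ("new management divests non-core businesses", 1.5),
    ("share repurchases alongside efficiency gains", 1.2),
]

belief = 0.5  # assumed starting point
history = [belief]
for label, lr in evidence:
    belief = update(belief, lr)
    history.append(belief)
    print(f"{label}: {belief:.3f}")
```

With these assumed ratios the belief moves from 0.500 to about 0.667, 0.750, and 0.783. A ratio below 1 would push the belief down instead, so the same procedure also handles disappointing news.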

As Buffett summarized: “Investing is fundamentally a game of probabilities, and Bayes teaches us how to update our ignorance with new information.” This capacity for dynamic cognitive updating is precisely the core weapon enabling him to navigate economic cycles successfully. Charlie Munger also remarked: “Bayes’ theorem teaches us that the essence of wisdom lies in acknowledging ignorance—when presented with new information, one must have the courage to say, ‘There is an X% chance my prior conclusion was wrong.’” This mechanism of continuous cognitive revision underpins one of the key reasons they consistently outperform average market returns. Bayes’ theorem essentially combines “past experience” with “new evidence” to yield a “revised judgment”—using empirical evidence to refine theoretical understanding. The primary objective of Bayesianism is to distinguish useful models from ineffective ones.

Today, Bayesianism has become increasingly pervasive in economics. However, statistician Leonard Savage noted in *The Foundations of Statistics* that applying Bayesian decision theory is only rational within a “small world.” The realms of macroeconomics and advanced finance are among the least likely to qualify as small worlds. The distinction between small and large worlds is crucial: in a small world, individuals can solve problems by maximizing expected utility; the large world, however, is the one in which people actually live and operate.

In this complex universe, no knowledge is absolutely certain—whether due to the physical nature of reality, insufficient empirical data, the presence of chaotic phenomena, or limitations in our computational capabilities. Charles Darwin long ago asserted that species with defects preventing reproduction inevitably face extinction; consequently, among species that survive today, major flaws are exceedingly rare. The same holds true for theories. Thus, we must immerse ourselves in passion, fascination, and inquiry.

Editor/Jayden
