In 1963 Benoit Mandelbrot published an article called “The Variation of Certain Speculative Prices.” It was a response to the body of theory then forming that would become Modern Portfolio Theory. Oversimplified, Mandelbrot’s argument could be summarized as “if this is your theory, then this cannot be your data, and this is your data.” The issue has haunted models such as Black-Scholes, the CAPM, the APT and Fama-French. None of them has survived validation testing. Indeed, a good argument can be made that the test by Fama and MacBeth in 1973 should have brought this class of discussion to an end, but it did not. I am going to argue that each of these models shares a mathematical problem that I believe has previously gone unnoticed. The solution I have found to this problem has been to construct a new stochastic calculus for this class of problems.
The structure of this paper is, first, to explore properties of estimators that may not be obvious to an economist, or even to a data scientist working outside his or her own domain. The second part examines why models such as the CAPM lack a Bayesian counterpart. The third investigates the issue of information and its role in creating a new calculus.
Each of the mean-variance models, as well as the APT and the Fama-French model, is built on top of Frequentist axioms. This article is not an attack on Frequentism, but simply an observation about it. I say this because there are cases in probability and statistics where the choice of axioms also determines the value of the solution asserted from the data. That, in part, appears to be the case here. Before getting into the math in any technical sense, I will provide two examples. One contrasts Bayesian with Frequentist reasoning; the other is purely Frequentist but uses differing assumed loss functions. The purpose is to illustrate the sensitivity of a result to the axioms and assumptions in use.
For the first illustration, consider a wheel marked with numbers; a roulette wheel will do. Think of this as an inverse game of roulette, played with information. The gamblers cannot see where the ball falls, and they place their bets only after the croupier has observed the number.
This example is relatively common in decision theory. After the ball lands and a number is chosen, two fair coins are tossed. For each coin that comes up heads, the croupier reveals to the gamblers the value one unit to the left of where the ball landed; for each coin that comes up tails, the croupier reveals the value one unit to the right. This creates a sample space with three possible outcomes: {L,L}, {R,R} and {L,R}. Our concern is the optimal choice of number on which to place a gamble.
Let us assume the number came up 17. The signals from the croupier will be {16,16}, {18,18} or {16,18}. If the signals are {16,18}, it is obvious that you must bet on 17, and both the Bayesian and the Frequentist would do so. The two solutions differ, however, when the numbers given by the croupier are the same.
The minimum variance unbiased estimator in the case of {16,16} is 16, and in the case of {18,18} it is 18. In the case of {16,16}, the Bayesian probability model supports either element of the solution set {15,17} equally; for the {18,18} case, the same support is found for {17,19}. Betting on either supported value does not minimize the variance, but it does maximize the frequency of winning. In expectation, the Bayesian gambler wins 75% of the time, while the Frequentist wins 50% of the time. Note how close this discussion is to a discussion of martingales.
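Those win rates are easy to check by brute force. The following is a minimal simulation sketch of my own, not code from the paper; the wheel layout is simplified and wrap-around at the ends of the number range is ignored.

```python
# Simulate the inverse roulette game: a Bayesian bettor versus a bettor
# using the minimum variance unbiased estimator (the sample mean).
import numpy as np

rng = np.random.default_rng(1963)
trials = 100_000
bayes_wins = freq_wins = 0

for _ in range(trials):
    true_number = int(rng.integers(1, 37))        # where the ball lands
    # Each fair coin sends a signal one unit left (heads) or right (tails).
    signals = true_number + rng.choice([-1, 1], size=2)

    if signals[0] != signals[1]:
        # {L,R}: both bettors take the midpoint, which is always correct.
        bayes_bet = freq_bet = signals.mean()
    else:
        # Identical signals: the MVUE is the repeated signal itself, which
        # can never be the true number ...
        freq_bet = signals.mean()
        # ... while the posterior puts equal weight one unit to either side.
        bayes_bet = signals[0] + rng.choice([-1, 1])

    bayes_wins += bayes_bet == true_number
    freq_wins += freq_bet == true_number

print(f"Bayesian win rate:    {bayes_wins / trials:.3f}")   # approx. 0.75
print(f"Frequentist win rate: {freq_wins / trials:.3f}")    # approx. 0.50
```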
This illustration shows two crucial facts. The first is that the choice of axiom system can determine the course of action, and the resulting decision need not agree with the decision function of the other system. The second is that it provides an illustration of the Dutch Book Theorem.
The Dutch Book Theorem is similar to the no-arbitrage assumption but weaker in its base assumptions. However, it has an unexpected result: you can always use Bayesian methods for gambling, and you cannot use Frequentist methods for gambling. Models such as Black-Scholes and the Capital Asset Pricing Model are instructions on how to gamble in a specific type and set of lotteries, and they are built on Frequentist axioms. Now imagine an economist testing the above game and finding that nobody behaves as he or she is supposed to act under the MVUE. Economists may reject the model or may argue that people are behaving irrationally, but really the fault lies with the axioms used, not the behavior. Any large-scale test of behavior would come out “wrong.”
A market maker or bookie using Frequentist rules would also suffer, not just the economists. Long-run optimal behavior would grant 1:1 odds. Market makers using Ito models should, from time to time, have people eat their lunch. It is no wonder hedge funds abound. Dangerously, the eleven trillion dollars in outstanding over-the-counter options premiums are mispriced if they are built on Ito methodologies. By theorem, anything built on an Ito methodology will be systematically mispriced, even if every assumption is valid.
The second example originated in a paper by Welch in 1939 and was expanded upon by Morey et al. in their article “The Fallacy of Placing Confidence in Confidence Intervals.” Their paper concerns the serious and common intellectual fallacies people commit when using confidence intervals. They also describe Bayesian credible intervals, although the Bayesian case is not essential here.
In the story, a submarine sinks to the bottom of the ocean and rescuers mount an effort to save the crew. Unfortunately, time is running out, and there is only going to be one chance at a successful rescue. Fortunately, there is a statistician present, and that statistician can construct a confidence interval for where the rescue hatch is. Unfortunately, there exists an infinite number of possible confidence procedures to choose from, and the fact that their intervals do not match raises the question of which procedure to choose. Morey et al. describe three such procedures plus a Bayesian one.
Their description of the problem is:
A 10-meter-long research submersible with several people on board has lost contact with its surface support vessel. The submersible has a rescue hatch exactly halfway along its length, to which the support vessel will drop a rescue line. Because the rescuers only get one rescue attempt, it is crucial that when the line is dropped to the craft in the deep water that the line be as close as possible to this hatch. The researchers on the support vessel do not know where the submersible is, but they do know that it forms two distinctive bubbles. These bubbles could form anywhere along the craft’s length, independently, with equal probability, and float to the surface where they can be seen by the support vessel.
One thing to note is that if the bubbles are precisely 10 meters apart, the location of the rescue hatch is known with perfect certainty, while if there is no distance between the bubbles, then the hatch can only be placed within plus or minus five meters. The possibility of using an xy-plane rather than just an x-axis is ignored in this example.
A couple of facts are relevant here. The first is that a confidence interval is defined as an interval that covers the parameter at least a fixed percentage of the time upon infinite repetition. The second is that any procedure that covers the parameter at least that often is a valid confidence procedure. The third is that a confidence procedure is chosen for its long-run properties; it does not consider the likelihood and so is not conditioned on the specifics of the current data. Fourth, the width of the likelihood is 10 - d, where d is the distance between the bubbles. Finally, because the sample size will only ever be two (n=2) and a narrow interval is desirable, the statistician chooses a fifty percent interval rather than the more traditional ninety-five percent interval.
The first procedure considered by the statistician is to add and subtract approximately 1.46 meters to the average location of the bubbles. Because the width of the submarine is fixed and the sampling distribution of the mean is triangular, adding and subtracting 5 - 5/√2 ≈ 1.46 guarantees that coverage will occur at least 50% of the time.
The second procedure is non-parametric. Because each bubble is equally likely to surface on either side of the hatch, the two bubbles straddle the hatch exactly fifty percent of the time, so taking the mean plus or minus d/2 is also a fifty percent confidence procedure. That interval also coincides with the fifty percent interval under the Student t-distribution for n=2, so this is the procedure most commonly taught to undergraduates.
The third confidence procedure inverts the uniformly most powerful test. If d < 5, it uses the non-parametric interval, the mean plus or minus d/2; otherwise it uses the mean plus or minus (5 - d/2).
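The fifty percent coverage of all three procedures can be checked by brute force. This is a minimal simulation sketch of my own, implementing the three half-widths exactly as described above.

```python
# Verify that each of the three confidence procedures covers the hatch
# in roughly half of all repeated rescues.
import numpy as np

rng = np.random.default_rng(10)
trials = 200_000
hatch = 0.0                                      # place the hatch at the origin
bubbles = rng.uniform(hatch - 5, hatch + 5, size=(trials, 2))
mean = bubbles.mean(axis=1)
d = np.abs(bubbles[:, 0] - bubbles[:, 1])

half_widths = {
    "sampling distribution": np.full(trials, 5 - 5 / np.sqrt(2)),
    "non-parametric":        d / 2,
    "UMP inversion":         np.where(d < 5, d / 2, 5 - d / 2),
}

for name, w in half_widths.items():
    coverage = np.mean(np.abs(mean - hatch) <= w)
    print(f"{name:>21} coverage: {coverage:.3f}")   # each approx. 0.50
```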
Each of these procedures covers the hatch fifty percent of the time, but are any of these procedures appropriate?
If the bubbles are nine meters apart, then there is only a one-meter range in which the hatch could be, which is also the likelihood interval. The first procedure covers it with a width of 2.92 meters, the second with nine meters, and the third with one meter. All three contain the entire likelihood, and so cover the hatch with certainty in this case, though the first two are wider than necessary and, depending on the precision required, may cause the rescue to fail unnecessarily.
On the other hand, if the bubbles are one meter apart, then the first procedure's interval is still 2.92 meters wide, the second and third are one meter wide, and the likelihood is nine meters wide. From a Bayesian perspective, the first procedure now has roughly a thirty-two percent chance of covering the hatch, while the latter two procedures have roughly an eleven percent chance. The final two procedures look the most accurate precisely when they carry the least information about the true location.
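The same simulation, conditioned on the observed bubble distance, reproduces these numbers; again this is a sketch of my own rather than code from Morey et al.

```python
# Conditional coverage given the observed bubble distance d: with d near 9 m
# every interval contains the hatch; with d near 1 m the first covers it
# about a third of the time and the other two only about 11% of the time.
import numpy as np

rng = np.random.default_rng(11)
trials = 2_000_000
bubbles = rng.uniform(-5, 5, size=(trials, 2))   # hatch fixed at 0
mean = bubbles.mean(axis=1)
d = np.abs(bubbles[:, 0] - bubbles[:, 1])

half_widths = {
    "sampling distribution": np.full(trials, 5 - 5 / np.sqrt(2)),
    "non-parametric":        d / 2,
    "UMP inversion":         np.where(d < 5, d / 2, 5 - d / 2),
}

for target in (9.0, 1.0):
    near = np.abs(d - target) < 0.05             # rescues with d close to the target
    for name, w in half_widths.items():
        coverage = np.mean(np.abs(mean[near]) <= w[near])
        print(f"d near {target}: {name:>21} covers {coverage:.2f}")
```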
The lesson, however, is not that one should use a Bayesian procedure. The lesson is that confidence intervals are the result of minimizing some loss function, and each of these procedures minimizes a different type of loss. Confidence procedures do not measure the accuracy of a result, nor do they give the probability that a parameter lies in a range. They provide the frequency with which the parameter will be covered as the number of repetitions becomes arbitrarily large.
Loss functions are vital to this newly proposed calculus. Losses are also subjective. From the perspective of the submersible's crew, this is an all-or-nothing loss function, and the properties upon repetition do not matter to them: they do not care that at least fifty of every one hundred crews would be saved, they only care whether they are saved this one time. On the other hand, the financial position of the corporation running the rescue may create considerations beyond the immediate risk to life and limb. The corporation does, rationally, care about the long-run properties of the procedure used, as well as the loss of life, any compensation scheme such as insurance, and the loss structure created by a failed estimate.
It is unlikely that any of the stated procedures meets the needs of the corporation. And because the long run only applies to the crew if they are rescued, and provided any spouses they may have allow them to go to sea again, a Frequentist procedure may not meet the needs of the crew either. When building financial models, this subjective loss structure is usually swept under the rug. The proposed calculus forces a review of the outcomes that follow when the statistician or economist makes an incorrect estimate.
The failure to consider the purpose of models such as Black-Scholes, which is gambling, and the failure to account for a proper loss function make the utility and the appropriate evaluation of these financial models doubtful. A good part of this may be that the models became more important than their authors likely intended. In a sense, economics took them too seriously, especially since they lack validation.
The second overall purpose of this essay is to look at why there is no Bayesian counterpart to these models. Researchers have, from time to time, used Bayesian methods to test these models, but there is a problem with doing so for Frequentist models. As with the roulette example above, when the model is constructed in a different paradigm it makes different predictions, so examining a Frequentist model with Bayesian methods may be to check the wrong predictions. After all, a Bayesian check that the bookies should offer 1:1 odds would fail for the Bayesian just as readily as for the Frequentist, but the proper Bayesian prediction is not 1:1 odds. A Bayesian methodology requires a complete rebuilding of the model, and that is where a problem with the calculus begins.
Ito methods are Frequentist methods, and they assume the parameters are known with certainty. This follows the thinking behind the null hypothesis: when one asserts a null hypothesis, one is asserting, with perfect certainty, the true value of the parameters. The difference between modeling and testing is that an experiment is built with the intent to reject the null. For the modeling to hold, the assumption of complete information on the parameters turns out to be very important.
The importance shows up once it is realized that parameters are random variables in Bayesian thinking, so one must assert that one does not know them. Knowledge of the true value of the parameters is a big deal; that is a lot of information about the world. If that assumption is dropped, the calculations also have to account for the uncertainty in the parameters and not just the uncertainty from the chance variable. That is a different class of problem. The Capital Market Line vanishes from existence with that added uncertainty.
To see why, one can consider the intertemporal relationship between the present value and the future value of wealth in the CAPM. Its equation says that the future value equals the present value times a reward for investing, plus a random shock to the appraised future value. While the shock is commonly presumed to be normal, that does not matter so long as it has a center of location of zero and a finite variance. The reason it does not matter is that it is known in Frequentist statistics that the reward in this equation has no estimator that both converges to the population parameter and is consistent with mean-variance finance. If the CAPM parameters are not known with certainty, then they cannot be estimated with an estimator consistent with the theory. Median or quantile regression will produce an estimator, but not a mean-based estimator.
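For reference, the relationship described above, written in notation of my own choosing, is

$$ w_{t+1} = R\,w_t + \varepsilon_{t+1}, \qquad \mathbb{E}[\varepsilon_{t+1}] = 0, \quad \operatorname{Var}(\varepsilon_{t+1}) < \infty, $$

where \(w_t\) is the value of wealth at time \(t\) and \(R\) is the reward for investing.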
On the other hand, a different problem exists for a Bayesian model. It is common for economists using a Frequentist model to treat returns as data. Indeed, they are studied as data. However, in the Bayesian paradigm, they are a function of prices and volumes. In reality, in the Frequentist paradigm, they are as well, but the models treat them as primitive constructions.
So, if returns are the ratio of prices times the ratio of volumes, minus one, then returns are a statistic. As such, their distribution needs to be derived from the distribution of the prices and the distribution of the volumes. This also puts finance in line with the rest of economics, where the discussion is about prices and quantities rather than returns.
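In symbols, again using my own notation for the definition just given, the return between times \(t\) and \(t+1\) is

$$ r_t = \frac{p_{t+1}}{p_t}\cdot\frac{v_{t+1}}{v_t} - 1, $$

where \(p\) denotes price and \(v\) denotes volume.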
Because stocks are sold in a double auction, there is no winner's curse, and so the rational behavior is to bid the expected value for going concerns. Using the standard mean-variance assumption of many buyers and sellers, the distribution of those expectations converges to the Gaussian as the number of participants becomes large. If the equilibrium price is treated as the origin, then the ratio of two such normally distributed variables follows the Cauchy distribution, which has no first moment. As a consequence, models like the CAPM are impossible, since there is no mean or variance for mean-variance finance to operate on. It also takes apart the Fama-French model, because beta does not exist and least squares regression never converges to the population parameter.
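This is easy to see numerically. The following is a minimal sketch of my own, assuming independent unit-variance normals centered on the equilibrium: the ratio matches the standard Cauchy quantiles, and its sample mean never settles down.

```python
# Ratio of two zero-centered normal variables: Cauchy tails, no first moment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(17)
n = 1_000_000

# "Prices" measured as deviations from an assumed equilibrium at the origin.
numerator = rng.normal(0.0, 1.0, size=n)
denominator = rng.normal(0.0, 1.0, size=n)
ratio = numerator / denominator

# The sample quantiles track the standard Cauchy quantiles.
probs = [0.05, 0.25, 0.50, 0.75, 0.95]
print("sample quantiles:", np.round(np.quantile(ratio, probs), 3))
print("Cauchy quantiles:", np.round(stats.cauchy.ppf(probs), 3))

# The running mean wanders instead of converging.
for m in (10_000, 100_000, 1_000_000):
    print(f"mean of first {m:>9,} draws: {ratio[:m].mean():10.3f}")
```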
Because the density is truncated at -100%, the center of location is the mode. That is a very different conceptualization of regression. It also has implications for artificial intelligence: AI models are function approximators, and the danger is that they approximate the least squares solution rather than a Bayesian modal solution. Many standard minimizations in machine learning and AI are guaranteed to miss the Bayesian solution by their construction.
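To illustrate the danger with a toy case of my own, not an AI model but a simple constant fit: on Cauchy draws truncated at -100%, the value that minimizes squared error is the sample mean, which never stabilizes, while an order-based estimate such as the sample median does.

```python
# Least-squares versus an order statistic on heavy-tailed, truncated returns.
import numpy as np

rng = np.random.default_rng(42)

def truncated_cauchy(size, lower=-1.0):
    """Standard Cauchy draws restricted to returns above -100%."""
    out = np.empty(0)
    while out.size < size:
        draws = rng.standard_cauchy(size)
        out = np.concatenate([out, draws[draws > lower]])
    return out[:size]

returns = truncated_cauchy(1_000_000)
for n in (1_000, 10_000, 100_000, 1_000_000):
    sample = returns[:n]
    print(f"n={n:>9,}  least-squares fit (mean): {sample.mean():10.3f}"
          f"   median: {np.median(sample):6.3f}")
```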
That also brings up an information problem. The Cauchy distribution and the truncated Cauchy distribution have no point statistic that is sufficient for either parameter. While the pivotal quantity is sufficient and normally distributed for the Cauchy distribution, it is possibly not for the truncated Cauchy distribution, and while conditioning on an ancillary statistic makes the inference valid, this is not useful for projection.
The proposed calculus solves this in the Bayesian model by noting that the posterior predictive density contains the effect of the entire posterior at each point via marginalization; no information loss is possible using the predictive distribution.
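In standard notation, writing the observed data as \(X\) and a future observation as \(\tilde{x}\), the posterior predictive density is

$$ p(\tilde{x} \mid X) = \int_{\Theta} p(\tilde{x} \mid \theta)\, p(\theta \mid X)\, d\theta, $$

so every point of the predictive density already averages over the entire posterior.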
For the Frequentist calculus, the paper conjectures two possible solutions, and they remain conjectures for two reasons. The first is that they are built on Frequentist predictive intervals, which are in turn built on confidence procedures; as seen above, confidence procedures are not unique, and a change in assumptions would intrinsically change the distribution of predictions even though the data are unchanged. The second is that the initially conjectured method depends upon open intervals, and many standard results collapse without compactness. I left them as conjectures so that measure theorists could tear them apart. The Bayesian method, however, is not a conjecture.
The new calculus differs from the Bayesian decision theory on which it is constructed in that it builds an objective estimator using the indirect utility function. Whereas Bayesian decision theory proper is purely subjective, this provides a way to arrive at an objective solution, subject to the information in the prior density. It can also be gambled upon. I hope to post an options pricing model on this blog soon as well; it is complete, but I am editing it to fit the calculus better.
The full paper can be found at the Social Science Research Network.
All criticism of the paper is welcome. I hope you enjoyed my first ever blog post, if it is possible to enjoy a post on economics.
Bibliography
Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. The Journal of Political Economy, 81(3), 607–636.
Mandelbrot, B. (1963). The variation of certain speculative prices. The Journal of Business, 36(4), 394–419.
Morey, R., Hoekstra, R., Rouder, J., Lee, M., & Wagenmakers, E.-J. (2015). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 1–21.
Welch, B. L. (1939). On confidence limits and sufficiency, with particular reference to parameters of location. The Annals of Mathematical Statistics, 10(1), 58–69.