"So, if you just are a frequentist probablist..."

I heard this quote one too many times, and it made me want to check (and it’s your fault, @Compounding :stuck_out_tongue_winking_eye:). Could the equity premium really be so poorly pinned down that statistics can only tell you it lies somewhere between 1.5% and 10.5% (with 95% confidence)?

When I talk to investment advisors, I love to ask them, “What is the equity premium?” I get a range of answers, but almost everyone thinks it’s somewhere between 4 and 7%, and 6% is what I hear most. Then when you ask people how they come to that number, they say, “Well, you look at the last 100 years and it’s been 6%.”

That’s true. If you look at 100 years of data, you see a 6% equity premium, but if you get that mean equity premium by running the regression, it also gives you a confidence bound on your estimate. It turns out that the estimated equity premium you get from the past data is 6% plus or minus a standard deviation of, like, two and a quarter percent.

So, if you just are a frequentist probabilist, you do the frequentist statistics, you’re basically saying you’re 95% sure that the true equity premium is between one and a half percent and 10 and a half percent. You just have no idea. With 100 years of data, we can’t come close to agreeing on what the equity premium is.

Professor Robert Novy-Marx, on the Rational Reminder Podcast
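
The arithmetic behind the quoted range, assuming the 2.25% is one standard error of the estimated mean and the 95% interval is taken as roughly two standard errors on each side (1.96 to be exact):

$$
6\% \pm 2 \times 2.25\% = [1.5\%,\ 10.5\%]
$$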

Now, I don’t claim to be cooler than a bona fide professor, but with more than a century of nearly daily closing data, this has got to be more precise.

So, I ran my own regression in LibreOffice Calc (the open-source Excel alternative), set up according to this tutorial. The only change I made was to compute the t critical value from the actual degrees of freedom (df) instead of a hard-coded 13. I verified that I got the same numbers as the tutorial’s screenshot. The tutorial also uses 95% confidence, so the results should be directly comparable to the quote.

Then I took the monthly real total return series for the US stock market from https://shillerdata.com/. The dataset spans 1871-01 to 2024-03, i.e. 1839 months. Assuming exponential growth, I applied the logarithm to all prices, which turns the growth curve into a straight line. On that I could then run the simple linear regression from the tutorial.
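
For anyone who prefers code to a spreadsheet, here is a minimal stdlib-only Python sketch of the regression step. It assumes evenly spaced monthly prices and uses the textbook OLS confidence interval for the mean prediction; I can’t confirm this is exactly what the tutorial’s sheet computes. The helper names `fit_log_trend` and `mean_prediction_band` are hypothetical. To stay in the standard library it uses the normal quantile instead of the t quantile; with df = 1837 the two agree to about two decimals (1.960 vs 1.961), and `scipy.stats.t.ppf(0.975, df)` would give the exact t value.

```python
import math
from statistics import NormalDist


def fit_log_trend(prices):
    """OLS fit of log(price) against the time index t = 0, 1, ..., n-1.

    Returns the intercept, slope (log growth per period), residual
    standard error s, n, the mean of t and Sxx."""
    n = len(prices)
    if n < 3:
        raise ValueError("need at least 3 prices")
    ys = [math.log(p) for p in prices]  # exponential growth -> straight line
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error, df = n - 2
    return {"intercept": intercept, "slope": slope, "s": s,
            "n": n, "t_mean": t_mean, "sxx": sxx}


def mean_prediction_band(fit, t, confidence=0.95):
    """(low, mean, high) confidence interval for the fitted log price at index t."""
    # Normal quantile as a stand-in for the t quantile with df = n - 2.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    y_hat = fit["intercept"] + fit["slope"] * t
    half = z * fit["s"] * math.sqrt(1 / fit["n"] + (t - fit["t_mean"]) ** 2 / fit["sxx"])
    return y_hat - half, y_hat, y_hat + half
```

Usage would be `fit = fit_log_trend(prices)`, then `mean_prediction_band(fit, 0)` for the first month and `mean_prediction_band(fit, fit["n"] - 1)` for the last one.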

I then exponentiated the mean, high and low predictions to undo the logarithm, and calculated the total growth between the first and last month:

  • $\text{mean} = \text{last\_mean} / \text{first\_mean}$
  • $\text{max} = \text{last\_high} / \text{first\_low}$
  • $\text{min} = \text{last\_low} / \text{first\_high}$
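
The three ratios in the list above, as a small Python sketch. It assumes the bands are (low, mean, high) predictions in natural-log space; if base-10 `LOG` was used in Calc, `10 ** v` would replace `math.exp(v)`. `growth_ratios` is a hypothetical helper name.

```python
import math


def growth_ratios(first_band, last_band):
    """Total growth multiples between the first and last month.

    first_band, last_band: (low, mean, high) predictions of log(price).
    Returns (min, mean, max) as defined in the bullet list above."""
    first_low, first_mean, first_high = (math.exp(v) for v in first_band)
    last_low, last_mean, last_high = (math.exp(v) for v in last_band)
    return (
        last_low / first_high,   # min: lowest end over highest start
        last_mean / first_mean,  # mean
        last_high / first_low,   # max: highest end over lowest start
    )
```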

Finally, I annualized them, where 55942 is the number of days between the first and last data point (1871-01 to 2024-03):

  • $f(r) = r^{\frac{365}{55942}} - 1$
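
The annualization step as a Python sketch: raise the total multiple to 365/days and subtract 1 to get a rate. Dating each monthly observation on the 1st of its month is an assumption, and `annualize` is a hypothetical helper name.

```python
from datetime import date


def annualize(ratio, days, days_per_year=365):
    """Annual growth rate from a total growth multiple over `days` days."""
    return ratio ** (days_per_year / days) - 1


# Days between the first and last monthly observation (1871-01 to 2024-03).
SPAN_DAYS = (date(2024, 3, 1) - date(1871, 1, 1)).days  # 55942
```

Passing `days_per_year=365.25` would account for leap years; over this span the difference is small.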

I get:

  • mean: 6.62%
  • max: 7.51%
  • min: 5.74%

And even these bounds are slightly too far apart, since I should have used lines tangent to the confidence band instead of combining its endpoints.

Here is a log plot. Even outside the data range, the high and low predictions don’t visibly diverge from the mean, because there are so many data points.

I also tried it with yearly data only, but the picture doesn’t change much.

I can’t reproduce the 2.25% standard deviation. So what did he mean here?


I am pretty sure that they are talking about the results of the following procedure:

  • take yearly returns
  • find the mean value and the standard deviation :joy:.
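
A sketch of that procedure in Python, treating the yearly returns as independent draws (an assumption); `premium_summary` is a hypothetical helper. It also computes the standard error of the mean, sd/√n, which is the quantity that shrinks as more years are added. A 2.25% standard error over 100 years would fit a yearly standard deviation of about 22.5%, a number back-solved from the quote rather than measured.

```python
import math
from statistics import mean, stdev


def premium_summary(yearly_returns, z=1.96):
    """Mean, sample standard deviation, standard error of the mean and an
    approximate 95% confidence interval for the mean.

    Assumes the yearly returns are independent draws."""
    n = len(yearly_returns)
    m = mean(yearly_returns)
    sd = stdev(yearly_returns)  # spread of individual years
    se = sd / math.sqrt(n)      # uncertainty of the mean itself
    return m, sd, se, (m - z * se, m + z * se)


# Back-solving the quote: a 2.25% standard error over 100 years
# corresponds to a yearly standard deviation of 2.25% * sqrt(100).
implied_sd = 0.0225 * math.sqrt(100)
```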

Perhaps time horizons?
Looking at “yearly over 1 year” is probably quite different from looking at “yearly over 10/20/30/50 years”.
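
One possible reading, sketched in Python: compute the annualized return over every window of N consecutive years and compare how spread out the 1-year values are against, say, the 30-year values. `rolling_annualized` is a hypothetical helper and assumes a list of yearly simple returns.

```python
def rolling_annualized(yearly_returns, horizon):
    """Geometric annualized return for every run of `horizon` consecutive years.

    yearly_returns: simple returns as decimals, e.g. 0.07 for 7%."""
    if not 1 <= horizon <= len(yearly_returns):
        raise ValueError("horizon must be between 1 and the number of years")
    results = []
    for start in range(len(yearly_returns) - horizon + 1):
        growth = 1.0
        for r in yearly_returns[start:start + horizon]:
            growth *= 1 + r  # compound each year's return
        results.append(growth ** (1 / horizon) - 1)
    return results
```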

I’m not sure I understand everything you did, but:

If you want to calculate the equity premium, you should take nominal returns data and subtract the risk-free rate.
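
A minimal Python sketch of that adjustment, assuming yearly nominal stock returns and a matching risk-free series (for example short-term Treasury bill yields; which series to use is an assumption, as is the helper name `equity_premium`). It uses the simple difference; (1 + r) / (1 + rf) - 1 is a common multiplicative alternative.

```python
def equity_premium(nominal_returns, risk_free_rates):
    """Yearly excess returns: nominal stock return minus the risk-free rate
    for the same year (simple arithmetic difference)."""
    if len(nominal_returns) != len(risk_free_rates):
        raise ValueError("series must cover the same years")
    return [r - rf for r, rf in zip(nominal_returns, risk_free_rates)]
```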


Good catch, brain fart on my side. My guess is that it won’t widen the spread much, though it would probably change the mean.

I thought about this, too, but that is even more nonsensical. No amount of data will narrow that standard deviation. A 95% confidence interval needs to include 95% of all values. If yearly equity returns are widely dispersed, the standard deviation must be large as well.