Unless you’ve been living on another planet, you will have come across experts musing about the potential of technology to revolutionize finance. To be sure, some of the new technologies are indeed game changers – things like e-payments, blockchain and cybersecurity. These advances are important, but they are also somewhat mundane. Investors are more excited about the prospect of software so powerful that it can comb through mountains of financial data, pick up interesting patterns and make accurate **predictions** about future asset returns. If and when that happens, we will surely have arrived at the gilded age of robo-assisted predictions.

But this is a big “if”, and this blog explains why. To do that, we’ll go on a “fishing trip”, where the “fish” we’re after are a handful of influential **predictors** of stock returns, out of a sea of possible predictors.

Suppose you want to predict tomorrow’s or next month’s stock price using a bunch of predictors. There is a multitude of possible predictors for the stock market, including interest rates, bond yield spreads, inflation expectations, industrial output, GDP numbers and so forth.

So you can see that the first challenge of making good predictions is how to separate good predictors from bad ones. This turns out to be a daunting challenge even with modern technology.

Let’s say you want to fish out 13 of the best predictors. By “best”, I mean having the best linear fit (something like Y = aX1 + bX2 + cX3 + …), which is called a **linear regression model**. Here the X’s are the set of predictors, and the “a”, “b” and “c” are sensitivity coefficients that measure how a change in a particular X changes Y. For stock market predictions, Y is naturally the rate of return on a stock or a stock index.
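For the curious, here’s a minimal sketch of what fitting such a model looks like in code. The three predictors and their coefficients below are invented purely for illustration – real predictors would be market data like yield spreads or inflation expectations:

```python
import numpy as np

# Synthetic illustration of fitting Y = a*X1 + b*X2 + c*X3.
# All numbers here are made up; this is not a real return model.
rng = np.random.default_rng(0)
n = 120  # e.g. 120 months of observations

X = rng.normal(size=(n, 3))             # three hypothetical predictors
true_coefs = np.array([0.5, -0.3, 0.2])  # the "a", "b", "c" we hope to recover
y = X @ true_coefs + rng.normal(scale=0.5, size=n)  # noisy "returns"

# Add an intercept column and solve by ordinary least squares
X1 = np.column_stack([np.ones(n), X])
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(coefs)  # intercept, then estimates of a, b, c
```

With enough observations relative to the noise, the estimated coefficients land close to the true ones – which is exactly what a good in-sample fit means.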

I need to emphasize that your model must show a good **in-sample fit** (i.e., based on *past data*) before we can even talk about prediction. Statisticians measure a model’s goodness-of-fit using a number called **adjusted R-square**. Never mind how adjusted R-square is computed (Excel can easily do it). All you need to know is that the adjusted R-square ranges from zero (the model doesn’t explain stock returns at all) to one (a perfect fit). From my research experience with annual stock returns, models with adjusted R-squares of more than 0.3 are rare, which is a reminder that stock returns are pretty random, i.e., not systematically explained by observed fundamentals.
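If you’d rather not lean on Excel, adjusted R-square is only a few lines of code. The toy series below is invented just to show the mechanics:

```python
import numpy as np

def adjusted_r_square(y, y_hat, k):
    """Adjusted R-square for a regression with k predictors (plus intercept).

    It penalizes plain R-square for the number of predictors, so that
    adding a useless predictor does not spuriously raise the score.
    """
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# A model that just predicts the mean has R-square of zero, and its
# adjusted R-square dips slightly below zero once k > 0.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(adjusted_r_square(y, np.full(5, y.mean()), k=1))  # about -0.33
print(adjusted_r_square(y, y, k=1))                     # 1.0, a perfect fit
```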

Going back to the fitting problem, here’s the key question: how many regression models must you run to find the best 13 predictors out of 100?

Hold your breath.

Answer: 7,110,542,499,799,200 – the number of ways to choose 13 predictors out of 100.

A one followed by 15 zeroes is a quadrillion, so this number is about 7 quadrillion. How big is that? For comparison, there are about 100 billion stars in our galaxy. Hence, if each model were a star in another galaxy, that galaxy would have 71,105 times more stars than ours. Even a computer that can fit 10 million models a second would need more than 22 years to finish the task – and the number gets far larger still if we start out with 1,000 potential predictors instead of just 100. For all practical purposes, this type of search problem is intractable. Computer scientists call such problems **NP-complete** (where NP stands for non-deterministic polynomial time): problems that are too hard for today’s computers and possibly for those of the foreseeable future.
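You can check the arithmetic yourself with Python’s standard library:

```python
import math

# Number of distinct 13-predictor subsets of 100 candidates
models = math.comb(100, 13)
print(models)  # 7110542499799200, about 7 quadrillion

# At 10 million model fits per second, an exhaustive search takes:
seconds = models / 10_000_000
years = seconds / (365 * 24 * 3600)
print(round(years, 1))  # roughly 22.5 years
```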

It gets worse. So far, we’ve only been concerned with finding the best *k* out of *m* predictors using linear regression based on *past data*. Even if you somehow manage to find the best *k* predictors, it doesn’t mean that you’ve found a perfect crystal ball for prediction. Why? Because the world keeps changing. As they say, “past results do not guarantee future performance”. Hence, your best model fitted on past data may churn out lousy predictions for next month’s stock returns. You need to keep learning and updating your model to cope with new market conditions. That is a tough call. It is hard enough to find the best-fit model using past data; to have the best *forward-looking* model at all times is pure fantasy even for the most sophisticated machine learning software available today. The best we can do with problems this large is to use **heuristics** or shortcuts, but the problem with shortcuts is that you can never be sure whether the resulting ad-hoc model is robust and accurate.
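To make the idea of a heuristic concrete, here is a sketch of one common shortcut, greedy forward selection: instead of testing every subset, it adds one predictor at a time, keeping whichever most improves the fit. The data below is synthetic and invented for illustration, and note that nothing guarantees this shortcut finds the truly best subset:

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy forward selection: a heuristic for best-subset search.

    Instead of trying all C(m, k) subsets, add one predictor at a time,
    each time picking the one that most reduces the residual sum of
    squares. This costs only about m*k regressions instead of
    quadrillions, but it can miss the true best subset.
    """
    n, m = X.shape
    chosen = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(m):
            if j in chosen:
                continue
            cols = np.column_stack([np.ones(n)] + [X[:, c] for c in chosen + [j]])
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            rss = np.sum((y - cols @ coef) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
    return chosen

# Toy example: y really depends on columns 2 and 5 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 1.0 * X[:, 2] - 0.8 * X[:, 5] + rng.normal(scale=0.3, size=200)
print(sorted(forward_select(X, y, 2)))  # expected to recover columns 2 and 5
```

On an easy problem like this one the shortcut works, but with correlated, noisy predictors – the norm in financial data – it can latch onto the wrong variables, which is precisely the robustness worry above.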

So, the next time you hear someone claiming to have a state-of-the-art robo-advisor that can accurately predict stock returns, you may want to ask if he knows what a quadrillion is 🙂