Erez Katz, CEO and Co-founder of Lucena Research

At Lucena, we always try to understand the root cause of unexpected results and pull actionable insights from the data. It’s easy to blame the machine when it actually did exactly what it was tasked to do and the user failed to communicate the deep learning goals sufficiently. In many cases, a failed research outcome is due to humans not stating the problem properly. In machine learning lingo, the strategy fails because we did not define a representative objective function, or because we did not label the training data in a way that supports our deep learning strategy.

Deep neural networks learn by minimizing error, which is the difference between a desired outcome and the network’s output at a given point in training. An objective function, y = f(x), is a mathematical representation of the input variables, computed to achieve our desired outcome.
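To make "learning by minimizing error" concrete, here is a minimal sketch rather than anything Lucena-specific. It assumes a toy linear model, made-up data where the desired outcome is y = 2x + 1, and mean squared error as the objective; the function names, learning rate and step count are all illustrative choices.

```python
# Hypothetical toy data: the desired outcome is y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def mean_squared_error(w, b):
    """Average squared gap between the model output (w*x + b) and the desired outcome."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(learning_rate=0.05, steps=2000):
    """Plain gradient descent: repeatedly nudge w and b in the direction that lowers the error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit()
print(w, b, mean_squared_error(w, b))
```

The model only ever "knows" what the objective tells it. If the error were defined against the wrong target, the same loop would converge just as confidently to the wrong answer.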

 

In Deep Learning, Extracting The Right Insights Hinges On Several Key Steps

 

To illustrate, here are two examples that were brought to me by Tucker Balch, Ph.D., Lucena’s co-founder and Chief Scientist. Tucker teaches quantitative investment at the Georgia Institute of Technology and online at Coursera and Udacity.

 

Know Your Data’s Source

A student produced a deep-learning-based price forecast backtest of stocks in the S&P with average annual returns greater than 80%. Results this impressive naturally invite speculation about their cause.

One of the most obvious causes of such unrealistic performance would normally be in-sample backtesting. Basically, the model “is cheating” by backtesting using “familiar” data. This normally happens when the backtest is conducted on the same data that was used to train the model.
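One common guard against in-sample backtesting is to split the data chronologically, so the backtest only ever sees dates the model was not trained on. The sketch below is an illustration under assumptions, not the student’s or Lucena’s pipeline: the rows are hypothetical dictionaries with an ISO-formatted "date" field, and the 75/25 split is arbitrary.

```python
def chronological_split(rows, train_fraction=0.75):
    """Split time-ordered rows so every test row is later than every training row."""
    ordered = sorted(rows, key=lambda row: row["date"])  # ISO dates sort correctly as strings
    cutoff = int(len(ordered) * train_fraction)
    return ordered[:cutoff], ordered[cutoff:]


def overlapping_dates(train, test):
    """Dates that appear in both sets; anything returned here means the backtest is in-sample."""
    return sorted({row["date"] for row in train} & {row["date"] for row in test})


# Hypothetical daily observations.
rows = [{"date": f"2020-01-{day:02d}", "ret": 0.001 * day} for day in range(1, 21)]
train, test = chronological_split(rows)
print(len(train), len(test), overlapping_dates(train, test))
```

Training and backtesting on the same rows would make overlapping_dates return every date, which is exactly the "familiar data" situation described above.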

For context, here is a backtest result using Lucena’s Price Forecaster (available on QuantDesk). This backtest simulates selecting the top 20 constituents every week from the S&P. As the backtest shows, significant outperformance can be achieved with annualized returns of 12.45%, which makes an average annual return above 80% for an equivalent backtest quite remarkable.

 

QuantDesk forecast backtest

KNN forecaster backtest selecting the top 20 securities from the S&P with the highest projected one-week return coupled with the highest confidence score. Results account for slippage and transaction costs.
Past performance is not indicative of future returns.

 

Surprisingly, a review of the student’s model didn’t indicate overfitting or forward knowledge. So what caused such an outstanding backtest performance?

The student was using features and price history from the Wharton School of Business (CRSP U.S. Stock Data). This resource is available for educational use, and for that purpose Wharton inserted artificial symbols whose price history follows a certain predictive pattern. The student’s model was able to exploit these patterns by singling out the synthetic symbols, which drove the backtest’s performance.

  • Data validation is crucial in the context of machine learning. Read more about why you should use deep learning to validate alternative data.
  • Given that the synthetic data was blended with real data, the results of the student’s backtest are quite impressive: the deep learner was able to distinguish information from noise and exploit it. This tells us that if there is information in the data (even if it is obscured and thinly represented), the deep learner is capable of extracting it.
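One simple form of data validation is to check every symbol in a dataset against a trusted reference list before training. The following is a minimal sketch under stated assumptions: the tickers are made up, and the reference set stands in for whatever constituent list you trust. It is not a description of how the synthetic symbols in this story were actually identified.

```python
def split_by_reference(price_history, reference_symbols):
    """Separate symbols found in a trusted reference list from those that are not."""
    known = {sym: prices for sym, prices in price_history.items() if sym in reference_symbols}
    unknown = sorted(sym for sym in price_history if sym not in reference_symbols)
    return known, unknown


# Hypothetical data: "SYN1" plays the role of an artificial symbol mixed into real ones.
price_history = {
    "AAA": [10.0, 10.1, 10.0],
    "BBB": [20.0, 19.8, 20.3],
    "SYN1": [5.0, 5.5, 6.05],
}
reference_symbols = {"AAA", "BBB"}

known, unknown = split_by_reference(price_history, reference_symbols)
print("train on:", sorted(known), "review first:", unknown)
```

Anything that lands in the "review first" bucket deserves a look before it is allowed to drive a backtest.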

 

Ensure Your Strategy & Objectives Are Set

In Professor Balch’s class, students are asked to design a Q-Learner (a model-free reinforcement learning algorithm, introduced by Chris Watkins in 1989) that can master the game Lunar Lander. The player, or Q-Learner, attempts to land a spaceship on the moon between two flags on the ground.

The Q-Learning algorithm is well suited to mastering games since it is able to learn the rules of an unfamiliar environment through trial and error, without a human-generated model. Reinforcement learning is appropriate for games because it strives to form policies that maximize rewards. In short, Q-Learning attempts to find the policy that yields the highest reward through trial and error.

In this case, an attempt to further increase the model’s accuracy unexpectedly resulted in a spaceship that never lands. It turns out that increasing the penalty (negative reward) for a failed landing led the Q-Learner to a policy of never even trying to land, since the risk is simply too high. By adjusting the math and disregarding the true business objective, we are destined to get a model that satisfies the science but doesn’t quite achieve a commercially viable solution.
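The effect of the penalty can be reproduced in a much smaller setting than Lunar Lander. The sketch below is a deliberately simplified, hypothetical stand-in and not the class assignment: a one-state "lander" can either hover (assumed reward of -1 for fuel) or attempt to land (assumed +100 with 80% probability, otherwise a crash penalty). It uses the standard tabular Q-learning update with a decaying step size and a random behaviour policy.

```python
import random

ACTIONS = ("hover", "land")


def train_lander(crash_penalty, steps=100_000, gamma=0.9, seed=0):
    """Tabular Q-learning on a one-state toy lander; returns the learned Q-values."""
    rng = random.Random(seed)
    q = {action: 0.0 for action in ACTIONS}
    visits = {action: 0 for action in ACTIONS}
    for _ in range(steps):
        # Q-learning is off-policy, so a uniformly random behaviour policy suffices here.
        action = rng.choice(ACTIONS)
        if action == "hover":
            # Hovering burns fuel and leaves the craft airborne in the same state.
            target = -1.0 + gamma * max(q.values())
        else:
            # Landing ends the episode, so there is no future value to bootstrap from.
            target = 100.0 if rng.random() < 0.8 else -crash_penalty
        visits[action] += 1
        alpha = 1.0 / visits[action]  # decaying step size
        q[action] += alpha * (target - q[action])
    return q


def greedy_action(q):
    """The action the learned policy would pick."""
    return max(q, key=q.get)


mild = train_lander(crash_penalty=100.0)
harsh = train_lander(crash_penalty=1000.0)
print("mild penalty:", greedy_action(mild), "| harsh penalty:", greedy_action(harsh))
```

With these assumed numbers, the expected value of a landing attempt is 0.8 × 100 − 0.2 × 100 = 60 under the mild penalty and 0.8 × 100 − 0.2 × 1000 = −120 under the harsh one. Hovering forever costs far less than −120, so the learner should switch from landing to never landing. Nothing is broken in the algorithm; the reward simply no longer describes what we actually want.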

 

Deep Learning Takeaways:

The above examples show how the addition of ANNs (artificial neural networks) can lead to unexpected results. So when we add them, how do we set ourselves up for success?

  • Data validation is crucial in the context of machine learning. Read more about why you should use deep learning to validate alternative data.
  • It’s imperative to accurately state the objective of your research by defining a representative objective function.
  • Lastly, ensure your validated data is representative of real-world scenarios, and label the data (tell the machine which data yields a positive outcome).

For example, if we wish to train neural networks to aid in profitable investments, training them to identify positive returns with high accuracy may not be sufficient. The model will most likely train for many small future returns rather than fewer but more sizable returns. Small returns are more likely to occur, but in reality, investments based on such forecasts will most likely fall short of overcoming transaction costs and slippage, and thus cannot be used effectively for trading.
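A small arithmetic sketch shows why accuracy alone can mislead. All numbers here are invented for illustration (returns and costs in basis points, with an assumed fixed cost of 15 bps per trade), and labeling a sample as positive only when its return clears that cost is one possible choice, not necessarily the one used in QuantDesk.

```python
def hit_rate(returns_bps):
    """Fraction of trades with a positive gross return (the 'accuracy' view)."""
    return sum(1 for r in returns_bps if r > 0) / len(returns_bps)


def net_return_bps(returns_bps, cost_bps):
    """Total return after subtracting a fixed cost from every trade."""
    return sum(r - cost_bps for r in returns_bps)


def label_trades(forward_returns_bps, cost_bps):
    """Label a sample positive only if its forward return clears trading costs."""
    return [1 if r > cost_bps else 0 for r in forward_returns_bps]


COST_BPS = 15  # assumed transaction cost plus slippage per trade

many_small_wins = [10] * 9 + [-20]      # 90% accurate, tiny gains
few_large_wins = [100] * 4 + [-30] * 6  # 40% accurate, sizable gains

for name, trades in (("many small", many_small_wins), ("few large", few_large_wins)):
    print(name, hit_rate(trades), net_return_bps(trades, COST_BPS))

print(label_trades(many_small_wins, COST_BPS))
```

In this made-up example, the 90%-accurate strategy loses 80 bps after costs while the 40%-accurate one gains 70 bps. Under the cost-aware labels, none of the small wins count as a positive outcome, which is the signal we would want the model to train on.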


 
