QuantDesk® Machine Learning Forecast
for the Week of April 30th, 2018
Using Timeseries Data To Forecast Stock Prices Using Deep Neural Networks
by Erez Katz, CEO and Co-founder of Lucena Research.
In the past decade, the most successful implementations in deep neural networks (deep learning) centered around image processing. Handwriting recognition, natural language processing, speech recognition, and computer vision research are all predicated on some visual representation of the training data on which the network attempts to categorize and generalize a function -- f(x) = y. In other words, a shape (x) and its corresponding outcome (y). By identifying the unique properties of a visual representation, the network is able to classify within a certain proximity close matches in new, unseen data with the expectation that the outcome from such data will be similar to what the network was trained on.
Take, for example, a handwritten image of the number eight (8). The problem of recognizing 8 is not as obvious as it seems, since our brains are able to recognize 8 with ease. However, in order for the machine to learn how to recognize images and ultimately exceed the accuracy and speed of humans, it needs to deconstruct the image to its basic form. Below is an example of a 28*28 pixel image of an 8. On the right side of the image you can see the input layer of a neural network that assigns a neuron for every pixel’s value, 784 input neurons in total (28*28).
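The pixel-to-neuron mapping described above can be sketched in a few lines. This is only an illustration: the image below is filled with random values standing in for a real handwritten digit, and scaling intensities to [0, 1] is a common convention that we assume here, not something the network strictly requires.

```python
import numpy as np

# Hypothetical 28x28 grayscale image; random values stand in for real pixel data.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))

# Scale intensities to [0, 1] and flatten row by row:
# each pixel becomes the activation of exactly one input neuron.
input_layer = (image / 255.0).reshape(-1)

print(input_layer.shape)  # (784,)
```

Note that flattening discards the notion of which pixels are neighbors, which is part of the motivation for the architecture discussed in the next section.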
If every 8 were written exactly as depicted, we wouldn’t need a sophisticated network with which to interpret it. But the problem of recognizing images gets exponentially more complex since there are many ways to write an 8, not to mention that the image can appear in various regions of the 28*28 pixel space and is not always perfectly centered, as depicted here.
Convolutional Neural Networks
One successful deep learning architecture used in the context of image recognition, or more specifically computer vision, is the convolutional neural network (CNN). CNNs exploit translational invariance by extracting features as regions in an image, also called receptive fields. More specifically, CNNs view images in the form of spatial representations. To put this in the context of our figure 8 example, rather than looking at the image through a pixel-by-pixel representation, CNNs allow the measurement of groups of pixels together, as regions, no matter where they appear in the image space.
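To make the idea of regions concrete, here is a minimal sketch of a single filter sliding over an image. It is plain NumPy rather than a real CNN layer, and the 3x3 "plus" pattern, the 10x10 canvas, and the helper names are all assumptions chosen for illustration. The filter's response map moves with the pattern, and taking the maximum over the map gives the same value wherever the pattern sits.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output cell looks only at one small region (receptive field).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 "plus" shape serves as both the pattern to find and the filter.
pattern = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]], dtype=float)

def place(top, left, size=10):
    """Draw the pattern on an empty canvas with its corner at (top, left)."""
    canvas = np.zeros((size, size))
    canvas[top:top + 3, left:left + 3] = pattern
    return canvas

resp_a = correlate2d_valid(place(1, 1), pattern)
resp_b = correlate2d_valid(place(5, 6), pattern)

# The strongest response follows the pattern's location...
peak_a = tuple(int(v) for v in np.unravel_index(resp_a.argmax(), resp_a.shape))
peak_b = tuple(int(v) for v in np.unravel_index(resp_b.argmax(), resp_b.shape))
print(peak_a, peak_b)

# ...while the strongest response value is the same in both images.
print(resp_a.max(), resp_b.max())
```

In a trained CNN the filter weights are learned rather than hand-picked, and many filters are stacked in layers, but the sliding-window mechanics are the same.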
An elegant presentation of CNNs can be found here.
Come to think of it, the notion of spatial reference actually bodes well for the timeseries data we collect at Lucena. We hold approximately 850 timeseries features that represent daily states of various securities over time (equities, FRX, futures, cryptocurrencies, etc.). For example, we collect the social media sentiment score for each stock in the Russell 1K, daily. In practice, the social media sentiment score for a given stock is nothing more than a timeseries representation that can be graphed over time, just like a price of a stock. With such a graphical representation we can provide a richer spatial training reference to a convolutional neural network learner, compared to providing merely a single point-in-time value. Intuitively you must agree that the formation of a trend over time has more information compared to just one value as a point in time.
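The difference between a point-in-time value and a trend over time can be sketched as follows. The sentiment numbers and the window length of three days are made up for illustration, and `rolling_windows` is a hypothetical helper, not part of our platform.

```python
import numpy as np

def rolling_windows(series, window):
    """Return a 2-D array whose rows are consecutive overlapping windows."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window + 1
    if n <= 0:
        raise ValueError("series is shorter than the window")
    return np.stack([series[i:i + window] for i in range(n)])

# Hypothetical daily sentiment scores for one stock.
sentiment = np.array([0.10, 0.15, 0.05, 0.20, 0.30, 0.25])

# Point-in-time input: one number per day.
point_in_time = sentiment[2:]

# Windowed input: each row holds the recent history ending on that same day.
windows = rolling_windows(sentiment, window=3)

print(windows.shape)  # (4, 3)
```

Each row of `windows` ends with the same value the point-in-time input would have supplied, but also carries the path that led there.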
Applying The Image Recognition Concept To Equity Price Forecasting
Let me start by making a bold statement: Neural networks can approximate any continuous function! No matter how complicated and wiggly its graphical representation is, some sufficiently large neural network can come as close to it as we like. If you want visual proof, look here. Now, imagine you have a stock price timeseries that is a representation of a sine wave (let’s assume its symbol is XYZ, for reference).
We want to test and see if we can derive a compelling forecasting model using neural nets that can accurately predict XYZ’s 21-day price return. Chad Landis, one of our rising quant stars, constructed a simple neural net model with no hidden layers and trained it with 5,000 epochs (an epoch is a single set of data served as an input to the neural network so that it can repeatedly train its weights and biases in order to get as close as possible to the desired label output).
Chad used simple logistic regression (similar to a neural network with no hidden layers) with a rolling 21-day mean value. After the training dataset of the first 5,000 epochs, Chad ran the model against the validation period of the subsequent 5,000 epochs and was able to easily achieve validation accuracy of 99.86%, with precision in excess of 99.96%. To put this into context, in theory, if we were to trade XYZ by inputting into the model its rolling 21-day price mean, XYZ’s performance could have looked like the chart below:
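For readers who want to experiment, below is a rough re-creation of this kind of test, not Chad's actual code. Several things are assumptions: we read the training and validation periods as 5,000 daily samples each, we pick a sine period of 84 days (the article does not give one), and we fit the logistic regression with plain gradient descent in NumPy. The period matters: at 84 days the rolling 21-day mean is almost exactly in anti-phase with the sign of the 21-day forward return, which is what makes a single feature so effective. Other periods would give weaker results.

```python
import numpy as np

HORIZON = 21                        # rolling-mean window and forecast horizon, in days
PERIOD = 84                         # assumed sine period in days (not from the article)
N_DAYS = 10_000 + 2 * HORIZON - 1   # yields exactly 10,000 labeled samples

t = np.arange(N_DAYS)
price = 100.0 + 10.0 * np.sin(2.0 * np.pi * t / PERIOD)

# Feature: trailing 21-day mean of price. Entry k averages days k .. k+20.
rolling_mean = np.convolve(price, np.ones(HORIZON) / HORIZON, mode="valid")

# Keep only days that have both a full trailing window and a 21-day future.
days = np.arange(HORIZON - 1, N_DAYS - HORIZON)
x = rolling_mean[days - (HORIZON - 1)]
y = (price[days + HORIZON] > price[days]).astype(float)  # 1 if forward return > 0

x_train, y_train = x[:5000], y[:5000]
x_val, y_val = x[5000:], y[5000:]

# Standardize with training statistics only, to avoid peeking at validation data.
mu, sigma = x_train.mean(), x_train.std()
z_train = (x_train - mu) / sigma
z_val = (x_val - mu) / sigma

# Logistic regression (one weight, one bias) fit by gradient descent.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * z_train + b)))
    w -= 0.5 * np.mean((p - y_train) * z_train)
    b -= 0.5 * np.mean(p - y_train)

pred = (w * z_val + b) > 0
accuracy = np.mean(pred == (y_val == 1))
true_pos = np.sum(pred & (y_val == 1))
false_pos = np.sum(pred & (y_val == 0))
precision = true_pos / (true_pos + false_pos)

print(f"validation accuracy: {accuracy:.4f}, precision: {precision:.4f}")
```

The learned weight comes out negative: when the rolling mean is high, the sine wave is near its peak and the next 21 days are more likely to be down.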
Now, before you go and put your down payment on a small island in Hawaii, let’s understand why the results above are unachievable. For one, I don’t believe any stock in the universe moves in a perfect sine wave formation. Secondly, if solving a price forecaster problem was that easy, it would have already been solved and exploited by the masses. Unfortunately, there’s nothing novel here.
However, there are a few important takeaways:
- Timeseries data can be used effectively by convolutional neural networks when transformed into visual representations.
- If there is information in timeseries data, it can be successfully exploited by representing it as a normalized graphical representation. Furthermore, with a sound CNN model, such data can ultimately provide sufficient advantage for profit.
- Since neural nets can approximate any function, if there is information in the data that can be represented by f(x) = y, no matter how complex it is, it could ultimately be exploited and used for effective forecasting by the proper neural network model.
In preliminary tests of passing timeseries point-in-time data to deep neural networks, we were able to achieve validation accuracy of approximately 52%. However, by applying a graphical transformation representation of the features’ timeseries data, we are now experiencing more than 60% validation (out of sample) accuracy, which is very exciting for us but still requires a significant amount of additional research and validation.
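One simple way to turn a window of timeseries values into a normalized picture is sketched below. This is an illustrative guess at the general idea, not the transformation we use in our research: the function name, the 16-row image height, and the choice of min-max scaling within each window are all assumptions.

```python
import numpy as np

def window_to_image(window, height=16):
    """Rasterize a 1-D window into a (height, len(window)) binary image.

    Values are min-max normalized within the window, so the picture
    captures the shape of the move rather than its absolute level.
    """
    window = np.asarray(window, dtype=float)
    lo, hi = window.min(), window.max()
    span = hi - lo
    # A flat window has no shape to scale; draw it as a mid-height line.
    scaled = np.full(window.shape, 0.5) if span == 0 else (window - lo) / span
    rows = np.round((1.0 - scaled) * (height - 1)).astype(int)  # row 0 is the top
    image = np.zeros((height, window.size), dtype=np.uint8)
    image[rows, np.arange(window.size)] = 1  # one lit pixel per time step
    return image

# Two hypothetical windows with the same shape at different price levels.
img_low = window_to_image([1, 2, 3, 4], height=4)
img_high = window_to_image([10, 20, 30, 40], height=4)

print(img_low)
print(np.array_equal(img_low, img_high))
```

Because of the normalization, a steady climb looks identical whether it happens at $1 or $10, which is the kind of generalization a CNN can then pick up on.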
There is more to come as we advance our research further.
As in the past, we will provide weekly updates on how the model portfolios and the theme-based strategies we cover in this newsletter are performing.
Tiebreaker has been forward traded since 2014, and to date it boasts an impressive return of 50.16%, remarkably low volatility as expressed by its max-drawdown of only 6.16%, and a Sharpe of 1.66! (You can see a more detailed view of Tiebreaker’s performance below in this newsletter.)
BlackDog – Lucena’s Risk Parity - Since Inception 42.27% vs. benchmark of 20.69%
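For readers less familiar with these statistics, here is a minimal sketch of how max-drawdown and a Sharpe ratio are commonly computed. The conventions are assumptions on our part for this example: a zero risk-free rate, 252 trading days per year, and the sample standard deviation. The equity curve and returns are made-up numbers, not Tiebreaker's.

```python
import numpy as np

def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return ((running_peak - equity) / running_peak).max()

def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Mean excess return over its standard deviation, scaled to a year."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_daily
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Hypothetical equity curve: peaks at 120, falls to 90, then recovers.
equity = [100, 120, 90, 110, 130]
print(max_drawdown(equity))  # 0.25, i.e. the drop from 120 to 90

# Hypothetical daily returns.
print(annualized_sharpe([0.01, -0.01, 0.02, 0.0]))
```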
We have recently developed a sophisticated multi-sleeve optimization engine set to provide the most suitable asset allocation for a given risk profile, while respecting multi-level allocation restriction rules.
Essentially, we strive to obtain an optimal decision while taking into consideration the trade-offs between two or more conflicting objectives. For example, if you consider a wide universe of constituents, we can find a subset selection and their respective allocations to satisfy the following:
- Maximizing Sharpe
- Widely diversified portfolio with certain allocation restrictions across certain asset classes, market sectors and growth/value classifications
- Restricting volatility
- Minimizing turnover
We can also determine the proper rebalance frequency and validate the recommended methodology with a comprehensive backtest.
Forecasting the Top 10 Positions in the S&P
Lucena’s Forecaster uses a predetermined set of 10 factors that are selected from a large set of over 500. Self-adjusting to the most recent data, we apply a genetic algorithm (GA) process that runs over the weekend to identify the most predictive set of factors based on which our price forecasts are assessed. These factors (together called a “model”) are used to forecast the price and its corresponding confidence score of every stock in the S&P. Our machine-learning algorithm travels back in time over a look-back period (or a training period) and searches for historical states in which the underlying equities were similar to their current state. By assessing how prices moved forward in the past, we anticipate their projected price change and forecast their volatility.
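The idea of searching for similar historical states can be illustrated with a nearest-neighbor sketch. To be clear, this is not the Forecaster's implementation: the use of k-nearest neighbors, Euclidean distance on z-scored factors, and the spread of neighbor outcomes as a rough stand-in for confidence are all assumptions made for the example, and the data is invented.

```python
import numpy as np

def knn_forecast(history_factors, history_forward_returns, current_factors, k=5):
    """Forecast a return from the k most similar historical states.

    Returns (mean forward return of the neighbors, their standard deviation).
    A tighter spread among neighbors suggests a more consistent history.
    """
    X = np.asarray(history_factors, dtype=float)
    y = np.asarray(history_forward_returns, dtype=float)
    q = np.asarray(current_factors, dtype=float)

    # Put every factor on the same scale so no single factor dominates distance.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0
    distances = np.linalg.norm((X - mu) / sigma - (q - mu) / sigma, axis=1)

    nearest = np.argsort(distances)[:k]
    return y[nearest].mean(), y[nearest].std()

# Hypothetical history: five past states described by two factors each,
# and the return that followed each state.
factors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 10.0]]
forward_returns = [0.01, 0.03, -0.02, -0.04, 0.50]

forecast, spread = knn_forecast(factors, forward_returns, [0.0, 0.05], k=2)
print(forecast, spread)
```

Here the current state resembles the first two historical states, so the forecast is the average of what followed them.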
The charts below represent the new model and the top 10 positions assessed by Lucena’s Price Forecaster.
The top 10 forecast chart below delineates the ten positions in the S&P with the highest projected market-relative return combined with their highest confidence score.
To view a brief video of all the major functions of QuantDesk, please click on the following link:
The table below presents the trailing 12-month performance and a YTD comparison between the two model strategies we cover in this newsletter (BlackDog and Tiebreaker), as well as the two ETFs representing the major US indexes (the DOW and the S&P).
For those of you unfamiliar with BlackDog and Tiebreaker, here is a brief overview: BlackDog and Tiebreaker are two out of an assortment of model strategies that we offer our clients. Our team of quants is constantly on the hunt for innovative investment ideas. Lucena’s model portfolios are a byproduct of some of our best research, packaged into consumable model-portfolios. The performance stats and charts presented here are a reflection of paper traded portfolios on our platform, QuantDesk®. Actual performance of our clients’ portfolios may vary as it is subject to slippage and the manager’s discretionary implementation. We will be happy to facilitate an introduction with one of our clients for those of you interested in reviewing live brokerage accounts that track our model portfolios.
Tiebreaker: Tiebreaker is an actively managed long/short equity strategy. It invests in equities from the S&P 500 and Russell 1000 and is rebalanced bi-weekly using Lucena’s Forecaster, Optimizer and Hedger. Tiebreaker splits its cash evenly between its core and hedge holdings, and its hedge positions consist of long and short equities. Tiebreaker has been able to avoid major market drawdowns while still taking full advantage of subsequent run-ups. Tiebreaker is able to adjust its long/short exposure based on idiosyncratic volatility and risk. Lucena’s Hedge Finder is primarily responsible for driving this long/short exposure tilt.
Tiebreaker Model Portfolio Performance Calculation Methodology
Tiebreaker’s model portfolio performance is a paper trading simulation that assumes an opening account balance of $1,000,000 in cash. Tiebreaker started to paper trade on April 28, 2014 as a cash-neutral and beta-neutral strategy. However, it was substantially modified to its current dynamic mode on 9/1/2014. Trade execution and return figures assume positions are opened at the 11:00AM EST price quoted by the primary exchange on which the security is traded and, unless a stop is triggered, closed at the 4:00PM EST price quoted by the same exchange. In the case of a stop loss, a trailing 5% stop loss is imposed and is measured from the intra-week high (in the case of longs) or low (in the case of shorts). If the stop loss is triggered, an exit from the position is assumed at 5% below that high, in the case of longs, or 5% above that low, in the case of shorts. Tiebreaker assesses the price at which the position is exited with the following modification: prior to March 1st, 2016, at times but not at all times, if, in consultation with a client executing the strategy, it is found that the client received a less favorable price in closing out a position when a stop loss is triggered, the less favorable price is used in determining the exit price. On September 28, 2016 we applied new allocation algorithms to Tiebreaker and modified its rebalancing sequence to be every two weeks (10 trading days). Since March 1st, 2016, all trades are conducted automatically with no modifications based on the guidelines outlined herein. No manual modifications have been made to the gain stop prices. In instances where a position gaps through the trigger price, the initial open gapped trading price is utilized. Transaction costs are calculated as the larger of $6.95 per trade or $0.0035 * number of shares traded.
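Two of the rules above, the transaction cost and the trailing 5% stop, can be expressed in a short sketch. The function names are hypothetical, and the stop logic is simplified: it assumes a fill exactly at the trigger price and leaves out the gap-through and pre-March 2016 adjustments described in the methodology.

```python
def transaction_cost(shares, min_fee=6.95, per_share=0.0035):
    """Larger of the flat per-trade fee and the per-share charge."""
    return max(min_fee, per_share * abs(shares))

def trailing_stop_exit(prices, is_long=True, stop_pct=0.05):
    """Return (index, exit_price) at the first stop breach, or None.

    The trigger trails the highest price seen so far for longs
    (lowest for shorts). Simplification: the exit is assumed to
    fill exactly at the trigger price.
    """
    extreme = prices[0]
    for i, price in enumerate(prices):
        if is_long:
            extreme = max(extreme, price)
            trigger = extreme * (1 - stop_pct)
            breached = price <= trigger
        else:
            extreme = min(extreme, price)
            trigger = extreme * (1 + stop_pct)
            breached = price >= trigger
        if breached:
            return i, trigger
    return None

# Small trades pay the flat fee; large trades pay per share.
print(transaction_cost(100))     # 6.95
print(transaction_cost(10_000))  # about 35.0

# Hypothetical long position: high of 104 puts the trigger near 98.8.
print(trailing_stop_exit([100, 104, 102, 98]))
```

With these parameters the flat fee applies to any trade below roughly 1,986 shares, since 6.95 / 0.0035 is about 1,985.7.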
BlackDog: BlackDog is a paper trading simulation of a tactical asset allocation strategy that utilizes highly liquid ETFs of large cap and fixed income instruments. The portfolio is adjusted approximately once per month based on Lucena’s Optimizer in conjunction with Lucena’s macroeconomic ensemble voting model. Due to BlackDog’s low volatility (half the market in backtesting) we leveraged it 2X. By exposing twice its original cash assets, we take full advantage of its potential returns while maintaining market-relative low volatility and risk. As evidenced by the chart below, BlackDog 2X is substantially ahead of its benchmark (S&P 500).
In the past year, we covered QuantDesk's Forecaster, Back-tester, Optimizer, Hedger and our Event Study. In future briefings, we will keep you up-to-date on how our live portfolios are executing. We will also showcase new technologies and capabilities that we intend to deploy and make available through our premium strategies and QuantDesk®, our flagship cloud-based software.
My hope is that those of you who will be following us closely will gain a good understanding of Machine Learning techniques in statistical forecasting and will gain expertise in our suite of offerings and services.
- Forecaster - Pattern recognition price prediction
- Optimizer - Portfolio allocation based on risk profile
- Hedger - Hedge positions to reduce volatility and maximize risk adjusted return
- Event Analyzer - Identify predictable behavior following a meaningful event
- Back Tester - Assess an investment strategy through a historical test drive before risking capital
Your comments and questions are important to us and help to drive the content of this weekly briefing. I encourage you to continue to send us your feedback, your portfolios for analysis, or any questions you wish for us to showcase in future briefings.
Send your emails to: firstname.lastname@example.org and we will do our best to address each email received.
Please remember: This sample portfolio and the content delivered in this newsletter are for educational purposes only and NOT as the basis for one's investment strategy. Beyond discounting market impact and not counting transaction costs, there are additional factors that can impact success. Hence, additional professional due diligence and investors' insights should be considered prior to risking capital.
If you have any questions or comments on the above, feel free to contact me: email@example.com
Have a great week!
Disclaimer Pertaining to Content Delivered & Investment Advice
This information has been prepared by Lucena Research Inc. and is intended for informational purposes only. This information should not be construed as investment, legal and/or tax advice. Additionally, this content is not intended as an offer to sell or a solicitation of any investment product or service.
Please note: Lucena is a technology company and neither manages funds nor functions as an investment advisor. Do not take the opinions expressed explicitly or implicitly in this communication as investment advice. The opinions expressed are of the author and are based on statistical forecasting on historical data analysis.
Past performance does not guarantee future success. In addition, the assumptions and the historical data based on which opinions are made could be faulty. All results and analyses expressed are hypothetical and are NOT guaranteed. All Trading involves substantial risk. Leverage Trading has large potential reward but also large potential risk. Never trade with money you cannot afford to lose. If you are neither a registered nor a certified investment professional this information is not intended for you. Please consult a registered or a certified investment advisor before risking any capital.
The performance results for active portfolios following the screen presented here will differ from the performance contained in this report for a variety of reasons, including differences related to incurring transaction costs and/or investment advisory fees, as well as differences in the time and price that securities were acquired and disposed of, and differences in the weighting of such securities. The performance results for individuals following the strategy could also differ based on differences in treatment of dividends received, including the amount received and whether and when such dividends were reinvested. Historical performance can be revisited to correct errors or anomalies and ensure it most accurately reflects the performance of the strategy.