QuantDesk® Machine Learning Forecast

for the Week of May 7th, 2018

Using Timeseries Data to Forecast Stock Prices with Convolutional Neural Networks


by Erez Katz, CEO and Co-founder of Lucena Research.


Recap

Last week we discussed how configuring a deep net infrastructure to accommodate timeseries data as trends, rather than point-in-time values, could increase the accuracy of predicting stocks’ future performance. We based our theory on what we know has worked well for deep learning in practice. Classification tasks such as handwriting recognition in computer vision, natural language processing and speech recognition are all predicated on some visual representation of the input data, whether it’s training data, testing data or production data. The networks are trained via training data (also called labeled data), which is historical input data paired with its corresponding output -- in our case, patterns of features that describe the state of a stock paired with its corresponding return outcome, f(x) = y.

How Does A Deep Neural Net Model Predict An Outcome?

The concept behind deep nets is that the networks are able to classify -- through many thousands of iterations of trial and error -- complex relationships between certain characteristics of an image and its corresponding classification. In many cases, these relationships are so subtle and deep that they exceed our brains’ capacity to comprehend. As scientists, all we can do is control the learning process and assess the outcome, often in amazement! This is one of the reasons I find AI and deep learning so revolutionary and transformational, even at such an early stage.

Take, for example, the seemingly trivial task of recognizing a bird within an input image. The deep learner breaks the image down into a pixelated representation of many thousands of pixels. It then sequences the image through receptive fields (convolutional transformations) and is subsequently able to distinguish, with high accuracy, meaningful patterns (relevant to the bird’s image) from background pixelated noise. The following network diagram demonstrates a fairly typical representation of the layers used to accurately classify an object (in our case, a bird) within an image.

Image 1: A typical convolutional layered diagram. Credit: Adit Deshpande, CS undergraduate, UCLA (’19)
In principle, the convolutional layers and the max pooling layers are techniques developed to reduce the dimensionality of the input layer so that it is vectorized most efficiently for error minimization. The fully connected layers are tasked with minimizing the gap between the network’s outcome and the desired outcome. This is done mathematically through cost function reduction (summing partial derivatives and discerning the slope of the cost function) to adjust the weight and bias parameters between the neurons of the fully connected layers, driving the cost function down to a desired local minimum. These processes, gradient descent and backpropagation, are beyond the scope of this article, but feel free to click here to learn more.
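
For intuition only, here is a minimal sketch of gradient descent on a mean squared error cost for a single linear neuron -- a toy example with assumed data and learning rate, not the network above. A deep net applies the same idea across millions of weights via backpropagation:

```python
import numpy as np

# Toy data: learn y = 2x + 1 from noisy samples (hypothetical example).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2 * x + 1 + rng.normal(0, 0.1, 100)

w, b = 0.0, 0.0   # weight and bias parameters to be learned
lr = 0.1          # learning rate (an assumed choice)

for step in range(500):
    y_hat = w * x + b                  # forward pass
    cost = np.mean((y_hat - y) ** 2)   # mean squared error cost
    # Partial derivatives of the cost with respect to w and b.
    dw = np.mean(2 * (y_hat - y) * x)
    db = np.mean(2 * (y_hat - y))
    # Step down the slope of the cost function.
    w -= lr * dw
    b -= lr * db

print(f"learned w={w:.2f}, b={b:.2f}, cost={cost:.4f}")
```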

A typical learning process feeds the network a large number of images of birds along with images that do not contain birds. In turn, the convolutional neural network (CNN) “learns” to effectively recognize subtle but distinctive bird-like patterns (such as a beak, feathers or wings) and to distinguish a bird pattern from the broader image representation.
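
As an illustration of these stages (and not Lucena’s production architecture), a minimal bird-vs.-no-bird classifier in PyTorch might wire convolution, max pooling and fully connected layers together as follows; the layer sizes and hyperparameters are arbitrary assumptions:

```python
import torch
import torch.nn as nn

class BirdClassifier(nn.Module):
    """Tiny CNN: two convolution/pooling stages feeding fully connected layers."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # receptive fields
            nn.ReLU(),
            nn.MaxPool2d(2),                              # reduce dimensionality
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64),   # assumes 64x64 input images
            nn.ReLU(),
            nn.Linear(64, 1),              # bird vs. no-bird logit
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = BirdClassifier()
loss_fn = nn.BCEWithLogitsLoss()                  # cost function to minimize
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One training step on a dummy batch of labeled data.
images = torch.randn(8, 3, 64, 64)                # 8 RGB images, 64x64 pixels
labels = torch.randint(0, 2, (8, 1)).float()      # 1 = bird, 0 = no bird
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()                                   # backpropagation
optimizer.step()                                  # gradient descent update
```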

Come to think of it, the very same concept can be applied to stock market prediction. Imagine all the different types of timeseries data that together can describe the distinct state of a publicly traded company. In theory, we can feed into a convolutional neural network many samples of the public company’s state paired with its future price outcomes. If the network is constructed properly, and the data is indeed predictive, the network will be able to classify “winning” patterns in the past and subsequently recognize new “winning” patterns in the future.
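
As a purely hypothetical sketch of how such f(x) = y training pairs could be assembled with pandas (the horizon and labeling rule are assumptions for illustration, not our production pipeline):

```python
import pandas as pd

def build_training_pairs(features: pd.DataFrame, prices: pd.Series,
                         horizon: int = 21) -> pd.DataFrame:
    """Pair each daily feature snapshot x with a future-return label y.

    features: one row per trading day, one column per factor (hypothetical).
    prices:   daily closing prices aligned to the same date index.
    horizon:  look-ahead window in trading days (an assumed choice).
    """
    future_return = prices.shift(-horizon) / prices - 1.0
    pairs = features.copy()
    pairs["y"] = (future_return > 0).astype(int)   # 1 = "winning" pattern
    return pairs[future_return.notna()]            # drop tail rows with no label
```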

Taking The Concept Of Timeseries Image Representation One Step Further

Predicting future prices based on historical patterns is not new; it has occupied investors since the inception of the stock market in 1817. However, because information travels so efficiently, recognizing traditional pattern formations is simply not good enough to generate profit. Feeding timeseries data into CNNs is meant to identify complex predictive patterns unrecognizable to the naked eye, enabling investors to scientifically validate an investment algorithm and act on investment opportunities well before they are exploited by the masses.

We talked about taking features (also called factors) over time and assessing them daily as they describe the state of publicly traded companies.

Image 2: Sample features across five publicly traded companies. Each feature contains its respective daily value per company. We are depicting eight features, both fundamental and technical. Together they describe the state of a company.

Clearly, the values above are not conducive to recognizing complex patterns for a few reasons:

  1. The data elements are not measured homogeneously:
    1. Some are in dollars.
    2. Some are in percent.
    3. Some are fractions.
  2. The data is provided as point-in-time values. No trend information is available in the above example. Even if we take the values of stochastics, moving averages and the like, which contain historical trend-relative information, the data is still somewhat stale and doesn’t provide enough detail for complex image analysis.
  3. Each feature value for a given stock is self-contained and isolated. In other words, there is no clear peer-relative measurement.

To resolve the shortcomings of the raw feature data, we have migrated our raw data into a more robust feature set -- a process called “feature engineering.” Specifically, we’ve created ranked features: we have normalized all the feature values into a Gaussian (normal) distribution.

Image 3: Sharpe ratio representation using Z-scores. The normal distribution ensures that each Sharpe value for a specific stock falls within a well-defined range and is measured in relation to the mean of its peers (i.e. the S&P 500 or the Russell 1000).
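
As a minimal sketch of this kind of cross-sectional Z-score ranking, assuming a pandas DataFrame with one row per trading day and one column per stock:

```python
import pandas as pd

def zscore_rank(raw: pd.DataFrame) -> pd.DataFrame:
    """Convert raw feature values (e.g. Sharpe ratios) into peer-relative
    Z-scores: each day, every stock is measured against the mean and
    standard deviation of its universe (e.g. the S&P 500)."""
    mean = raw.mean(axis=1)   # cross-sectional mean per day
    std = raw.std(axis=1)     # cross-sectional dispersion per day
    return raw.sub(mean, axis=0).div(std, axis=0)

# Hypothetical usage: rows are dates, columns are tickers.
# sharpe = pd.DataFrame(..., columns=["AAPL", "MSFT", "AMZN", "GOOG", "JNJ"])
# ranked = zscore_rank(sharpe)   # roughly N(0, 1) across peers each day
```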

Now that we have ranked each of our raw features, we can assess the historical trends of our ranked features over time. For example, we can tell how Apple’s Sharpe ratio evolved over the last 21 days relative to its peers. Ranked features over time are much more potent than raw features: not only do they provide trend information, but the trend is also measured against a company’s peers rather than in isolation.

At this point we are ready for the last step: transforming the ranked timeseries features into rich images. Rather than processing the underlying trend graph as the image, we’ve identified a novel transformation algorithm that is much more exciting. The Gramian Angular Difference Field (GADF) is a timeseries-to-image transformation: each value in the series is encoded as an angle in polar coordinates, and the pairwise angular differences form a two-dimensional array, so that flat 1-D data is ultimately transformed into a rich 2-D image. For more information, please refer to the white paper by Zhiguang Wang and Tim Oates from the Department of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County.
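
For the curious, here is a minimal NumPy sketch of the GADF following Wang and Oates’ formulation (rescale the series to [-1, 1], encode each value as a polar angle, then take pairwise angular differences); the pyts library also offers a packaged GramianAngularField implementation:

```python
import numpy as np

def gadf(series: np.ndarray) -> np.ndarray:
    """Gramian Angular Difference Field: 1-D series -> 2-D image."""
    # 1. Rescale the series into [-1, 1].
    lo, hi = series.min(), series.max()
    x = (2 * series - hi - lo) / (hi - lo)
    # 2. Encode each value as an angle in polar coordinates.
    phi = np.arccos(np.clip(x, -1, 1))
    # 3. Pairwise angular differences: GADF[i, j] = sin(phi_i - phi_j).
    return np.sin(phi[:, None] - phi[None, :])

# Example: a 21-day ranked-feature window becomes a 21x21 image.
window = np.random.default_rng(0).standard_normal(21).cumsum()
image = gadf(window)
print(image.shape)  # (21, 21)
```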

Apple TTM Sharpe Rank (trailing 12 months)

Apple TTM Sharpe Rank (trailing 12 months) - Gramian Angular Transformation

Apple TTM Asset Turnover Rank (trailing 12 months)

Apple TTM Asset Turnover Rank (trailing 12 months) - Gramian Angular Transformation

Image 4: The images above pair our ranked features (on the left) with their corresponding rich graphical representations (on the right). The beauty of such transformations is that the images are represented homogeneously, with richer colors, shading scales and patterns.

Conclusion

We’ve covered quite a bit this week, but hopefully you are now able to experience first-hand what it takes to conduct meaningful quantitative research. It is also important to note that the algorithm on its own, no matter how sophisticated and effective, is not sufficient to achieve a successful outcome. The nature of the data and its inherent predictive power is what ultimately drives the algorithm’s success. Lucena’s value proposition has always been, and will continue to be, to equip investment professionals with the best data and tools available to differentiate them from the crowd and increase their odds of success.

More to come on this topic in future newsletters.

Strategies Update

As in the past, we will provide weekly updates on how the model portfolios and the theme-based strategies we cover in this newsletter are performing.

Tiebreaker – Lucena’s Long/Short Equity Strategy - Since Inception 51.13% vs. benchmark of 3.57%
Image 1: Tiebreaker YTD -- benchmark is VMNIX (Vanguard Market Neutral Fund Institutional Shares)
Past performance is no guarantee of future returns.

Tiebreaker has been forward traded since 2014, and to date it boasts an impressive return of 51.13% with remarkably low volatility, as expressed by a max drawdown of only 6.16% and a Sharpe ratio of 1.67! (You can see a more detailed view of Tiebreaker’s performance below in this newsletter.)

BlackDog – Lucena’s Risk Parity - Since Inception 43.68% vs. benchmark of 20.57%

We have recently developed a sophisticated multi-sleeve optimization engine designed to provide the most suitable asset allocation for a given risk profile, while respecting multi-level allocation restriction rules.

Essentially, we strive to obtain an optimal decision while taking into consideration the trade-offs between two or more conflicting objectives. For example, if you consider a wide universe of constituents, we can find a subset selection and their respective allocations to satisfy the following:

  • Maximizing the Sharpe ratio
  • Maintaining a widely diversified portfolio with allocation restrictions across certain asset classes, market sectors and growth/value classifications
  • Restricting volatility
  • Minimizing turnover

We can also determine the proper rebalance frequency and validate the recommended methodology with a comprehensive backtest.
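
To give a flavor of this kind of constrained optimization (a heavily simplified sketch, not our multi-sleeve engine), one could maximize the Sharpe ratio under fully-invested, long-only and per-asset-cap constraints with scipy; the inputs below are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def optimize_weights(mu: np.ndarray, cov: np.ndarray,
                     max_weight: float = 0.25) -> np.ndarray:
    """Maximize the portfolio Sharpe ratio (risk-free rate assumed zero)
    subject to fully-invested, long-only, per-asset-cap constraints."""
    n = len(mu)

    def neg_sharpe(w):
        return -(w @ mu) / np.sqrt(w @ cov @ w)

    constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    bounds = [(0.0, max_weight)] * n
    result = minimize(neg_sharpe, np.full(n, 1.0 / n), method="SLSQP",
                      bounds=bounds, constraints=constraints)
    return result.x

# Hypothetical expected returns and covariance for four assets.
mu = np.array([0.08, 0.06, 0.05, 0.03])
cov = np.diag([0.040, 0.020, 0.015, 0.005])
print(optimize_weights(mu, cov).round(3))
```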

Image 2: BlackDog YTD -- benchmark is AQR’s Risk Parity Fund Class B
Past performance is no guarantee of future returns.

Forecasting the Top 10 Positions in the S&P

Lucena’s Forecaster uses a set of 10 factors selected from a universe of over 500. To self-adjust to the most recent data, we apply a genetic algorithm (GA) process that runs over the weekend to identify the most predictive set of factors, based on which our price forecasts are assessed. These factors (together called a “model”) are used to forecast the price, and a corresponding confidence score, for every stock in the S&P 500. Our machine learning algorithm travels back in time over a look-back period (or training period) and searches for historical states in which the underlying equities were similar to their current state. By assessing how prices moved forward in the past, we anticipate their projected price change and forecast their volatility.
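
Conceptually, this search for similar historical states resembles a k-nearest-neighbors regression over factor vectors. The sketch below is a hypothetical illustration of the idea, not our production forecaster:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def knn_forecast(history_X: np.ndarray, history_y: np.ndarray,
                 current_state: np.ndarray, k: int = 25) -> float:
    """Average the realized forward returns of the k historical factor
    states most similar to today's state."""
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(history_X, history_y)
    return float(model.predict(current_state.reshape(1, -1))[0])

# Hypothetical data: 500 historical days, 10 factors, forward returns.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10))   # daily factor vectors
y = rng.normal(0.0, 0.02, 500)       # realized forward returns
today = rng.standard_normal(10)      # current factor state
print(f"projected return: {knn_forecast(X, y, today):+.4f}")
```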

The charts below represent the new model and the top 10 positions assessed by Lucena’s Price Forecaster.

Image 3: Default model for the coming week.

The top 10 forecast chart below delineates the ten positions in the S&P 500 with the highest projected market-relative return combined with the highest confidence scores.

Image 4: Forecasting the top 10 positions in the S&P 500 for the coming week. The yellow stars (0 stars meaning poorest and 5 stars meaning strongest) represent the confidence score based on the forecasted volatility, while the blue stars represent backtest scoring as to how successful the machine was in forecasting the underlying asset over the look-back period -- in our case, the last 3 months.

To view brief videos of all the major functions of QuantDesk, please click on the following links:
Forecaster
QuantDesk Overview

Analysis

The table below presents the trailing 12-month performance and a YTD comparison between the two model strategies we cover in this newsletter (BlackDog and Tiebreaker) and the two ETFs representing the major US indexes (the Dow and the S&P 500).

12 Month Performance BlackDog and Tiebreaker
Image 5: Last week’s changes, trailing 12 months, and year-to-date gains/losses.
Past performance is no guarantee of future returns.

Appendix

For those of you unfamiliar with BlackDog and Tiebreaker, here is a brief overview: BlackDog and Tiebreaker are two of an assortment of model strategies we offer our clients. Our team of quants is constantly on the hunt for innovative investment ideas. Lucena’s model portfolios are a byproduct of some of our best research, packaged into consumable model portfolios. The performance stats and charts presented here reflect paper-traded portfolios on our platform, QuantDesk®. Actual performance of our clients’ portfolios may vary, as it is subject to slippage and the manager’s discretionary implementation. We will be happy to facilitate an introduction to one of our clients for those of you interested in reviewing live brokerage accounts that track our model portfolios.

Tiebreaker: Tiebreaker is an actively managed long/short equity strategy. It invests in equities from the S&P 500 and Russell 1000 and is rebalanced bi-weekly using Lucena’s Forecaster, Optimizer and Hedger. Tiebreaker splits its cash evenly between its core and hedge holdings, and its hedge positions consist of long and short equities. Tiebreaker has been able to avoid major market drawdowns while still taking full advantage of subsequent run-ups. Tiebreaker is able to adjust its long/short exposure based on idiosyncratic volatility and risk. Lucena’s Hedge Finder is primarily responsible for driving this long/short exposure tilt.

Tiebreaker Model Portfolio Performance Calculation Methodology

Tiebreaker’s model portfolio performance is a paper trading simulation and assumes an opening account balance of $1,000,000 cash. Tiebreaker started to paper trade on April 28, 2014 as a cash-neutral and beta-neutral strategy. However, it was substantially modified to its current dynamic mode on 9/1/2014. Trade execution and return figures assume positions are opened at the 11:00 AM EST price quoted by the primary exchange on which the security is traded and, unless a stop is triggered, closed at the 4:00 PM EST price quoted by the primary exchange on which the security is traded. A trailing 5% stop loss is imposed, measured from the intra-week high (in the case of longs) or low (in the case of shorts). If the stop loss is triggered, the position is exited 5% below the intra-week high for longs, or 5% above the intra-week low for shorts, with the following modification: prior to March 1st, 2016, at times but not at all times, if, in consultation with a client executing the strategy, it was found that the client received a less favorable price in closing out a position when a stop loss was triggered, the less favorable price was used in determining the exit price. On September 28, 2016, we applied new allocation algorithms to Tiebreaker and modified its rebalancing sequence to every two weeks (10 trading days). Since March 1st, 2016, all trades have been conducted automatically with no modifications, per the guidelines outlined herein. No manual modifications have been made to the stop prices. In instances where a position gaps through the trigger price, the initial gapped opening trade price is used. Transaction costs are calculated as the larger of $6.95 per trade or $0.0035 per share traded.

BlackDog: BlackDog is a paper trading simulation of a tactical asset allocation strategy that utilizes highly liquid ETFs of large cap and fixed income instruments. The portfolio is adjusted approximately once per month based on Lucena’s Optimizer in conjunction with Lucena’s macroeconomic ensemble voting model. Due to BlackDog’s low volatility (half that of the market in backtesting), we leverage it 2X. By exposing twice its original cash assets, we take full advantage of its potential returns while maintaining market-relative low volatility and risk. As evidenced by the chart below, BlackDog 2X is substantially ahead of its benchmark (S&P 500).

In the past year, we covered QuantDesk’s Forecaster, Back-tester, Optimizer, Hedger and our Event Study. In future briefings, we will keep you up to date on how our live portfolios are performing. We will also showcase new technologies and capabilities that we intend to deploy and make available through our premium strategies and QuantDesk®, our flagship cloud-based software.
My hope is that those of you who follow us closely will gain a good understanding of machine learning techniques in statistical forecasting and expertise in our suite of offerings and services.

Specifically:

  • Forecaster - Pattern-recognition-based price prediction
  • Optimizer - Portfolio allocation based on a risk profile
  • Hedger - Hedge positions to reduce volatility and maximize risk-adjusted return
  • Event Analyzer - Identify predictable behavior following a meaningful event
  • Back Tester - Assess an investment strategy through a historical test drive before risking capital

Your comments and questions are important to us and help to drive the content of this weekly briefing. I encourage you to continue to send us your feedback, your portfolios for analysis, or any questions you wish for us to showcase in future briefings.
Send your emails to: info@lucenaresearch.com and we will do our best to address each email received.

Please remember: this sample portfolio and the content delivered in this newsletter are for educational purposes only and NOT intended as the basis for one’s investment strategy. Beyond market impact and transaction costs, which are not accounted for, there are additional factors that can impact success. Hence, additional professional due diligence and investors’ insights should be considered prior to risking capital.

If you have any questions or comments on the above, feel free to contact me: erez@lucenaresearch.com

Have a great week!

Erez Katz Signature

erez@lucenaresearch.com


Disclaimer Pertaining to Content Delivered & Investment Advice

This information has been prepared by Lucena Research Inc. and is intended for informational purposes only. This information should not be construed as investment, legal and/or tax advice. Additionally, this content is not intended as an offer to sell or a solicitation of any investment product or service.

Please note: Lucena is a technology company and neither manages funds nor functions as an investment advisor. Do not take the opinions expressed explicitly or implicitly in this communication as investment advice. The opinions expressed are those of the author and are based on statistical forecasting over historical data.
Past performance does not guarantee future success. In addition, the assumptions and the historical data on which opinions are based could be faulty. All results and analyses expressed are hypothetical and are NOT guaranteed. All trading involves substantial risk. Leveraged trading has large potential rewards but also large potential risks. Never trade with money you cannot afford to lose. If you are neither a registered nor a certified investment professional, this information is not intended for you. Please consult a registered or certified investment advisor before risking any capital.
The performance results for active portfolios following the screen presented here will differ from the performance contained in this report for a variety of reasons, including differences related to incurring transaction costs and/or investment advisory fees, as well as differences in the time and price that securities were acquired and disposed of, and differences in the weighting of such securities. The performance results for individuals following the strategy could also differ based on differences in treatment of dividends received, including the amount received and whether and when such dividends were reinvested. Historical performance can be revisited to correct errors or anomalies and ensure it most accurately reflects the performance of the strategy.