Erez Katz, CEO and Co-founder of Lucena Research.
Through our research at Lucena, we know it’s important to configure deep net infrastructure to accommodate time series data as a trend formation rather than as a single point in time. There is a vast difference between how we as humans make decisions and how a machine learning model does.
Some decisions rely on a static state (image classification, for example). When we feed an image to a network, the network relies mainly on the image’s final state. In other words, for non-time-series data, how the image was formed over time is not relevant to identifying whether it contains a cat or a dog.
Static vs. Time Series
In contrast, forecasting a stock price based on a data pattern is normally predicated on some form of historical context. To underscore the difference between static and time-formation context, take for example the autocomplete prediction on our smartphones. Autocomplete is a function of memorizing the previous sequence of letters in a word or the previous sequence of words in a sentence. Given a sentence that starts with “The sky is ____,” it would predict with high confidence that the next word is “blue.” But if we gave the deep network only the last word, “is,” it would have no relevant information from which to discern what comes next.
The very same concept applies to stock data. A traditional artificial neural network may learn to forecast the price of a stock based on several factors:
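To make the autocomplete point concrete, here is a minimal sketch that uses simple word counts rather than a neural network. The corpus and the function names (`build_model`, `predict`) are invented for illustration only. It compares a predictor that sees just the last word with one that sees the three preceding words.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented purely for illustration.
corpus = [
    "the sky is blue",
    "the sky is blue",
    "the grass is green",
    "the answer is yes",
    "the sky is clear",
]

def build_model(sentences, context_size):
    """Count which word follows each context of `context_size` words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            model[context][words[i]] += 1
    return model

def predict(model, context):
    """Return (most frequent next word, its share of the observed counts)."""
    counts = model[tuple(context)]
    word, hits = counts.most_common(1)[0]
    return word, hits / sum(counts.values())

last_word_only = build_model(corpus, context_size=1)
full_context = build_model(corpus, context_size=3)

print(predict(last_word_only, ["is"]))               # sees only "is"
print(predict(full_context, ["the", "sky", "is"]))   # sees "the sky is"
```

On this toy corpus both predictors choose “blue,” but the one that sees the full context does so with a noticeably larger share of the counts, because “is” on its own is followed by many different words.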
- Daily Volume
- Price to Earnings Ratio (PE)
- Analyst Recommendations Consensus
While the future price of the stock may heavily depend on these factors, their static values at a point in time only tell part of the story. A much richer approach to forecasting a stock price would be to determine how the trend of the above factors formed over time.
One way to capture this is to let the network try a range of timeframes and determine which one is the most statistically significant predictor of future trends. Hyper-parameters such as the lookback (training period) and how far into the future we want to forecast can be tuned during a cross-validation period.
However, the more parameters you add to the grid search, the more susceptible you are to overfitting. Moreover, a trend’s time frame is not necessarily constant: in some cases a trend of 21 days is more predictive, while in other cases 63 days may be more suitable.
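As a rough sketch of what tuning the lookback looks like in practice, the snippet below turns a price series into (lookback window, future value) samples and scores two candidate lookbacks on a held-out segment. Everything here is an assumption for illustration: the prices are a synthetic random walk, the helper names `make_samples` and `hit_rate` are hypothetical, the 5-day horizon is arbitrary, and the "model" is just a naive rule that the past direction continues.

```python
import random

def make_samples(series, lookback, horizon):
    """Pair each lookback-day window with the value `horizon` days after it ends."""
    samples = []
    for end in range(lookback, len(series) - horizon + 1):
        window = series[end - lookback:end]   # the trend formation
        target = series[end + horizon - 1]    # the value to forecast
        samples.append((window, target))
    return samples

def hit_rate(series, lookback, horizon):
    """Share of windows whose direction matches the direction of the next move."""
    hits = total = 0
    for window, target in make_samples(series, lookback, horizon):
        past_move = window[-1] - window[0]
        future_move = target - window[-1]
        if past_move == 0 or future_move == 0:
            continue  # skip flat moves, which have no direction
        total += 1
        hits += (past_move > 0) == (future_move > 0)
    return hits / total if total else float("nan")

random.seed(0)
# Synthetic random-walk "prices": an assumption standing in for real data.
prices = [100.0]
for _ in range(499):
    prices.append(prices[-1] + random.gauss(0.05, 1.0))

validation = prices[250:]  # held-out segment used only for tuning
for lookback in (21, 63):
    print(lookback, round(hit_rate(validation, lookback, horizon=5), 3))
```

Every extra value added to the `(21, 63)` grid is one more chance to pick a winner that merely fits noise in the validation segment, which is the overfitting risk described above.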
Time Series Data and RNNs to Forecast Stock Prices
A recurrent neural network (RNN) is a deep neural network designed specifically to tackle this kind of problem. It learns to determine on the fly which historical information should be considered or discarded for a high-probability classification.
What is a Recurrent Neural Network (RNN)?
A Recurrent Neural Network (RNN) is a deep learning algorithm that operates on sequences (like sequences of characters). At every step, it takes a snapshot before it tries to determine what’s next. In other words, it operates on trend representations via matrices of historical states.
RNNs have some form of internal memory, so they remember what they saw previously. Fully connected neural nets and convolutional neural nets are feed-forward: each layer of neurons serves as input only to the adjacent, subsequent layer in the hierarchy. In contrast, RNNs can use the output of a neuron as an input to the very same neuron.
A diagram representing a recurrent neuron: X(t) is the input neuron, A is the repeating block that determines what information to preserve and what to discard, and h(t) is the output neuron, which feeds back into the network. Credit: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The above diagram can be greatly simplified by unfolding (unrolling) the recurrent instances as follows:
An unrolled recurrent neuron A.
The unrolled network is not much different from a normal feed-forward network, with one exception: its depth is not fixed in advance. It follows the length of the input sequence, and the same weights are reused at every step. In the context of stocks’ historical feature values, consider each vertical formation of x-0 to h-0 a historical snapshot of a state (PE ratio today, PE ratio yesterday, etc.).
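The unrolling can be sketched in a few lines of plain Python. This is a minimal illustration of a single recurrent unit with a scalar hidden state, not a trained model. The weights `w_x`, `w_h` and `b` are arbitrary, and the PE ratio values are hypothetical.

```python
import math

def rnn_forward(sequence, w_x, w_h, b, h0=0.0):
    """Unroll one recurrent unit over a sequence of scalar inputs.

    The same weights (w_x, w_h, b) are reused at every time step, and the
    number of unrolled steps equals len(sequence).
    """
    h = h0
    states = []
    for x_t in sequence:
        # The previous output h feeds back in alongside the new input x_t.
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states

# Hypothetical PE ratio history, oldest value first.
pe_ratio_history = [14.2, 14.8, 15.1, 15.9, 16.4]

# Center the inputs so that tanh is not pushed into saturation.
mean = sum(pe_ratio_history) / len(pe_ratio_history)
inputs = [x - mean for x in pe_ratio_history]

states = rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0)
for x_t, h_t in zip(pe_ratio_history, states):
    print(x_t, round(h_t, 4))
```

Each printed row corresponds to one vertical x-to-h snapshot in the unrolled diagram. A longer history simply produces more rows without adding any new weights.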
Taking the Concept of RNN One Step Further: LSTM (Long Short-Term Memory)
Long Short-Term Memory networks (LSTMs) are a special kind of RNN, capable of learning long-term dependencies. All RNNs have some form of repeating network structure. In a standard RNN, the repeating infrastructure is rather straightforward, with a single activation function into an output layer of neurons. In contrast, LSTMs contain a more robust infrastructure designed specifically to determine which information ought to be preserved or discarded. Common to LSTM infrastructures is a Cell State layer, also called a “conveyor belt”.
A cell state/conveyor belt of the RNN infrastructure, tasked with determining which information should be preserved or discarded.
Instead of having a single neural network layer as in a typical RNN, LSTM holds multiple components, tasked with discriminately adding or removing information to be passed through the “conveyor belt”.
A typical LSTM Cell holding four components tasked with determining which information is discarded, added and outputted to the cell state layer (conveyor belt).
I will not get too deep into the inner structure of an LSTM cell, but it’s important to note that the optionality of letting information flow through is managed by three gates. Each gate uses a non-linear Sigmoid activation, and tanh functions scale the values that pass through:
- Forget Gate
- Input Gate
- Output Gate
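For readers who want to see how the three gates interact, here is a simplified sketch of one LSTM step using the standard textbook formulation. Scalars stand in for the vectors and matrices a real cell would use. The function name `lstm_step`, the parameter layout and all weight values are hypothetical and untrained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    `params` maps each component name to a (w_x, w_h, b) tuple.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))             # how much old cell state to keep
    i = sigmoid(pre("input"))              # how much new information to admit
    c_tilde = math.tanh(pre("candidate"))  # proposed new content
    o = sigmoid(pre("output"))             # how much cell state to expose

    c_t = f * c_prev + i * c_tilde         # update the "conveyor belt"
    h_t = o * math.tanh(c_t)               # output passed to the next step
    return h_t, c_t

# Arbitrary, untrained weights chosen only to make the example run.
params = {
    "forget": (0.4, 0.3, 1.0),
    "input": (0.6, -0.2, 0.0),
    "candidate": (0.9, 0.5, 0.0),
    "output": (0.7, 0.1, 0.0),
}

h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

The cell state `c_t` changes only through a multiplication by the forget gate and an addition scaled by the input gate. That is why information can ride the conveyor belt across many steps largely untouched when the forget gate stays close to 1.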
Under the hood, RNNs and LSTMs are not much different from a typical multi-layered neural net. The activation functions force the cell’s outcome to conform to a nonlinear representation: Sigmoid to a value between 0 and 1, and tanh to a value between -1 and 1. This is done mainly to enable the typical deep net’s error-minimization discovery through back propagation and gradient descent.
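The ranges above, and the reason these functions suit gradient-based training, can be checked with a short sketch. It evaluates both activations and their derivatives at a few arbitrary sample points. The helper names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of sigmoid, expressed through its own output."""
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    """Derivative of tanh, expressed through its own output."""
    t = math.tanh(z)
    return 1.0 - t * t

for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(
        z,
        round(sigmoid(z), 4),       # stays between 0 and 1
        round(math.tanh(z), 4),     # stays between -1 and 1
        round(sigmoid_grad(z), 4),
        round(tanh_grad(z), 4),
    )
```

Both functions are smooth, so a derivative exists at every point, and back propagation relies on those derivatives to pass the error signal from the output back to the weights.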
Key Takeaways About Time Series Data and RNNs:
LSTM cells effectively learn to memorize long-term dependencies and tend to perform well on sequential data. To the untrained eye, the results may seem somewhat incredible or even magical.
One drawback of RNNs, and of LSTMs in particular, is how demanding they are in computational resources. RNNs can be difficult to train and require deep neural network expertise, but they are a perfect match for time series data, as they can “learn” how to take advantage of sequential signals rather than one-time snapshots.
At Lucena we have invested significant effort in a robust GPU infrastructure and are extending our AI libraries with new offerings powered by the very same technology.