# Why Deep Reinforcement Learning (DRL) Matters for Trading

Erez Katz, CEO and Co-founder of Lucena Research


**Which scenario is more favorable for a potential trade?**

- Low predicted return with high confidence.
- High predicted return with low confidence.

Traditional regression models, such as k-nearest neighbors (KNN) or linear regression, are not able to address this trade-off directly. Policy-based decisions of this kind, with roots in behavioral science, are squarely in the wheelhouse of Deep Reinforcement Learning (DRL).
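To make the comparison concrete, here is a toy expected-value sketch. All numbers are invented, and "confidence" is treated, purely for illustration, as the probability that the predicted move materializes, with a symmetric loss of the same size otherwise:

```python
# Toy comparison of the two scenarios above (all numbers are invented).
# Under a simple expected-value view, the modest, high-confidence trade
# can carry a larger edge than the big, low-confidence one.

def expected_return(predicted_return: float, confidence: float) -> float:
    """Expected value if the move occurs with probability `confidence`,
    assuming a symmetric loss of the same size otherwise."""
    return confidence * predicted_return - (1 - confidence) * predicted_return

low_ret_high_conf = expected_return(0.02, 0.90)   # 2% move, 90% confidence
high_ret_low_conf = expected_return(0.10, 0.55)   # 10% move, 55% confidence

print(f"{low_ret_high_conf:.3f} vs {high_ret_low_conf:.3f}")  # 0.016 vs 0.010
```

A point estimate alone (the regression output) cannot make this call; it takes both the estimate and a measure of confidence.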

**Deep Reinforcement Learning (DRL)**

Deep learning has traditionally been used for image and speech recognition. With the growth of alternative data, however, machine learning technology and accessible computing power have become highly desirable to the financial industry.

To understand DRL, we have to make a distinction between Deep Learning and Reinforcement Learning.

**What is Deep Learning?**

Deep learning (also referred to as deep nets) is a subset of machine learning that uses hierarchical layers of artificial neural networks to carry out many connected machine learning tasks. These networks are loosely modeled on the human brain, with neuron nodes connected together like a web.

Each neuron is responsible for solving a narrowly defined problem; together, the web of neurons is able to solve complex problems.
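As a toy illustration of that idea, the hand-wired three-neuron net below (weights chosen by hand, not learned) combines two narrow sub-problems, an OR-like neuron and an AND-like neuron, to compute XOR, something no single neuron of this kind can do alone:

```python
import math

# Toy "web" of three neurons (purely illustrative, hand-picked weights).
# Each neuron solves a narrow sub-problem; their combination answers a
# question (XOR) that no single neuron of this type can answer alone.

def neuron(inputs, weights, bias):
    """One node: weighted sum of inputs squashed through a sigmoid."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def tiny_net(x1, x2):
    h1 = neuron((x1, x2), (10, 10), -5)    # fires if either input is on (OR-like)
    h2 = neuron((x1, x2), (10, 10), -15)   # fires only if both are on (AND-like)
    out = neuron((h1, h2), (10, -10), -5)  # OR but not AND -> XOR
    return round(out)

print([tiny_net(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

In a real deep net these weights are learned rather than hand-picked, and there are millions of neurons rather than three, but the principle of composing narrow solvers is the same.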

**Deep Learning at a High Level**

It’s important to note that like other forms of artificial intelligence, deep learning holds three important attributes:

**Generalization** – Inferring an outcome from a new, yet-to-be-seen state (or situational input) by generalizing the solution. By generalizing the way an outcome is derived from an input (for example, through a formula), we don’t have to have a reference for every single situation. Rather, all we have to do is feed a state into the formula and “voila!” the formula returns our outcome.

**Randomness** – Assessing and qualifying outcomes based on random inputs. In other words, quantitatively measuring how far off an output is from its ground truth: the desired outcome if we had perfect information.

**Self-Adjustment** – The ability to mathematically adjust a model, in the right direction, so that it gets closer and closer to the desired outcome. This is done through a process called error minimization, a learning process that homes in on the target by adjusting a deep learning model over many random points of reference.
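The three attributes can be seen working together in a minimal error-minimization loop. The sketch below fits a one-weight model `y = w * x` by gradient descent; the target relationship and hyperparameters are invented for illustration:

```python
import random

# Minimal error minimization: fit y = w * x by repeatedly sampling a random
# input (randomness), measuring how far the output is from its ground truth,
# and nudging the weight in the direction that reduces the error
# (self-adjustment). The learned formula then generalizes to unseen x.

TRUE_W = 3.0   # hidden "ground truth" relationship y = 3x
w = 0.0        # the model starts knowing nothing
lr = 0.1       # learning rate

random.seed(42)
for _ in range(200):
    x = random.uniform(-1.0, 1.0)   # random point of reference
    y_true = TRUE_W * x             # desired outcome
    y_pred = w * x                  # model's current outcome
    error = y_pred - y_true
    w -= lr * error * x             # gradient of 0.5 * error**2 w.r.t. w

print(w)  # converges close to 3.0
```

A deep net does the same thing with millions of weights instead of one, but the loop — random input, measure error, adjust — is identical.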

**Why Reinforcement Learning for Trading?**

Reinforcement learning (RL) is derived from century-old research in psychology, where learning is the process of mapping situations to actions in order to maximize a certain reward or minimize a certain punishment.

Similar to a mouse learning to find the right path in a maze, the learner is not told which action to take, but instead must discover which actions yield the highest reward through trial and error.

Rather than learning a model to make a prediction, RL learns a policy.

A policy considers an input state in order to recommend the best action (or an output). For example, consider a state of a stock in the context of its technical and fundamental factors.

For stock trading, reinforcement learning will determine a policy of buy, hold, or sell. One additional important characteristic of reinforcement learning is the concept of a reward.

Reinforcement learning is trained by rolling back time and making predictions based on various situational states. It then assesses the outcome in the context of reward (positive or negative daily return, for example).

RL learns a policy by quantifying the outcome as reward.

As you can imagine, there are endless situations a single stock can be in. The deep reinforcement learning process dynamically creates and modifies a policy table (Q-table) that can be consulted for various situations (states), with the goal of determining the action that will maximize a reward (investment return, in our case).

The Q-table is continuously updated as the learner discovers which policies yield consistent rewards.

The reinforcement learning process is the formation and adjustment of the policies (Q-table) by inspecting an action and assessing its reward.
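One common way to formalize that adjustment is the standard Q-learning update rule. The sketch below is illustrative only: the state keys, reward, and hyperparameters are invented, not a description of Lucena's actual model:

```python
from collections import defaultdict

# Sketch of a Q-table for trading states. State keys, rewards, and
# hyperparameters are invented for illustration.
ACTIONS = ("buy", "hold", "sell")
ALPHA, GAMMA = 0.1, 0.9   # learning rate, discount factor

# Every unseen state starts with a zero value for each action.
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def update(state, action, reward, next_state):
    """Standard Q-learning update: move Q(state, action) toward the
    observed reward plus the discounted value of the best next action."""
    best_next = max(Q[next_state].values())
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

# One replayed experience: in an "oversold, strong fundamentals" state,
# buying earned a +1% daily return and led to a "neutral" state.
update(("oversold", "strong_fundamentals"), "buy", 0.01, ("neutral",))
print(Q[("oversold", "strong_fundamentals")]["buy"])  # nudged toward the reward
```

Rolling back time and replaying many such experiences, day by day, is what fills in and refines the table.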

For example, imagine we have two identical states based on certain technical and fundamental factors for Apple (AAPL). In one case, the decision to buy AAPL generated a reward of $100. In the other case, buying AAPL caused us to lose $200.

An inconsistent outcome would typically lead to splitting a single policy into two more granular policies. The learner can identify an additional factor (like a social media sentiment score) that was vastly different between the two states above.

In essence, what the machine has done is update the policy to be more granular so that the two seemingly identical states can be distinguished the next time it is assessed.
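The splitting in the AAPL example can be pictured as widening the state key with the newly identified factor. Everything below — the feature names and the 0.5 sentiment threshold — is invented for illustration:

```python
# Making a policy more granular (invented features and thresholds).
# Two trades looked identical under (technical, fundamental) features
# alone, but adding a sentiment bucket to the state key separates them.

def coarse_state(technicals, fundamentals):
    return (technicals, fundamentals)

def granular_state(technicals, fundamentals, sentiment_score):
    # Bucket the extra factor so the two conflicting cases split apart.
    sentiment = "positive" if sentiment_score > 0.5 else "negative"
    return (technicals, fundamentals, sentiment)

# Same technical/fundamental snapshot, very different sentiment:
case_win = ("oversold", "strong", 0.9)    # buying earned $100
case_loss = ("oversold", "strong", 0.1)   # buying lost $200

assert coarse_state(*case_win[:2]) == coarse_state(*case_loss[:2])
assert granular_state(*case_win) != granular_state(*case_loss)
```

Once the states are distinguishable, each can accumulate its own Q-values and its own buy/hold/sell recommendation.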

**Why Deep Reinforcement Learning for investment?**

The evolution of deep learning and reinforcement learning is very exciting for us — particularly since reinforcement learning is perfectly aligned with investment objectives. Here are just a few reference points:

- The learner can be trained to optimize the same objective as investors (risk-adjusted return, for example).

- The learner tries to distinguish between different states and set an appropriate policy for each: risk of loss, a possible but unlikely large return, or a probable small return.

- It can also recognize an inconclusive state in which doing nothing (staying in cash) is the smart move.

- Deep Reinforcement Learning can layer additional machine learning on top of traditional methods to validate data and form investment strategies.

- A predictive signal can be passed into a reinforcement learner to determine how best to benefit from that signal.

With advancements in hardware and machine learning disciplines, Deep Reinforcement Learning continues to push boundaries and will ultimately provide cognitive reasoning that can match or exceed that of humans.

Want more information on how we use deep learning to forecast stock prices? Drop a comment below or contact us.