Erez Katz, CEO and Co-founder of Lucena Research
At Lucena, our mission is to democratize some of the best-kept secrets in the financial industry and dispel the “black-box” image often associated with Machine Learning. In that spirit, I want to share an important process by which features are designated as most relevant to a particular asset universe and investment strategy.
Why You Should Be Using a Genetic Algorithm (GA) for Feature Selection
As we’ve incorporated multiple big-data sources into our platform, our master feature database has ballooned to almost 1,000 indicators. Indicators are data elements that describe a security at a point in time. Examples of indicators can be found in the chart below.
A Factor, also called a Feature, is a quantitative attribute that describes a security at a given point in time.
With such a rich array of data points, we often struggle to decide which indicators and models are most relevant at any particular time. Not all indicators are created equal, nor are they designed to be predictive at all times. An effective Machine Learning algorithm “knows” how to adjust dynamically to environmental or idiosyncratic changes. A Genetic Algorithm (GA) offers a systematic process of feature selection that helps distinguish predictive signals from noise.
What Is a Genetic Algorithm in the Context of Big-Data and Machine Learning?
A genetic algorithm (GA) is a problem-solving method that mimics the process of natural selection. When using Machine Learning to make investment decisions, the factors most relevant to your needs can be filtered from a wide list of indicators by replicating the process of natural evolution. The only difference is that rather than dealing with DNA and chromosomes, we are dealing with indicators and multi-factor models.
Survival of the Fittest: How to Form the Best Feature Selection
The goal of the exercise is to identify “nuggets.” A nugget is a multi-factor model composed of multiple indicators and their respective min/max values that together form a filter geared to identify the securities most prone to move predictably in the future.
An illustration of a multi-factor model.
We can easily run a fitness function (an event study, for example) to assess how predictive these conditions were historically. An example of a fitness function would be:
Let’s travel back in time (say, 1/1/2011 to 12/31/2011) and assess the average price move 20 days after certain stocks met the following conditions:
- Gross margins are between 45% and 85%
- PE ratio is between 15 and 25
- Beta is between 0.75 and 1.5
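The three conditions above can be read as a set of inclusive min/max ranges. The sketch below is only an illustration of that idea, not Lucena's implementation: the indicator names (`gross_margin`, `pe_ratio`, `beta`), the tickers, and all values are hypothetical.

```python
# A nugget maps each indicator name to an inclusive (min, max) range.
# Indicator names, tickers, and values are made up for illustration.
NUGGET = {
    "gross_margin": (0.45, 0.85),
    "pe_ratio": (15.0, 25.0),
    "beta": (0.75, 1.5),
}


def matches(snapshot, nugget):
    """Return True if every indicator in the nugget falls within its range."""
    return all(
        name in snapshot and low <= snapshot[name] <= high
        for name, (low, high) in nugget.items()
    )


def filter_universe(universe, nugget):
    """Return the tickers whose point-in-time snapshot satisfies the nugget."""
    return [ticker for ticker, snap in universe.items() if matches(snap, nugget)]


UNIVERSE = {
    "AAA": {"gross_margin": 0.60, "pe_ratio": 18.0, "beta": 1.1},
    "BBB": {"gross_margin": 0.30, "pe_ratio": 20.0, "beta": 1.0},  # margin too low
    "CCC": {"gross_margin": 0.50, "pe_ratio": 40.0, "beta": 0.9},  # PE too high
}

print(filter_universe(UNIVERSE, NUGGET))  # ['AAA']
```

A snapshot that is missing one of the nugget's indicators is treated as a non-match here; that is a design choice of the sketch, since a missing value cannot be shown to fall inside the range.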
An event study performance chart. The event date represents the date on which certain securities satisfied the multi-factor (nugget) criteria. The cone represents the standard deviation of the price action of the universe of matching stocks after the “event” took place.
The bold line is the price prediction based on the mean. A fitness function would normally reward a more pronounced (biased) mean line combined with a narrower cone (smaller variance, as measured by the standard deviation).
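One simple way to turn "strong mean, narrow cone" into a single number is to divide the absolute mean forward return by its standard deviation. This scoring rule is an assumption made for illustration; the article does not specify the exact formula, and the return samples below are invented.

```python
from statistics import mean, stdev


def fitness(forward_returns):
    """Score a nugget from the forward returns of its matching events.

    Assumed rule: |mean| / standard deviation, so a stronger directional
    move and a narrower cone both raise the score.
    """
    if len(forward_returns) < 2:
        return 0.0  # not enough events to estimate a spread
    spread = stdev(forward_returns)
    if spread == 0:
        return 0.0  # degenerate sample; treated as uninformative in this sketch
    return abs(mean(forward_returns)) / spread


# Two hypothetical nuggets with the same average 20-day move (5%)
# but very different dispersion around it.
tight = [0.04, 0.05, 0.06, 0.05]
noisy = [0.25, -0.15, 0.20, -0.10]

print(fitness(tight) > fitness(noisy))  # True
```

Both samples average the same move, so the score separates them purely on the width of the cone.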
Now that we understand what a nugget and a fitness function are, we are ready to describe the GA process.
What does the Genetic Algorithm process do? Two things:
- Identifies which indicators to combine into a nugget.
- Measures the fitness score of the nugget.
Here is the Genetic Algorithm process step-by-step:
Step 1: Generate random population. (Indicators are represented by letters.)
Step 2: Evaluate each nugget based on a fitness function.
Step 3: Sort the nuggets based on their fitness score.
Step 4: The best two nuggets survive to participate in the next evolution.
Step 5: Form the next generation of nuggets by selecting nuggets randomly. This time, however, we favor the indicators that scored higher in the previous evolution’s fitness evaluation.
Step 6: Sort the next generation, together with the two surviving nuggets, based on the fitness function.
Repeat steps 2 through 6 until a single nugget consistently remains in first place. That “lone survivor” is then ready for further analysis and refinement before moving into production.
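The steps above can be sketched in a few dozen lines. This follows the simplified procedure described here (elitism plus fitness-weighted random selection) rather than a textbook GA with crossover and mutation. Everything named in the code is hypothetical: the twelve letter indicators, the population sizes, the `patience` stopping rule, and especially `toy_fitness`, which stands in for a real event-study score.

```python
import random
from string import ascii_uppercase

INDICATORS = list(ascii_uppercase[:12])  # indicators "A".."L", as letters
NUGGET_SIZE = 3   # indicators per nugget (assumed)
POP_SIZE = 10     # nuggets per generation (assumed)
ELITE = 2         # best nuggets carried over unchanged


def toy_fitness(nugget):
    """Stand-in fitness: overlap with a hidden 'predictive' set of indicators."""
    return len(set(nugget) & {"B", "F", "K"})


def random_nugget(rng, weights=None):
    """Draw NUGGET_SIZE distinct indicators, optionally biased by weights."""
    pool = list(INDICATORS)
    w = list(weights) if weights else [1.0] * len(pool)
    chosen = []
    for _ in range(NUGGET_SIZE):
        pick = rng.choices(range(len(pool)), weights=w, k=1)[0]
        chosen.append(pool.pop(pick))
        w.pop(pick)  # keep weights aligned with the shrinking pool
    return tuple(sorted(chosen))


def indicator_weights(scored):
    """Weight each indicator by the total fitness of the nuggets containing it."""
    totals = {name: 1.0 for name in INDICATORS}  # floor keeps every weight positive
    for score, nugget in scored:
        for name in nugget:
            totals[name] += score
    return [totals[name] for name in INDICATORS]


def evolve(fitness, seed=0, patience=5, max_generations=200):
    """Run the loop until one nugget leads for `patience` generations in a row."""
    rng = random.Random(seed)
    population = [random_nugget(rng) for _ in range(POP_SIZE)]        # step 1
    leader, streak = None, 0
    for _ in range(max_generations):
        # Steps 2, 3 and 6: evaluate and sort, best first.
        scored = sorted(((fitness(n), n) for n in population), reverse=True)
        best = scored[0][1]
        streak = streak + 1 if best == leader else 1
        leader = best
        if streak >= patience:
            break
        elite = [n for _, n in scored[:ELITE]]                        # step 4
        weights = indicator_weights(scored)                           # step 5
        population = elite + [
            random_nugget(rng, weights) for _ in range(POP_SIZE - ELITE)
        ]
    return leader, fitness(leader)


leader, score = evolve(toy_fitness, seed=0)
print(leader, score)
```

Because the two best nuggets always survive, the top score can never decrease from one generation to the next. The `patience` parameter is one possible reading of "consistently remains in first place"; a production run would likely use a stricter or out-of-sample criterion.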
The above process was greatly simplified for illustration, but you can see how vast the opportunities are to apply GAs when forming investment strategies.
The GA process covers an important step in machine learning research: Feature Selection. Selecting the features most suitable for a strategy is a dynamic process that must adjust to changes in market conditions.