Erez Katz, CEO and Co-founder of Lucena Research


At Lucena, our mission is to democratize some of the best-kept secrets in the financial industry and refute the “black-box” image often associated with Machine Learning. In that spirit, I wanted to share with you an important process by which features are designated as most relevant to a particular asset universe and investment strategy.


Why You Should Be Using a Genetic Algorithm (GA) for Feature Selection


As we’ve incorporated multiple big-data sources into our platform, our master feature database has ballooned to almost 1,000 indicators. Indicators are data elements that describe a security at a point in time. Examples of indicators can be found in the chart below.


[Figure: QuantDesk feature selection]

A Factor, also called a Feature, is a quantitative attribute that describes a security at a given point in time.


With such a rich array of data points, we often struggle to decide which indicators and models are most relevant at any particular time. It is important to note that not all indicators are created equal, nor are they designed to be predictive at all times. An effective Machine Learning algorithm “knows” how to adjust dynamically to environmental or idiosyncratic changes. A Genetic Algorithm (GA) is a technique that brings a scientific process to feature selection, helping to distinguish predictive signals from noise.


What is a Genetic Algorithm In the Context of Big-Data and Machine Learning?

A genetic algorithm (GA) is a problem-solving method that mimics the process of natural selection. When utilizing Machine Learning for making investment decisions, the factors most relevant to your needs can be filtered from a wide list of indicators by replicating the process of natural evolution. The only difference is that rather than dealing with DNA and chromosomes, we are dealing with indicators and multi-factor models.


Survival of the Fittest: How to Form the Best Feature Selection

The goal of the exercise is to identify “nuggets.” A nugget is a multi-factor model composed of multiple indicators and their respective min/max values, which together form a filter geared to identify the securities most prone to move predictably in the future.



[Figure: Multi-factor model for machine learning]

An illustration of a multi-factor model.
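To make the idea concrete, here is a minimal sketch of a nugget expressed in code: a mapping from indicator name to a (min, max) range, applied as a filter over each security's indicator values. The names `Nugget` and `matches`, the tickers, and all the numbers are hypothetical and chosen for illustration only; this is not a description of how our platform stores models.

```python
from typing import Dict, Tuple

# Hypothetical representation: indicator name -> (min, max) bounds, inclusive.
Nugget = Dict[str, Tuple[float, float]]

def matches(nugget: Nugget, indicators: Dict[str, float]) -> bool:
    """Return True if every indicator in the nugget falls within its range."""
    for name, (low, high) in nugget.items():
        value = indicators.get(name)
        # A missing indicator counts as a failed condition.
        if value is None or not (low <= value <= high):
            return False
    return True

# Illustrative nugget and made-up indicator snapshots for two securities.
nugget: Nugget = {
    "gross_margin": (0.45, 0.85),
    "pe_ratio": (15.0, 25.0),
    "beta": (0.75, 1.5),
}
universe = {
    "AAA": {"gross_margin": 0.60, "pe_ratio": 18.0, "beta": 1.1},
    "BBB": {"gross_margin": 0.30, "pe_ratio": 22.0, "beta": 0.9},
}

# Keep only the securities that satisfy every range in the nugget.
selected = [ticker for ticker, values in universe.items() if matches(nugget, values)]
print(selected)
```

A security passes only if all ranges are satisfied at once, which is what makes the nugget a filter and not a ranking.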


We can easily run a fitness function (an event study, for example) to assess how predictive these conditions were historically. An example of a fitness function would be:

Let’s travel back in time (let’s say 1/1/2011 to 12/31/2011) and assess the average price move 20 days after certain stocks met the following condition:

  • Gross margins are between 45% and 85%
  • PE ratio is between 15 and 25
  • Beta is between 0.75 and 1.5
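The event study described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: prices are plain lists of daily closes, each "event" is a hypothetical (ticker, day index) pair marking when the condition was met, and the helper names `forward_return` and `event_study` are ours for this example. The price data is made up.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

def forward_return(prices: List[float], event_idx: int, horizon: int) -> float:
    """Simple return from the event day's close to the close `horizon` days later."""
    return prices[event_idx + horizon] / prices[event_idx] - 1.0

def event_study(
    prices: Dict[str, List[float]],
    events: List[Tuple[str, int]],
    horizon: int = 20,
) -> Tuple[float, float]:
    """Return (mean, standard deviation) of forward returns across all events."""
    returns = [
        forward_return(prices[ticker], idx, horizon)
        for ticker, idx in events
        if idx + horizon < len(prices[ticker])  # skip events without enough future data
    ]
    if len(returns) < 2:
        raise ValueError("need at least two usable events")
    return mean(returns), stdev(returns)

# Made-up daily closes: one stock drifts up 1% per day, the other is flat.
prices = {
    "AAA": [100.0 * 1.01 ** day for day in range(30)],
    "BBB": [50.0] * 30,
}
# (ticker, index of the day on which the condition was met)
events = [("AAA", 0), ("BBB", 5)]

avg, spread = event_study(prices, events, horizon=20)
print(f"mean 20-day return: {avg:.4f}, std dev: {spread:.4f}")
```

The mean corresponds to the average price move 20 days after the event, and the standard deviation measures how widely the individual outcomes are dispersed around it.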


[Figure: Event study price breakdown]

An event study performance chart. The event date represents the date on which certain securities satisfied the multi-factor (nugget) criteria. The cone represents the standard deviation of the price action across the universe of matching stocks after the “event” took place.

The bold line is the price prediction based on the mean. A fitness function would normally favor a more pronounced (directionally biased) mean line combined with a narrower cone (smaller variance, as measured by the standard deviation).
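One simple way to turn "a pronounced mean with a narrow cone" into a single number is the ratio of the absolute mean return to its standard deviation. This particular formula is an assumption made for illustration, not necessarily the score we use in production, and the function name `fitness_score` and the sample returns are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence

def fitness_score(forward_returns: Sequence[float]) -> float:
    """Score = |mean| / std: higher when the mean is far from zero and the cone is narrow."""
    if len(forward_returns) < 2:
        return 0.0  # not enough events to estimate a spread
    spread = stdev(forward_returns)
    if spread == 0:
        # Identical outcomes: perfectly narrow cone, or no signal at all.
        return float("inf") if mean(forward_returns) != 0 else 0.0
    return abs(mean(forward_returns)) / spread

# Two made-up sets of post-event returns with the same mean.
tight = [0.04, 0.05, 0.06, 0.05]     # clear upward bias, narrow cone
noisy = [0.30, -0.25, 0.20, -0.05]   # same average, wide cone

print(fitness_score(tight) > fitness_score(noisy))
```

Both samples average the same return, but the tightly clustered one scores far higher because its outcomes are more consistent.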

Now that we understand what a nugget and a fitness function are, we are ready to describe the GA process.

What does the Genetic Algorithm process do? Two things: 

  1. Identifies which indicators to combine into a nugget.
  2. Measures the fitness score of the nugget.


Here is the Genetic Algorithm process step-by-step:


Step 1: Generate random population. (Indicators are represented by letters.)


Step 2: Evaluate each nugget based on a fitness function.


Step 3: Sort the nuggets based on their fitness score.


Step 4: The best two nuggets survive to participate in the next evolution.


Step 5: Form the next generation of nuggets by again combining indicators at random. This time, however, we favor the indicators that scored higher in the previous evolution’s fitness evaluation.


Step 6: Sort the next generation, together with the best two nuggets that survived, based on the fitness function.

Repeat the process above (steps 1 through 6) until a single nugget consistently remains in first place. That “lone survivor” is then ready for further analysis and refinement before moving into production.
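The steps above can be sketched as a short loop. This is a toy illustration under several assumptions: nuggets are just tuples of indicator letters (no min/max ranges), the fitness function is a stand-in that rewards a hidden set of "good" indicators, and the rank-based weighting, the `patience` stopping rule, and all names (`run_ga`, `weighted_nugget`, `toy_fitness`) are hypothetical choices of ours for the example.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Nugget = Tuple[str, ...]

def weighted_nugget(weights: Dict[str, float], size: int, rng: random.Random) -> Nugget:
    """Draw `size` distinct indicators, favoring those with higher weights."""
    pool = dict(weights)
    chosen = []
    for _ in range(size):
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # no indicator appears twice in one nugget
    return tuple(sorted(chosen))

def run_ga(indicators: Sequence[str], fitness: Callable[[Nugget], float],
           nugget_size: int = 3, population: int = 8, patience: int = 5,
           max_generations: int = 200, seed: int = 0) -> Tuple[Nugget, List[float]]:
    rng = random.Random(seed)
    # Step 1: random initial population.
    current = [tuple(sorted(rng.sample(list(indicators), nugget_size)))
               for _ in range(population)]
    champion, streak, history = None, 0, []
    for _ in range(max_generations):
        # Steps 2-3 (and 6): score every distinct nugget and sort, best first.
        ranked = sorted(dict.fromkeys(current), key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Stop once the same nugget has held first place `patience` times in a row.
        streak = streak + 1 if ranked[0] == champion else 1
        champion = ranked[0]
        if streak >= patience:
            break
        # Step 4: the best two nuggets survive unchanged.
        elites = ranked[:2]
        # Step 5: indicators from better-ranked nuggets get larger weights.
        weights = {name: 1.0 for name in indicators}
        for rank, nugget in enumerate(ranked):
            for name in nugget:
                weights[name] += len(ranked) - rank
        children = [weighted_nugget(weights, nugget_size, rng)
                    for _ in range(population - len(elites))]
        current = elites + children
    return champion, history

# Stand-in fitness: counts how many "hidden good" indicators a nugget contains.
HIDDEN = {"A", "B", "C"}

def toy_fitness(nugget: Nugget) -> float:
    return float(len(HIDDEN.intersection(nugget)))

best, history = run_ga(list("ABCDEFGH"), toy_fitness)
print(best, history)
```

Because the best two nuggets always carry over, the top score can never get worse from one generation to the next. In a real setting, `toy_fitness` would be replaced by something like an event study over historical data.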

The above process was greatly simplified for illustration, but you can see how vast the opportunities are to apply GAs when forming investment strategies.

The GA process covers an important step in machine learning research: feature selection. Selecting the features most suitable for a strategy is a dynamic process that must adjust to changes in market conditions.


