QuantDesk® Machine Learning Forecast

for the Week of September 25th, 2017

Grid Searching and How to Minimize Overfitting with Cross-Validation

Erez Katz writes about the performance of the Utilities Live Portfolio

by Erez Katz, CEO and Co-founder of Lucena Research.


With the evolution of A.I. and predictive analytics, we feel compelled to share with you some of the research processes that others treat as proprietary. Today, I want to discuss hyperparameter optimization in machine learning research, specifically in the context of grid searching. Grid searching deserves attention because it is often misused to create compelling backtests that prove unsustainable due to overfitting.

What is a Hyperparameter/Grid Search?

Hyperparameters are parameters whose values are set before the learning process begins. Ordinary model parameters, in contrast, are discovered by the machine learning process from the data itself.

Example – Employing Grid Search to Determine A Stop Loss Condition For A Strategy

Imagine that we want to determine the optimal stop-loss level for a given trading strategy. A stop loss is an exit criterion geared to minimize losses when a stock or a portfolio moves against its forecast (in our case, the forecast is derived from a machine learning process). Applying a stop loss too early, causing stocks to exit prematurely, can adversely impact a portfolio's performance; in contrast, a stop-loss condition that is too relaxed allows the portfolio to accumulate losses if stocks continue to slide.

Given a period in which we conduct our research (in-sample period) here are two approaches among many:

  1. Run the strategy in simulation mode (backtests) and identify the following:
    1. Which stocks are the winners (exhibited returns above a certain threshold – above 3%, for example)?
    2. Which stocks are the losers?
    Identify the average max drawdown of the winners vs. that of the losers. If, for example, the losers exhibited an average max drawdown of 5% while the winners exhibited an average max drawdown of 3%, it would be plausible to place a stop loss at 3%, since we know empirically that winning stocks, on average, did not fall more than 3%.
  2. Conduct a grid search – Preselect a series of stop-loss conditions from one to twenty percent, incremented by 1% (1%, 2%, 3%... all the way to 20%). Then run 20 backtests on the same in-sample data set to empirically identify which stop loss produced the best results.
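The grid search in step 2 can be sketched as follows. This is a minimal illustration: `run_backtest` is a hypothetical stand-in for a real backtesting engine, and the in-sample returns are toy numbers, not real market data.

```python
# Hypothetical sketch: sweep stop-loss levels from 1% to 20% and keep
# the level whose in-sample backtest produced the best total return.

def run_backtest(returns, stop_loss):
    """Toy backtest: compound daily returns, and exit (go flat) once the
    drawdown from the running peak exceeds the stop-loss level."""
    equity = 1.0
    peak = 1.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        if (peak - equity) / peak >= stop_loss:
            break  # stop triggered: exit and stay in cash
    return equity - 1.0  # total return over the period

def grid_search_stop_loss(returns, levels):
    """Run one backtest per candidate stop-loss level and pick the best."""
    results = {level: run_backtest(returns, level) for level in levels}
    best = max(results, key=results.get)
    return best, results

# Illustrative in-sample daily returns (assumed data).
in_sample = [0.01, -0.02, 0.015, -0.04, 0.03, -0.01, 0.02, -0.06, 0.05]
levels = [i / 100 for i in range(1, 21)]  # 1%, 2%, ..., 20%
best_level, results = grid_search_stop_loss(in_sample, levels)
```

Note that exactly because this loop will always return *some* "best" level, the result needs the statistical scrutiny discussed in the sections that follow.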

Fitness Function

The above example is an oversimplification of real-world research, because the stop loss is only one factor in a complex set of rules for an algorithmic strategy. Also, the rules of a strategy would normally be dynamic. For example, depending on market conditions, a stop-loss level could be set at 4% for one period and 6% for another. Finally, the criterion for best results doesn't necessarily have to be the backtest with the highest total return.

Here is why: imagine that our grid search identified one stock that, if purchased at the beginning of the backtest and held throughout the entire testing period, would have produced the best return. Many would find such results unsubstantiated (or unlikely to repeat) due to a lack of statistical significance.

A fitness function would normally contain a set of rules that together score the strength of the grid search results. Below are some parameters of a fitness function, though there could be many more:

  1. best total return
  2. best Sharpe
  3. most diversified
  4. longest average hold time
  5. most liquid stocks
  6. etc.

In addition, not all factors in a fitness function are weighted equally. Perhaps best Sharpe is more important than most liquid; therefore, the fitness score would normally apply different grades of significance when scoring a strategy's outcome. In reality, a grid search can conduct many thousands of backtests to exhaustively test all permutations of all the rules and their possible values. At the conclusion of this exhaustive search, the backtests are ranked by their fitness scores.
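A weighted fitness score of this kind might be sketched as follows. The metric names, weights, and candidate numbers are assumptions for illustration, not Lucena's actual scoring:

```python
# Illustrative weighted fitness score: each backtest metric is assumed
# to be normalized to [0, 1], and Sharpe is weighted more heavily than
# liquidity, per the discussion above. All weights are made up.

WEIGHTS = {
    "total_return": 0.25,
    "sharpe": 0.40,          # most important in this example
    "diversification": 0.15,
    "avg_hold_days": 0.10,
    "liquidity": 0.10,
}

def fitness(metrics, weights=WEIGHTS):
    """Score one backtest as a weighted sum of its normalized metrics."""
    return sum(weights[k] * metrics[k] for k in weights)

def rank_backtests(backtests):
    """Rank candidate parameter sets by fitness, best first."""
    return sorted(backtests, key=lambda b: fitness(b["metrics"]), reverse=True)

candidates = [
    {"params": {"stop_loss": 0.03},
     "metrics": {"total_return": 0.9, "sharpe": 0.4, "diversification": 0.5,
                 "avg_hold_days": 0.6, "liquidity": 0.7}},
    {"params": {"stop_loss": 0.05},
     "metrics": {"total_return": 0.6, "sharpe": 0.8, "diversification": 0.7,
                 "avg_hold_days": 0.5, "liquidity": 0.8}},
]
ranked = rank_backtests(candidates)
```

Notice that the second candidate wins despite a lower total return, because its higher Sharpe dominates under these weights: this is exactly the point that the best backtest is not necessarily the one with the highest return.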

Data Mining Fallacy - The Danger of Misusing a Grid Search

With a granular grid search it is easy for novice researchers to find a model that fits a predetermined fitness function. With an exhaustive grid search, a researcher is bound to find something that will look outstanding in a backtest and will likely instill a false sense of conviction in a poorly designed strategy. This is a typical example of overfitting.

Overfitting definition in Wikipedia: “In statistics and machine learning, one of the most common tasks is to fit a "model" to a set of training data, with the goal of making reliable predictions on unseen test data….
Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfitted has poor predictive performance, as it overreacts to minor fluctuations in the training data…”

How to Minimize Overfitting

There are many techniques used to minimize overfitting, but among the most important are cross-validation and hold-out periods. Cross-validation is an iterative process by which the strategy rules are defined through in-sample training and subsequently validated on a different set of data. The ultimate test is to determine whether the rules defined during the cross-validation period still hold on a completely new and unseen set of data. Hold-out periods are timeframes that have not been seen or evaluated even during cross-validation. This is the timeframe in which we test whether the rules of training and subsequent execution, as defined during cross-validation, still hold.
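The combination of time-ordered cross-validation with a final hold-out can be sketched as follows, assuming daily bars indexed 0 to n-1. The fold counts and sizes are illustrative choices, not a prescribed methodology:

```python
# Minimal sketch of walk-forward cross-validation with a hold-out:
# each fold trains on an expanding in-sample window and validates on
# the next block of time; the final `holdout` periods are reserved and
# never touched until the very last test.

def walk_forward_splits(n_periods, n_folds, holdout):
    """Return ([(train_idx, val_idx), ...], holdout_idx) over a time series."""
    usable = n_periods - holdout              # everything except the hold-out
    fold_size = usable // (n_folds + 1)       # leave room for the first train window
    splits = []
    for k in range(1, n_folds + 1):
        train = list(range(0, k * fold_size))                 # expanding window
        val = list(range(k * fold_size, (k + 1) * fold_size)) # next block in time
        splits.append((train, val))
    holdout_idx = list(range(usable, n_periods))              # untouched tail
    return splits, holdout_idx

# E.g. 500 trading days: 4 folds over the first 400, last 100 held out.
splits, holdout_idx = walk_forward_splits(n_periods=500, n_folds=4, holdout=100)
```

The key property is that validation data always lies after its training data, and the hold-out lies after everything, so no future information leaks into rule selection.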

Image 1: Example of how cross-validation and hold-out can be used through iterative research.


Big data and predictive analytics are becoming increasingly popular among investment professionals. It is critically important to understand the mechanics of how the underlying research is conducted, as this enables the recipient of such research to ask the right questions and validate whether the researcher's recommendations and outcomes are defensible.

Overfitting is one of the most common phenomena to be minimized or avoided. Proper cross-validation and hold-out policies should be enacted to minimize false optimism from a poorly crafted backtest. Ultimately, the strength of a strategy can only be truly measured once funds are deployed live, since even the most well-intentioned research can inadvertently introduce bias.

Image 2: An example of cross-validation reports and analysis from QuantDesk®
Past performance is not indicative of future returns.

Strategies Update

As in past weeks, I want to briefly update you on how the model portfolios and the theme-based strategies we covered recently are performing.

Tiebreaker – Lucena’s Long/Short Equity Strategy - YTD return of 11.08% vs. benchmark of -5.51%
Image 1: Tiebreaker YTD– benchmark is VMNIX (Vanguard Market Neutral Fund Institutional Shares)
Past performance is no guarantee of future returns.

Tiebreaker has been forward traded since 2014, and to date it boasts an impressive return of 47.75%, remarkably low volatility as expressed by its max drawdown of only 6.16%, and a Sharpe of 1.88! (You can see a more detailed view of Tiebreaker’s performance below in this newsletter.)

BlackDog – Lucena’s Risk Parity - YTD return of 12.84% vs. benchmark of 10.71%

We have recently developed a sophisticated multi-sleeve optimization engine set to provide the most suitable asset allocation for a given risk profile, while respecting multi-level allocation restriction rules.

Essentially, we strive to obtain an optimal decision while taking into consideration the trade-offs between two or more conflicting objectives. For example, given a wide universe of constituents, we can find a subset of them, and their respective allocations, that satisfy the following:

  • Maximizing Sharpe
  • Maintaining a widely diversified portfolio with allocation restrictions across certain asset classes, market sectors and growth/value classifications
  • Restricting volatility
  • Minimizing turnover

We can also determine the proper rebalance frequency and validate the recommended methodology with a comprehensive backtest.
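The trade-off among these objectives can be illustrated with a toy penalty-based search. A real multi-sleeve optimizer would use a proper solver and a full covariance model; this sketch merely samples random weights and scores them with made-up return, volatility, and penalty numbers:

```python
# Toy multi-objective allocation search: reward a Sharpe-like ratio,
# penalize volatility above a cap and turnover from the prior weights.
# All inputs are illustrative assumptions, not real estimates.
import random

def portfolio_stats(weights, exp_returns, vols):
    ret = sum(w * r for w, r in zip(weights, exp_returns))
    vol = sum(w * v for w, v in zip(weights, vols))  # crude: ignores correlations
    return ret, vol

def score(weights, prev_weights, exp_returns, vols, vol_cap=0.12):
    ret, vol = portfolio_stats(weights, exp_returns, vols)
    sharpe_like = ret / vol if vol > 0 else 0.0
    turnover = sum(abs(w - p) for w, p in zip(weights, prev_weights))
    vol_penalty = max(0.0, vol - vol_cap) * 10.0   # restrict volatility
    return sharpe_like - 0.5 * turnover - vol_penalty  # minimize turnover too

def random_weights(n, rng):
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]  # fully invested, long-only

rng = random.Random(42)
exp_returns = [0.06, 0.08, 0.03]   # e.g. large-cap, growth, fixed income
vols = [0.15, 0.22, 0.05]
prev = [1 / 3] * 3                 # current allocation before rebalance
best = max((random_weights(3, rng) for _ in range(2000)),
           key=lambda w: score(w, prev, exp_returns, vols))
```

The penalty weights encode the relative importance of each objective, which is the same design question raised by the fitness function earlier in this piece.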

Image 2: BlackDog YTD– benchmark is AQR’s Risk Parity Fund Class B
Past performance is no guarantee of future returns.

Utilities - Large-Cap Based Actively Managed - YTD return of 36.82% vs. 15.38% of the benchmark!!!

I wrote about utilities last year in an attempt to demonstrate how Lucena’s technology can be deployed to identify fixed income alternatives. Since November 2016 we have been tracking our utilities portfolio, and it has been performing exceptionally well in both total return and low volatility -- well ahead of the S&P and its benchmark, the XLU.

Image 3: Utilities based strategy– captured since November of 2016. Benchmark is XLU – Utilities select sector SPDR
Past performance is no guarantee of future returns.

Industrials - Large-Cap Based Actively Managed - YTD Return of 17.00% vs. benchmark of 11.12%

I wrote about an industrials-centric portfolio in January of this year. The portfolio was designed to anticipate the administration’s strong desire to invest in infrastructure. It identifies a well-diversified set of industrial stocks intended to track and outperform the XLI (its benchmark).

Image 4: Industrials-based strategy– captured since January 27, 2017 (covered during that week’s newsletter). Benchmark is XLI – Industrials select sector SPDR ETF.
Past performance is no guarantee of future returns.

Forecasting the Top 10 Positions in the S&P

Lucena’s Forecaster uses a set of 10 factors selected from a large pool of over 500. To self-adjust to the most recent data, we apply a genetic algorithm (GA) process that runs over the weekend to identify the most predictive set of factors, on which our price forecasts are based. These factors (together called a “model”) are used to forecast the price, and a corresponding confidence score, for every stock in the S&P. Our machine learning algorithm travels back in time over a look-back period (or training period) and searches for historical states in which the underlying equities were similar to their current state. By assessing how prices moved forward in the past, we anticipate their projected price change and forecast their volatility.
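The "similar historical states" idea is analogous to a k-nearest-neighbors forecast, which can be sketched as follows. This is a generic illustration with made-up factor vectors, not Lucena's actual algorithm:

```python
# Generic k-nearest-neighbors analogue of historical-state matching:
# represent each past day by a factor vector, find the k past days most
# similar to today's state, and average their forward returns.
import math

def knn_forecast(history, current_state, k=3):
    """history: list of (factor_vector, forward_return) pairs."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(history, key=lambda h: dist(h[0], current_state))[:k]
    forecast = sum(fr for _, fr in nearest) / k
    # spread of neighbor outcomes as a crude confidence proxy:
    # tightly clustered outcomes suggest a more reliable forecast
    spread = max(fr for _, fr in nearest) - min(fr for _, fr in nearest)
    return forecast, spread

# Illustrative two-factor states and their subsequent one-week returns.
history = [
    ([0.10, 0.50], 0.02), ([0.20, 0.40], 0.01), ([0.90, 0.90], -0.03),
    ([0.15, 0.45], 0.015), ([0.80, 0.70], -0.02),
]
forecast, spread = knn_forecast(history, current_state=[0.12, 0.48], k=3)
```

A GA layered on top of such a forecaster, as described above, would search over which factors enter the state vector rather than over the forecast itself.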

The charts below represent the new model and the top 10 positions assessed by Lucena’s Price Forecaster.

Image 5: Default model for the coming week.

The top 10 forecast chart below delineates the ten positions in the S&P with the highest projected market-relative returns combined with the highest confidence scores.

Image 6: Forecasting the top 10 positions in the S&P 500 for the coming week. The yellow stars (0 stars meaning poorest and 5 stars meaning strongest) represent the confidence score based on the forecasted volatility, while the blue stars represent backtest scoring as to how successful the machine was in forecasting the underlying asset over the lookback period -- in our case, the last 3 months.

To view a brief video of all the major functions of QuantDesk, please click on the following link:
QuantDesk Overview


The table below presents the trailing 12-month performance and a YTD comparison between the two model strategies we cover in this newsletter (BlackDog and Tiebreaker), as well as the two ETFs representing the major US indexes (the DOW and the S&P).

12 Month Performance BlackDog and Tiebreaker
Image 8: Last week’s changes, trailing 12 months, and year-to-date gains/losses.
Past performance is no guarantee of future returns.

Model Tiebreaker, Lucena's Active Long/Short US Equities Strategy:

Active Long/Short US Equities Strategy
Tiebreaker: Paper trading model portfolio performance compared to Vanguard Market Neutral Fund since 9/1/2014. Past performance is no guarantee of future returns.

Model BlackDog 2X: Lucena's Tactical Asset Allocation Strategy:

model portfolio performance compared to the SPY and Vanguard Balanced Index Fund
BlackDog: Paper trading model portfolio performance compared to the SPY and Vanguard Balanced Index Fund since 4/1/2014.
Past performance is no guarantee of future returns.


For those of you unfamiliar with BlackDog and Tiebreaker, here is a brief overview: BlackDog and Tiebreaker are two out of an assortment of model strategies that we offer our clients. Our team of quants is constantly on the hunt for innovative investment ideas. Lucena’s model portfolios are a byproduct of some of our best research, packaged into consumable model-portfolios. The performance stats and charts presented here are a reflection of paper traded portfolios on our platform, QuantDesk®. Actual performance of our clients’ portfolios may vary as it is subject to slippage and the manager’s discretionary implementation. We will be happy to facilitate an introduction with one of our clients for those of you interested in reviewing live brokerage accounts that track our model portfolios.

Tiebreaker: Tiebreaker is an actively managed long/short equity strategy. It invests in equities from the S&P 500 and Russell 1000 and is rebalanced bi-weekly using Lucena’s Forecaster, Optimizer and Hedger. Tiebreaker splits its cash evenly between its core and hedge holdings, and its hedge positions consist of long and short equities. Tiebreaker has been able to avoid major market drawdowns while still taking full advantage of subsequent run-ups. Tiebreaker is able to adjust its long/short exposure based on idiosyncratic volatility and risk. Lucena’s Hedge Finder is primarily responsible for driving this long/short exposure tilt.

Tiebreaker Model Portfolio Performance Calculation Methodology: Tiebreaker's model portfolio performance is a paper trading simulation and assumes an opening account balance of $1,000,000 cash. Tiebreaker started paper trading on April 28, 2014 as a cash-neutral and beta-neutral strategy; however, it was substantially modified to its current dynamic mode on 9/1/2014. Trade execution and return figures assume positions are opened at the 11:00 AM EST price quoted by the primary exchange on which the security is traded and, unless a stop is triggered, closed at the 4:00 PM EST price quoted by the primary exchange on which the security is traded. A trailing 5% stop loss is imposed, measured from the intra-week high (in the case of longs) or low (in the case of shorts). If the stop loss is triggered, an exit from the position is assumed at 5% below the high in the case of longs, and 5% above the low in the case of shorts. Tiebreaker assesses the price at which the position is exited with the following modification: prior to March 1st, 2016, at times (but not at all times), if, in consultation with a client executing the strategy, it was found that the client received a less favorable price in closing out a position when a stop loss was triggered, the less favorable price was used in determining the exit price. On September 28, 2016 we applied new allocation algorithms to Tiebreaker and modified its rebalancing sequence to every two weeks (10 trading days). Since March 1st, 2016, all trades have been conducted automatically, with no modifications, per the guidelines outlined herein. No manual modifications have been made to the gain stop prices. In instances where a position gaps through the trigger price, the initial gapped opening trade price is used. Transaction costs are calculated as the larger of $6.95 per trade or $0.0035 per share traded.

BlackDog: BlackDog is a paper trading simulation of a tactical asset allocation strategy that utilizes highly liquid ETFs of large-cap and fixed-income instruments. The portfolio is adjusted approximately once per month based on Lucena’s Optimizer in conjunction with Lucena’s macroeconomic ensemble voting model. Due to BlackDog’s low volatility (half that of the market in backtesting), we leverage it 2X. By deploying twice its original cash assets, we take full advantage of its potential returns while maintaining market-relative low volatility and risk. As evidenced by the chart below, BlackDog 2X is substantially ahead of its benchmark (S&P 500).

In the past year, we covered QuantDesk's Forecaster, Back-tester, Optimizer, Hedger and our Event Study. In future briefings, we will keep you up-to-date on how our live portfolios are executing. We will also showcase new technologies and capabilities that we intend to deploy and make available through our premium strategies and QuantDesk®, our flagship cloud-based software.
My hope is that those of you who follow us closely will gain a good understanding of machine learning techniques in statistical forecasting and will gain expertise in our suite of offerings and services.


  • Forecaster - Pattern recognition price prediction
  • Optimizer - Portfolio allocation based on risk profile
  • Hedger - Hedge positions to reduce volatility and maximize risk adjusted return
  • Event Analyzer - Identify predictable behavior following a meaningful event
  • Back Tester - Assess an investment strategy through a historical test drive before risking capital

Your comments and questions are important to us and help to drive the content of this weekly briefing. I encourage you to continue to send us your feedback, your portfolios for analysis, or any questions you wish for us to showcase in future briefings.
Send your emails to: info@lucenaresearch.com and we will do our best to address each email received.

Please remember: This sample portfolio and the content delivered in this newsletter are for educational purposes only and NOT intended as the basis for one's investment strategy. Beyond the market impact and transaction costs that the simulation does not account for, there are additional factors that can impact success. Hence, additional professional due diligence and investors' insights should be applied before risking capital.

If you have any questions or comments on the above, feel free to contact me: erez@lucenaresearch.com

Have a great week!

Erez Katz Signature


Disclaimer Pertaining to Content Delivered & Investment Advice

This information has been prepared by Lucena Research Inc. and is intended for informational purposes only. This information should not be construed as investment, legal and/or tax advice. Additionally, this content is not intended as an offer to sell or a solicitation of any investment product or service.

Please note: Lucena is a technology company and neither manages funds nor functions as an investment advisor. Do not take the opinions expressed explicitly or implicitly in this communication as investment advice. The opinions expressed are of the author and are based on statistical forecasting on historical data analysis.
Past performance does not guarantee future success. In addition, the assumptions and the historical data on which opinions are based could be faulty. All results and analyses expressed are hypothetical and are NOT guaranteed. All trading involves substantial risk. Leveraged trading has large potential rewards but also large potential risks. Never trade with money you cannot afford to lose. If you are neither a registered nor a certified investment professional, this information is not intended for you. Please consult a registered or certified investment advisor before risking any capital.
The performance results for active portfolios following the screen presented here will differ from the performance contained in this report for a variety of reasons, including differences related to incurring transaction costs and/or investment advisory fees, as well as differences in the time and price that securities were acquired and disposed of, and differences in the weighting of such securities. The performance results for individuals following the strategy could also differ based on differences in treatment of dividends received, including the amount received and whether and when such dividends were reinvested. Historical performance can be revisited to correct errors or anomalies and ensure it most accurately reflects the performance of the strategy.