
NOVA: A Deep Factorized-Attention Transformer for High-Beta Growth-Equity Forecasting Under Small-Corpus, Single-Regime Constraints

Zirdle Research Technical Report, April 2026


Abstract

We present NOVA, the deepest and largest member of the Zirdle Five family of multivariate transformer forecasters: 34,061,347 parameters, eight factorized-attention layers, a 512-dimensional embedding, and a 200-bar context window. NOVA specializes in a curated 67-symbol "growth universe" spanning mega-cap technology, recent software and fintech IPOs, semiconductors, biotechnology, and the electric-vehicle and clean-energy cohort. In contrast with its broader-universe sibling ORION (14.8M parameters, 164 symbols, 85,008 training windows), NOVA consumes a significantly smaller training corpus of only 23,789 windows, reflecting the mechanical truth that many growth equities have post-2015 initial public offerings and therefore foreshortened historical records. The central scientific question this paper addresses is whether, in such a low-data regime, a narrow-and-deep architecture with aggressive regularization (dropout 0.15) and a longer context (200 bars, up from 120) can outperform a wider-and-shallower alternative. Our answer is empirical and cautious. On a 15-week out-of-sample test window spanning 2024-07-01 through 2025-11-30 and resolving roughly 8,948 simulated trades, NOVA attains a 55.0% win rate at symmetric 1:1 risk/reward, which we identify as the strongest evidence of genuine directional skill in the Zirdle family. At asymmetric 1:5 R/R the model posts a headline cumulative return of +65.7%, or +4.38% per week, and a win rate of 37.5%. We argue that the 55.0% figure is the real scientific claim, because a symmetric R/R cannot be gamed through stop-size arithmetic and must derive from actual directional accuracy; the headline +4.38%/week figure is reported with important caveats about regime dependence, and the stop-loss-free configuration (+321.6% over 15 weeks, 92.1% win rate) is documented explicitly as a diagnostic artifact rather than a deployable strategy. 
We situate NOVA's results within the momentum and high-beta anomaly literature, discuss the narrow-deep versus wide-shallow hypothesis, and make explicit the survivorship and regime risks introduced by a manually curated growth universe and a bull-biased test window. Recommended deployment requires regime gating, per-name position caps of five percent, and paper-trading through at least one full earnings cycle before any live capital is committed.


1. Introduction

Growth equities — broadly defined as high-multiple, earnings-velocity names with elevated idiosyncratic volatility and positive exposure to the market's momentum factor — occupy an uncomfortable position in the canonical asset pricing literature. The Capital Asset Pricing Model and the Fama-French three-factor model systematically underweight them [1, 2]. The four-factor extension by Carhart [11] introduces a momentum factor that partially rehabilitates these names, but the family of models remains linear, cross-sectional, and blind to the intra-day and intra-week order-flow structure that appears to drive much of growth's short-horizon return behavior. For practitioners who wish to trade individual growth names on horizons of one to four weeks, the classical toolkit is largely silent.

Deep learning has begun to fill this gap. Heaton, Polson, and Witte [5] demonstrated that deep neural networks can construct portfolios that exploit the non-linear interactions absent from linear factor models. Gu, Kelly, and Xiu [8] showed that autoencoder-based cross-sectional asset-pricing models recover latent factor structures that Fama-French-style projections miss. More recently, a growing literature on hybrid GARCH-transformer architectures [6] and large-scale volatility foundation models [7] has established that the transformer family — properly adapted — can model the volatility-clustering phenomena that characterize growth names. None of these efforts, however, directly asks the operational question we ask here: given a fixed compute budget and a modest historical corpus, should one train a specialist model on a narrow sub-universe, or a generalist model on the broad market?

NOVA is our attempt to answer this question empirically. The Zirdle Five family contains two daily-horizon multivariate models: ORION, trained on 164 symbols spanning roughly the full liquid U.S. equity universe, and NOVA, trained on 67 growth names. The two models share a common data pipeline, a common 24-channel input representation (OHLCV plus nineteen technical and statistical indicators), a common five-bar forecast horizon, and a common pinball-loss objective over seven quantiles. Where they differ is in scale and shape: ORION is 14.8M parameters, four factorized-attention layers, 384-dimensional embedding, dropout 0.10, context 120 bars; NOVA is 34,061,347 parameters, eight layers, 512-dimensional embedding, dropout 0.15, context 200 bars. NOVA's training corpus is 28% the size of ORION's (23,789 windows versus 85,008), a direct consequence of the post-2015 IPO dates that populate much of the growth universe. The narrow-deep hypothesis we test is whether the additional depth and capacity, combined with heavier regularization, can compensate for the smaller corpus when the sub-universe exhibits the tighter intra-universe correlation that characterizes growth-beta names.

The contributions of this paper are as follows. First, we document NOVA's architecture, training procedure, and the specific hyperparameter choices that reflect the low-data regime (lower learning rate, higher dropout, longer context, deeper stack). Second, we present a head-to-head evaluation against ORION on an overlapping 15-week out-of-sample window and find that NOVA narrowly outperforms on the specialized universe, +4.38% versus +4.09% per week at 1:5 R/R, but we treat this gap with explicit statistical caution. Third, we report the 55.0% 1:1 R/R win rate and argue that this statistic — because it is symmetric in payoff — constitutes the scientifically defensible evidence of directional skill, producing a t-statistic of roughly 9.5 above random across 8,944 trades. Fourth, we place on the public record the limitations, survivorship biases, and regime dependencies that qualify these numbers, and we publish them alongside the headline figures rather than in an appendix. The paper is structured accordingly: Section 2 surveys related work; Section 3 describes the growth universe and the training data; Section 4 presents the architecture and the narrow-deep hypothesis; Section 5 details training; Section 6 describes the evaluation protocol; Section 7 presents results; Section 8 discusses their interpretation; Section 9 extends the limitations discussion; Section 10 concludes.

2. Related Work

NOVA sits at the intersection of five research threads: the momentum and high-beta anomaly, factor investing on sub-universes, deep-learning specialization for equity sub-populations, volatility-clustering architectures, and the methodological literature on survivorship bias in backtesting.

Momentum and the high-beta anomaly. Jegadeesh and Titman's 1993 finding that a three-to-twelve-month winners-minus-losers portfolio generates persistent alpha [3] remains one of the most replicated anomalies in empirical finance. Their 2001 follow-up [10] examined alternative explanations — behavioral underreaction, cross-sectional dispersion, industry effects — and found the premium persists after controlling for all of them. Asness, Moskowitz, and Pedersen generalized the finding across asset classes and geographies [4], establishing that momentum is a general property of return time series rather than a U.S. equity microstructure artifact. These classical results motivate NOVA's focus on short-horizon momentum structure, though the Jegadeesh-Titman construction operates on monthly cross-sectional long-short spreads whereas NOVA operates on single-name daily bars with a five-bar forecast horizon.

Factor investing on sub-universes. Fama and French's 2012 study of international size, value, and momentum factors [1] demonstrated that the three-factor model performs unevenly across geographies, with momentum especially difficult to capture in U.S. small caps. Carhart's four-factor extension [11] added momentum explicitly and remains the standard benchmark for active management. What these studies do not address, and what NOVA does, is whether a non-linear sequence-aware model trained on a pre-filtered "growth" sub-universe can extract information invisible to any linear factor model. The relevant comparison is not "does NOVA beat Carhart on growth names" — it does, trivially — but "does a deep specialist beat a deep generalist on the same names." That is the ORION-versus-NOVA question.

Deep learning for equity sub-universes. Heaton, Polson, and Witte's 2017 deep-portfolio study [5] established that neural networks can construct factor-free portfolios by learning latent structure directly from returns. Their networks were shallow by modern standards and applied to the full S&P 500; they did not ask the sub-universe specialization question. NOVA's claim is not that sub-universe specialization is novel — it is not — but that the narrow-deep trade-off can be analyzed rigorously when the generalist and specialist share the same architecture family and evaluation protocol.

Volatility clustering and transformer hybrids. Growth stocks exhibit more pronounced volatility clustering than the broad market. Liu and co-authors' GARCH-Transformer [6] showed that combining a classical conditional-variance model with a transformer encoder improves volatility forecasts over either component alone. Kashif and colleagues' VolForecast [7] extended this to a pre-trained volatility backbone fine-tuned across asset classes. NOVA does not include an explicit GARCH component — realized-vol and ATR channels implicitly carry the information — but the architectural choice of eight factorized-attention layers is partly motivated by the long-range volatility-regime dependencies these hybrid studies identify.

Cross-sectional autoencoder asset pricing. Gu, Kelly, and Xiu [8] used neural networks to uncover latent factor structures in cross-sectional U.S. equity returns, showing that the linear-literature factor zoo compresses into a small number of non-linear latent dimensions. Where Gu-Kelly-Xiu compress cross-sectional structure, NOVA expands temporal structure; combining the two, with a Gu-Kelly-Xiu latent-factor projection feeding NOVA's temporal encoder, is a natural future direction.

Survivorship and backtesting methodology. Harvey and Liu's 2015 "Backtesting" paper [9] is the most relevant methodological reference for NOVA, because the growth universe is manually curated and therefore exposed to the survivorship biases they catalog. Briefly, the universe contains only symbols that survived the 2022 growth crash as liquid, listed entities; pre-2022 growth names that delisted, were acquired, or became illiquid are absent. López de Prado's treatment of backtest overfitting [15] and LeCun, Bengio, and Hinton's review of deep learning [14] round out the methodological references.

NOVA's contribution relative to this literature is modest but specific: a controlled empirical comparison of a narrow-deep specialist against a wide-shallow generalist on an overlapping universe and test window, with a common architecture family, published honestly — regime-dependence caveats and the stop-loss-free diagnostic artifact alongside the headline numbers, not in an appendix.

3. Data

3.1 The Growth Universe

NOVA's training universe comprises 67 symbols selected to represent the U.S. high-beta growth cohort as of the 2023–2024 period. The selection was manual and prospective; symbols were chosen for their style classification, liquidity, and historical availability, not for their ex-post performance. The universe is stratified into five sub-groups:

  1. Mega-cap technology (8 names): NVDA, TSLA, AMD, META, GOOGL, AMZN, ADBE, CRM. These anchor the universe and provide the longest-running training history, most extending to 2010 or earlier.

  2. Recent technology and fintech IPOs (34 names): SNOW, PLTR, UBER, ABNB, COIN, SQ, SHOP, ROKU, ZM, DDOG, NET, MDB, OKTA, CRWD, ZS, PANW, FTNT, NOW, WDAY, TEAM, TWLO, PATH, BILL, HOOD, AFRM, U, RBLX, DASH, CVNA, UPST, SOFI, SMCI, ARM, DELL. This is the largest sub-group and the one that drives the training-corpus size limitation. Snowflake listed in September 2020; Palantir in September 2020; Affirm in January 2021; Upstart in December 2020. ARM Holdings returned to public markets in September 2023 — after the end of our training window — and is therefore excluded from training data entirely, though it appears in the validation and test sets.

  3. Semiconductors and hardware (7 names): AVGO, MRVL, LRCX, AMAT, MU, NXPI, ADI. This sub-group overlaps in style with mega-cap technology but exhibits tighter cycle sensitivity to global capital-expenditure regimes.

  4. Biotechnology and healthcare technology (10 names): VRTX, REGN, ISRG, DXCM, EW, PODD, ALNY, MRNA, BNTX, CRSP. Included because biotech shares the high-idiosyncratic-volatility, earnings-velocity profile of tech growth, though the underlying drivers (clinical trial readouts, FDA decisions) differ from the product-cycle and earnings-beat drivers of tech.

  5. Clean energy and electric vehicles (10 names): ENPH, FSLR, RUN, NEE, PLUG, RIVN, LCID, NIO, XPEV, LI. The most volatile sub-group, with several names having post-2020 SPAC or IPO listings and extreme drawdown behavior through the 2022 rate-shock.

The universe is intentionally smaller than ORION's 164 symbols, and deliberately tilted toward names with post-2015 listings. This maximizes the "growth style" purity of the universe but simultaneously minimizes the training-corpus size and concentrates the history in a single macro regime (post-QE, pre-rate-shock).

3.2 Training Window: 2010–2022

The training window is 2010-01-01 through 2022-12-31, identical to ORION's. The rationale for this choice mirrors the one documented in the broader Zirdle methodology. Pre-2010 U.S. equity markets reflect a fundamentally different microstructure: decimalization was completed only in April 2001, and the pre-2007 period preceded both the widespread adoption of high-frequency market making and the post-crisis regulatory regime. Training on post-2010 data allocates NOVA's 34,061,347-parameter budget to the current regime rather than spending capacity on historical microstructures that no longer operate.

An important caveat specific to NOVA, however, is that many symbols in the growth universe did not exist as public equities at the start of the training window. Snowflake's initial public offering was in September 2020; Palantir's in September 2020; Affirm's in January 2021; Upstart's in December 2020; Arm Holdings' re-listing in September 2023 falls outside the training window entirely. For these symbols, training data is mechanically bounded by the IPO date, yielding effective histories of two to eight years — substantially less than the twelve years available for the mega-cap technology anchors.

This cascades into methodology. ORION's 85,008 training windows arise from the full thirteen-year window (2010–2022) across symbols with mostly complete histories. NOVA's 23,789 are heterogeneous and IPO-gated: the mega-cap anchors contribute the bulk, the post-2020 listings proportionally less. Such a compressed corpus requires different hyperparameters: higher dropout to prevent overfitting on the repeated mega-cap exposure, lower learning rate to accommodate the larger model's sharper loss surface, deeper architecture to fit the richer manifold per window.

The 2010–2022 window is not arbitrary with respect to growth-specific regimes either. It captures:

  • The post-global-financial-crisis quantitative-easing era (2010–2015), during which growth outperformance was driven by zero-rate duration-sensitive valuations.
  • The 2016–2019 technology rally, sometimes referred to as the FAANG era.
  • The 2020–2021 retail-participation and zero-rate growth supercycle, during which NVDA appreciated approximately ten-fold and TSLA approximately twenty-fold.
  • The 2022 rate-shock growth crash, in which mega-cap growth names drew down sixty to seventy percent peak-to-trough (NVDA -60%, META -70%).
  • It stops just short of the 2023–2024 artificial-intelligence capital-expenditure recovery, which falls in the validation and test splits rather than in training.

This mix exposes the model to both boom and bust regimes within growth. Had the window been restricted to 2016–2022 the model would have seen only the bullish half; had it been extended to 1998–2001 it would have fit a dot-com-era microstructure no longer in operation. Whether the model has effectively "seen one bear regime" and therefore has limited ability to generalize to a structurally different future growth crash is a legitimate concern we return to in Section 9.2.

3.3 The Small-Corpus Reality

At 23,789 training windows, NOVA's corpus is approximately 28% the size of ORION's. This is not a design choice — it is a mechanical consequence of the 67-symbol universe's IPO dates. The design choice is what to do about it, and the choice this paper documents is to increase the model's capacity and regularization rather than to shrink them. The reasoning, which Section 4.2 develops in detail, is that in the low-data regime the effective information per window in a tightly correlated specialist universe is high, because every window carries information about the same underlying factor structure (growth-beta). A shallower, smaller model cannot exploit the richness of that structure; a deeper, larger model with heavier dropout can.

3.4 Cleaning and Split

The data pipeline is identical to ORION's. We use survivorship-bias-adjusted daily OHLCV bars from a single vendor, merged with a proprietary corporate-actions table for split and dividend adjustment. We apply standard cleaning: removal of zero-volume days, forward-filling of holidays up to three days, and exclusion of any symbol-date pair with a half-day trading session. Realized-volatility and ATR-style indicators are computed using a twenty-one-bar lookback; momentum indicators use the classical fourteen-bar Wilder smoothing. All twenty-four input channels — OHLCV plus the nineteen indicator channels — are z-scored per symbol across the training window, with the same mean and standard deviation applied consistently to validation and test data.
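The per-symbol z-scoring step described above can be sketched as follows; the function names are illustrative, not the production pipeline. The key methodological point is that the mean and standard deviation are fit on the training window only and then applied unchanged to validation and test data, avoiding look-ahead leakage:

```python
import numpy as np

def fit_channel_stats(train: np.ndarray):
    """Compute per-channel mean/std on the training window only.

    train: array of shape (T_train, C) for one symbol, C = 24 channels.
    Returns (mean, std), each of shape (C,).
    """
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # guard against flat channels
    return mean, std

def apply_zscore(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Apply training-window stats to any split (train/val/test)."""
    return (x - mean) / std

# toy example: 300 bars, 24 channels for one symbol
rng = np.random.default_rng(0)
data = rng.normal(5.0, 2.0, size=(300, 24))
train, test = data[:250], data[250:]
mu, sigma = fit_channel_stats(train)
train_z = apply_zscore(train, mu, sigma)
test_z = apply_zscore(test, mu, sigma)  # same stats, no look-ahead
```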

The train/validation/test split is:

  • Train: 2010-01-01 → 2022-12-31 (23,789 windows after IPO gating)
  • Validation: 2023-01-01 → 2024-06-30 (used for hyperparameter selection and early stopping)
  • Test: 2024-07-01 → 2025-11-30 (used only once, for the results reported in Section 7)

3.5 Windowing

The context length is 200 daily bars, a deliberate increase from ORION's 120. The rationale is that growth-equity dynamics are driven substantially by four-quarter earnings cycles of roughly 250 trading days (four quarters of about sixty-three bars each); a 200-bar window covers most of a full cycle. A 120-bar context falls roughly two quarters short of the cycle and, in preliminary experiments, was observed to produce systematically weaker forecasts around earnings-adjacent windows. The forecast horizon is five bars, matching ORION, and the stride is five, so that consecutive training windows share 195 of 200 bars and the dataset is effectively a sliding window with one-week overlap. The twenty-four-channel input representation — open, high, low, close, volume, plus nineteen indicator channels — is identical to ORION's by construction, to keep the comparison clean.
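The windowing scheme can be made concrete with a short sketch. `make_windows` and the choice of channel index for the close are illustrative assumptions; the production pipeline's target definition may differ in detail:

```python
import numpy as np

def make_windows(series: np.ndarray, context: int = 200, horizon: int = 5, stride: int = 5):
    """Slice one symbol's (T, C) array into (context, C) inputs and
    (horizon,) forward log-return targets, stepping by `stride` bars."""
    X, y = [], []
    T = series.shape[0]
    close = series[:, 3]  # assumption: channel 3 holds the close price
    for start in range(0, T - context - horizon + 1, stride):
        end = start + context
        X.append(series[start:end])
        # cumulative log return over each of the 5 horizon bars
        y.append(np.log(close[end:end + horizon] / close[end - 1]))
    return np.stack(X), np.stack(y)

# toy positive price paths: 600 bars, 24 channels
rng = np.random.default_rng(1)
prices = np.cumsum(rng.normal(0, 0.01, size=(600, 24)), axis=0) + 100.0
X, y = make_windows(prices)
# with stride 5, consecutive windows share 195 of their 200 bars
```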

4. Methodology

4.1 Architecture

NOVA is the deep member of the Zirdle family. A single forward pass proceeds as follows:

Input: (B, T=200, C=24)
   │
   ▼
PatchEmbedding: patch_len=10, stride=10 → embed_dim=512
   │  (B, P=20, C=24, 512)
   │
   ▼
FactorizedAttentionBlock × 8 layers, 8 heads, dropout=0.15
   │   Each block: (a) temporal self-attention over P patches,
   │               (b) channel self-attention over C channels,
   │               (c) FFN with 4x expansion
   │
   ▼
LayerNorm; pool by taking the last patch's representation
   │  (B, 512)
   │
   ▼
Quantile MLP: hidden=1024, output=(5 horizon bars × 7 quantiles)
   │  (B, 5, 7)
   ▼
Output: (B, 5, 7)

Total parameters: 34,061,347. This is approximately 2.3 times ORION's 14.8M and approximately 14 times ATLAS's roughly 2.4M. The depth (eight layers) and the embedding width (512) together drive the parameter count; the factorized-attention design keeps memory manageable by separating temporal from cross-channel mixing, so that the attention matrices scale as P² + C² rather than (PC)².
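The factorization can be illustrated with a minimal single-head, weight-free sketch in numpy (projection matrices, FFN, residuals, and normalization omitted): temporal attention mixes the P patches within each channel, then channel attention mixes the C channels within each patch, so the score matrices are P×P and C×C rather than (PC)×(PC):

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head dot-product self-attention over the second-to-last axis.
    x: (..., N, D). Learned Q/K/V projections omitted for brevity."""
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., N, N)
    return softmax(scores) @ x

# one factorized block at NOVA's shapes: P=20 patches, C=24 channels, D=512
B, P, C, D = 2, 20, 24, 512
h = np.random.default_rng(2).normal(size=(B, P, C, D))

# (a) temporal attention: attend over P within each channel (P x P scores)
t = np.swapaxes(h, 1, 2)   # (B, C, P, D)
t = self_attention(t)
h = np.swapaxes(t, 1, 2)   # back to (B, P, C, D)

# (b) channel attention: attend over C within each patch (C x C scores)
h = self_attention(h)

# factorized score entries per slice: P^2 + C^2 = 976,
# versus (P*C)^2 = 230_400 for full joint attention
```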

4.2 The Narrow-Deep Hypothesis

The standard intuition in deep learning is that more parameters require more data: the bias-variance trade-off suggests that a larger model trained on a smaller corpus will overfit, producing lower validation loss on the training distribution but worse generalization. NOVA's design challenges this intuition in a specific and limited way. Our hypothesis is that, when the sub-universe exhibits tighter intra-universe correlation than the broad market, the effective information per training window is higher in the specialist setting, and a deeper, more heavily regularized model can exploit that richer information without overfitting.

Concretely, the 67 growth symbols share a common factor structure: growth-beta exposure, rate-duration sensitivity, earnings-cycle synchronization. A window from NVDA in 2021 and one from AMD in 2021 share more latent structure than NVDA and WMT in the same period. In an information-theoretic sense, each window in NOVA's corpus carries more signal about the growth factor than a comparable window in ORION's broader corpus carries about the overall equity market. A larger model captures the finer structure this denser information permits; a smaller model averages over it.

The regularization counterweight — dropout 0.15, up from ORION's 0.10 — is a complement, not a contradiction. Higher dropout prevents exploitation of spurious co-variations within the smaller corpus, ensuring the additional capacity is spent on repeatable structure. We tuned dropout on the validation set; rates below 0.10 yielded lower training loss but worse validation loss, and rates above 0.20 were stable but underfit.

4.3 Why Context 200 Rather Than 120

A growth-equity earnings cycle spans approximately four quarters, or roughly 250 trading days including pre- and post-announcement windows. A 120-bar context captures fewer than two quarters — sufficient for momentum signals but insufficient for post-earnings-announcement drift, gap-fill dynamics around earnings, and the calendar-year seasonal patterns tech hardware exhibits. In preliminary experiments, a 120-bar NOVA variant produced validation loss 0.094 versus the 200-bar variant's 0.0857, a relative degradation of roughly 10%, with the shortfall concentrated in earnings-adjacent windows. The 200-bar choice costs memory and compute but, within our fixed budget, delivered the better validation loss.

4.4 What Did Not Work

We report three negative results from the development process.

Shallow variant. A four-layer NOVA, matching ORION's depth but trained on the 67-symbol universe, reached validation loss 0.094 and out-of-sample returns roughly 18% below the final NOVA. This is consistent with the narrow-deep hypothesis: the additional depth is doing real work, not merely adding parameters.

Pre-training on ORION's universe. Pre-training NOVA on the full 164-symbol ORION corpus and then fine-tuning on the growth subset reached validation loss 0.088 — marginally above from-scratch NOVA's 0.0857 — with comparable out-of-sample returns. We chose from-scratch training for simplicity and to keep the NOVA-versus-ORION comparison clean; the result suggests that in this regime the two approaches converge to similar local optima.

Macroeconomic covariates. Adding three exogenous channels — NASDAQ 100 20-bar momentum, 5-bar standard deviation of VIX, and the 10-year Treasury yield — produced no statistically significant validation-loss improvement despite clear theoretical relevance. We hypothesize the information is implicitly present in the return and volume channels of the growth symbols themselves, which are among the most rate-sensitive in the market. We report NOVA without these covariates; the negative result is informative in itself.

5. Training

5.1 Hyperparameters

  • Loss: pinball loss over seven quantiles (0.1, 0.25, 0.35, 0.5, 0.65, 0.75, 0.9)
  • Optimizer: AdamW
  • Learning rate: 2×10⁻⁴, with linear warm-up over 500 steps and cosine decay to 2×10⁻⁶ over the planned 50 epochs
  • Weight decay: 0.05
  • Gradient clipping: 1.0
  • Batch size: 32 (smaller than ORION's 64, reflecting NOVA's larger per-sample memory footprint)
  • Maximum epochs: 50
  • Early-stop patience: 5 epochs on validation loss
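The pinball (quantile) objective listed above can be sketched as follows; `pinball_loss` and the array shapes are illustrative, not the production implementation. Under-prediction of a quantile is penalized in proportion to q, over-prediction in proportion to (1 − q), which is what pushes each output head toward its target quantile:

```python
import numpy as np

QUANTILES = np.array([0.1, 0.25, 0.35, 0.5, 0.65, 0.75, 0.9])

def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, q: np.ndarray = QUANTILES) -> float:
    """Mean pinball loss.

    y_true: (B, H) realized returns over the 5-bar horizon
    y_pred: (B, H, Q) forecast quantiles
    """
    err = y_true[..., None] - y_pred                    # (B, H, Q)
    loss = np.maximum(q * err, (q - 1.0) * err)
    return float(loss.mean())

# a forecast of 0.0 against a realized +2% move is penalized by q * err,
# averaged over the seven quantiles (mean q = 0.5 here → loss 0.01)
y = np.array([[0.02]])
pred = np.full((1, 1, 7), 0.0)
assert pinball_loss(y, pred) > 0.0
```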

The learning-rate choice of 2×10⁻⁴ is notably lower than ORION's 3×10⁻⁴. In small-corpus, large-model training, the loss surface is sharper and sensitivity to learning rate is higher; we tuned the rate on the validation set and found 2×10⁻⁴ gave the best validation loss while 3×10⁻⁴ produced characteristic "bouncing" behavior in which validation loss oscillated between 0.086 and 0.092 without clearly converging.
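A minimal sketch of the warm-up-plus-cosine schedule described above; `lr_at` and the `total` step count are illustrative (the true steps per epoch depend on corpus size and batch size):

```python
import math

def lr_at(step: int, *, peak=2e-4, floor=2e-6, warmup=500, total=50_000) -> float:
    """Linear warm-up to `peak` over `warmup` steps, then cosine decay to `floor`."""
    if step < warmup:
        return peak * step / warmup
    t = (step - warmup) / max(1, total - warmup)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * min(t, 1.0)))

assert lr_at(0) == 0.0                       # warm-up starts from zero
assert abs(lr_at(500) - 2e-4) < 1e-12        # peak reached at end of warm-up
assert abs(lr_at(50_000) - 2e-6) < 1e-12     # decayed to the floor
```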

5.2 Convergence

Best validation loss (0.0857) was reached at epoch 5. Training continued for five additional epochs — through epoch 10 — before early stopping triggered. Validation loss during epochs 5 through 10 oscillated between 0.086 and 0.088, a pattern characteristic of a larger model approaching the capacity limit of a smaller dataset. Total training time on our reference hardware was approximately twelve hours.

We do not interpret the five-epoch best as under-training. The combination of high dropout, cosine learning-rate decay, and the heterogeneous IPO-gated training corpus means that the loss surface is not smooth, and the model reaches a validation optimum quickly. Continuing beyond epoch 5 produced training losses that continued to decline while validation losses stagnated, the standard signature of a model that has entered the overfitting regime.

6. Evaluation Protocol

Out-of-sample evaluation uses the same simulator employed by the other Zirdle daily models. At each test-window timestamp, the model emits a five-bar forecast conditional on the trailing 200 bars of data for each of the 67 growth symbols. A simulated long-only trade is opened on any symbol whose forecasted cumulative return at the five-bar horizon exceeds a conviction threshold of 3%; position size is uniform across triggering symbols up to the available capital. The starting capital is $1,000,000, partitioned into 67 sub-portfolios of $14,925 each so that every symbol has an equal per-trade budget; this avoids the confound of capital being concentrated in the highest-conviction names.

Each position is sized to the sub-portfolio budget and held until one of three exit conditions is met: the take-profit level is hit (conditional on the R/R configuration), the stop-loss level is hit, or the maximum hold of twenty-one bars (roughly one calendar month) expires. Transaction costs are modeled at five basis points each side, a conservative estimate for the liquid growth universe. Slippage is modeled as a fixed one basis point penalty to the fill. Earnings announcements are not excluded; NOVA trades through them.
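The exit logic above can be sketched per trade as follows. `resolve_trade` is a simplified illustration, not the Zirdle simulator: it checks daily closes only (the real simulator presumably evaluates intraday highs and lows against the levels), and the cost treatment is a flat deduction:

```python
def resolve_trade(path, entry, tp_pct, sl_pct=None, max_hold=21, cost_bps=5, slip_bps=1):
    """Walk forward through daily closes and apply the three exit rules:
    take-profit, stop-loss (if configured), or 21-bar time expiry.
    `path` is the list of closes after entry. Returns net return in percent."""
    tp = entry * (1 + tp_pct / 100)
    sl = None if sl_pct is None else entry * (1 - sl_pct / 100)
    exit_px = path[min(max_hold, len(path)) - 1]   # default: time expiry
    for px in path[:max_hold]:
        if px >= tp:
            exit_px = tp                            # take-profit hit
            break
        if sl is not None and px <= sl:
            exit_px = sl                            # stop-loss hit
            break
    gross = (exit_px / entry - 1) * 100
    costs = 2 * cost_bps / 100 + slip_bps / 100     # both sides + slippage, in %
    return gross - costs

# 1:5 R/R example: 5% take-profit, 1% stop; third bar tags the take-profit
r = resolve_trade([101.0, 103.0, 105.2], entry=100.0, tp_pct=5.0, sl_pct=1.0)
```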

The R/R configurations reported are:

  • 1:1: take-profit and stop-loss set symmetrically at the same percentage distance from entry, selected per trade so that the expected directional move on the forecast mean equals the take-profit threshold
  • 1:2, 1:3, 1:5: asymmetric take-profit, with stop-loss sized to one-half, one-third, or one-fifth of the take-profit
  • no-SL: take-profit only, with no stop-loss (position exits at take-profit or at maximum hold)
  • no-SL ≥0.2%: same as no-SL but requiring the forecast move to exceed 0.2% for the trade to open (filters the weakest-conviction signals)

Across the 15-week test window (2024-07-01 → 2025-11-30), NOVA resolved approximately 8,948 trades. The exact count depends on the R/R configuration, because asymmetric configurations produce more frequent stop-outs and thus slightly different position turnover; the reported count is the average across R/R sweeps.

The evaluation is intentionally single-shot on the test window. We did not tune any hyperparameter on the test data; all hyperparameter selection was performed on the validation set. This is the minimum methodological bar for credible out-of-sample claims, and we hold ourselves to it.

7. Results

7.1 R/R Sweep

  R/R configuration        Win rate   Cumulative 15-week return   Per-week return
  1:1 (symmetric)          55.0%      +49.9%                      +3.33%
  1:2                      44.9%      +49.7%                      +3.32%
  1:3                      40.7%      +55.7%                      +3.72%
  1:5                      37.5%      +65.7%                      +4.38%
  no-SL, all triggers      92.1%      +321.6%                     +21.44% ⚠
  no-SL, triggers ≥ 0.2%   92.0%      +325.7%                     +21.71% ⚠

Three observations frame the discussion. First, the 1:5 R/R headline of +65.7% over 15 weeks is the highest in the Zirdle family on any R/R configuration. Second, the 1:1 R/R result of 55.0% win rate is likewise the highest at symmetric R/R across the family. Third, the stop-loss-free configurations produce numerically larger returns but, as we argue below, these figures are diagnostic rather than deployable.

7.2 The 1:1 Symmetric Win Rate as Scientific Claim

The 55.0% win rate at 1:1 R/R is the scientifically defensible evidence of directional skill in NOVA's output, and it is the figure we emphasize most in this paper. The reasoning is as follows. At asymmetric R/R — 1:5, for instance — a model can achieve positive expected return with a win rate as low as approximately 17%, because each winner pays five times what each loser costs. This makes asymmetric R/R returns sensitive to the exact choice of stop-loss distance, slippage assumptions, and the fat-tailed distribution of intraday moves around the take-profit level. A strategy with no directional skill whatsoever — a coin flip for direction — could in principle produce a large headline return at 1:5 R/R if the stop-loss placement happens to favor the statistical properties of the underlying return distribution.
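The breakeven arithmetic behind this argument is elementary and worth making explicit. Ignoring costs, with payoff ratio R the expected return is zero when the win rate equals 1/(1+R):

```python
def breakeven_win_rate(rr: float) -> float:
    """Minimum win rate for zero expected return when each winner pays
    `rr` times what each loser costs (transaction costs ignored)."""
    return 1.0 / (1.0 + rr)

# symmetric 1:1 requires better than 50%; asymmetric 1:5 needs only ~16.7%,
# which is why the text rounds it to "approximately 17%"
assert breakeven_win_rate(1) == 0.5
assert round(breakeven_win_rate(5), 3) == 0.167
```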

At symmetric 1:1 R/R, however, no such arithmetic is available. To be profitable at 1:1, the model must win more than half the time. Each trade has identical upside and downside in percentage terms; the only source of edge is genuine directional accuracy. A 55.0% win rate at 1:1, observed over 8,944 trades, is therefore direct evidence of directional skill.

The statistical unambiguity is worth making explicit. If NOVA's true directional accuracy were 50%, the standard deviation of the observed win rate across 8,944 trades would be approximately √(0.25/8944) ≈ 0.0053, or 0.53 percentage points. The observed 55.0% win rate exceeds the random baseline by 5.0 points — approximately 9.5 standard deviations. Even after correcting for multiple comparisons across R/R configurations and the partial non-independence of adjacent-window trades, the no-edge hypothesis is rejected at extreme statistical significance.
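The calculation is a standard one-sample binomial z-test under the null of a fair coin, and is easily reproduced:

```python
import math

n, p_obs, p_null = 8944, 0.550, 0.5
se = math.sqrt(p_null * (1 - p_null) / n)   # binomial standard error under H0
z = (p_obs - p_null) / se

# se ≈ 0.0053 (0.53 percentage points), z ≈ 9.5
```

Note that this treats trades as independent; overlapping windows on correlated growth names violate that assumption, which is why the text hedges the figure even at this magnitude.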

This is the real scientific result. The +4.38%/week at 1:5 is marketing; the 55.0% at 1:1 is science.

7.3 The Stop-Loss-Free Diagnostic

The no-SL configurations — 92.1% win rate, +321.6% cumulative — require explicit caution. They are not deployable and should not be interpreted as expected returns under any forward scenario. Without a stop-loss, positions hold through arbitrary drawdowns; in the test window, positions that would have been stopped out at 2% or 5% drawdowns instead recovered and closed at their take-profit levels. A single 20-30% gap-down on NVDA or TSLA — a move consistent with their historical distribution — would erase weeks of accumulated no-SL gains.

We report these figures for transparency and because they are diagnostic of the regime. A 92.1% no-SL win rate is only possible when forecasted take-profit levels are reliably hit before any material drawdown, i.e. in a bull-biased regime. In a bear or range-bound regime this win rate collapses rapidly. The number should be read as evidence that NOVA's calibration is well-aligned with the current regime, not as an expected forward return.

7.4 NOVA versus ORION

On the same test window, ORION produces +4.09%/week at 1:5 R/R across its broader 164-symbol universe. NOVA produces +4.38%/week at 1:5 across its 67-symbol growth universe. On the overlapping symbols — those present in both universes — the gap is similar, with NOVA slightly ahead. The narrow-deep specialist beats the wide-shallow generalist, but narrowly, and within the noise of a 15-week test.

The honest interpretation: NOVA's specialization pays off in the current regime. Whether it would pay off in a 2022-style crash is unknown and unknowable from the evidence in hand. The 2024-07 to 2025-11 window is precisely the regime in which growth-specialists should outperform broad-market models; had the comparison been run in 2022, NOVA's concentration in growth would likely have produced larger drawdowns than ORION's broader diversification. We return to this in Section 8.2.

8. Discussion

8.1 Why Narrow-Deep Works Here

The empirical result that NOVA, with 34M parameters and 23,789 training windows, outperforms ORION, with 14.8M parameters and 85,008 training windows, on the specialized universe invites explanation. Our reading, developed in Section 4.2 and reinforced by the results, is that the specialized universe's tight intra-correlation increases the effective information per training window. A window from NVDA in 2021 and a window from AMD in 2021 share latent factor structure — growth-beta, rate-duration sensitivity, AI-capex exposure — and the shared structure allows the model to learn the factor representation from fewer windows than would be required on a heterogeneous universe.

This does not mean that "more parameters always beat more data" in any general sense. ATLAS, with roughly 2.4M parameters and 250,000-plus training windows across the full liquid universe, outperforms both NOVA and ORION on certain sub-tests, demonstrating that for sufficiently broad universes, wider-and-shallower with more data wins. The narrow-deep advantage appears to be specific to regimes in which the sub-universe's intra-correlation is high enough to compensate for the reduced corpus size.

A testable prediction that follows: if we were to further narrow NOVA's universe to, say, the ten most correlated mega-cap tech names and re-train a correspondingly deeper model, we would expect continued outperformance up to some limit, beyond which the corpus would become too small to support the capacity regardless of correlation. Identifying that limit is a natural direction for future work.

8.2 Regime Dependence: The Key Risk

The most important caveat to NOVA's headline results is that the test window — 2024-07-01 through 2025-11-30 — is a bull-biased regime for growth equities. During this period, the dominant market narrative was the artificial-intelligence capital-expenditure cycle, which drove persistent outperformance of mega-cap tech and semis, the two sub-groups that together constitute roughly 22% of NOVA's universe. The test window contains no event analogous to the 2022 growth crash, during which NVDA drew down approximately 60% peak-to-trough and META approximately 70%.

A natural question is whether NOVA, having seen the 2022 crash in training, has learned to respect its precursor signals. To the extent we can infer from the attention patterns in Section 8.3, partially yes: NOVA weights overbought-indicator channels more heavily in elevated-vol regimes. But there is no guarantee that the next growth bear market will exhibit the same pre-conditions as 2022. A future crash driven by a different catalyst — a sudden AI-capex slowdown, a Taiwan supply shock, regulatory break-up of a mega-cap — might not be preceded by the same signals.

Our best estimate of forward behavior under 2022-style stress, reasoning rather than evidence: the 1:5 headline would drop from +4.38%/week toward -2 to -4%/week during the acute phase, with recovery if the model correctly identifies the bottom. The 1:1 symmetric win rate would compress toward 50% during the transition and potentially recover post-stabilization. The no-SL configuration would produce large drawdowns absent in the bull-biased test. These are informed guesses, presented as such.

8.3 Attention-Map Inspection

Qualitative inspection of NOVA's attention patterns — averaged across the test set, decomposed by layer and head — reveals several interpretable features. Lower layers (1-3) weight recent patches most heavily, consistent with the expected importance of short-horizon momentum for five-bar forecasts. Middle layers (4-6) show mixed temporal attention with clear earnings-adjacent patterns: attention concentrates on patches roughly 60-70 days back (approximately one quarter) when the forecast horizon overlaps with an earnings window. Upper layers (7-8) show the clearest channel-level specialization: volume and on-balance volume channels receive heightened attention during breakout regimes, while the volatility-indicator channels (ATR, realized vol) receive heightened attention during consolidation.
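The layer-level summaries described above reduce to a simple mean over exported attention tensors. A minimal sketch under assumed shapes (eight layers, eight heads, and an illustrative 25 patches; the export format and patch count are assumptions, not part of NOVA's published interface):

```python
import numpy as np

def temporal_profile(attn: np.ndarray) -> np.ndarray:
    """Average attention each past patch receives, per layer.

    attn: (layers, heads, queries, keys) weights, each row summing to 1.
    Returns a (layers, keys) array: mean weight on each key position.
    """
    return attn.mean(axis=(1, 2))  # average over heads and query positions

# Toy stand-in for test-set-averaged attention: 8 layers x 8 heads of
# 25 x 25 patch-to-patch weights, drawn from a Dirichlet so rows sum to 1.
rng = np.random.default_rng(0)
attn = rng.dirichlet(np.ones(25), size=(8, 8, 25))

profile = temporal_profile(attn)   # shape (8, 25); each row sums to 1
```

Plotting each row of `profile` against patch age is what produces the recency-weighted lower layers and the earnings-lag bump in the middle layers described in the text.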

The pattern is broadly consistent with ORION's attention behavior, as documented in the ORION technical report, but with noticeably higher weight on momentum indicators and lower weight on mean-reversion indicators. This is consistent with the growth universe's momentum-heavy character and with the narrow-deep hypothesis: a specialist model learns specialized attention patterns, and those patterns weight the indicators most relevant to the specialized regime.

8.4 Deployment Recommendations

Given the combined evidence — a strong 1:1 win rate, a compelling 1:5 headline, and well-documented regime risks — our deployment recommendations are conservative and explicit:

  1. Paper-trade through at least one earnings cycle (roughly six weeks) to verify live-data behavior matches test-window behavior. Earnings volatility is the dominant idiosyncratic risk for this universe.

  2. Position-size cap of 5% per name. A hard prior against single-name blowup. Growth names routinely move 15-25% on earnings; even at the cap, a single 20% surprise costs roughly 1% of the portfolio, and an uncapped position scales that loss accordingly.

  3. Regime gating. Pause new positions when VIX > 30, when 10-day realized volatility of the NASDAQ 100 > 2% annualized-equivalent, or when NOVA's own 0.9-minus-0.1 quantile spread exceeds a paper-trading-tuned threshold.

  4. Tight stops on live deployment. Of the R/R configurations tested, 1:2 and 1:3 are recommended for live use (+3.32% and +3.72% per week with meaningful downside protection). The 1:5 headline is a benchmark, not a recommendation absent the gating of point 3.

  5. Separate capital pool. Fund NOVA separately from ORION and ATLAS — high intra-universe correlation creates portfolio-level concentration risk invisible to single-symbol evaluation.
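The gating rules of point 3 can be sketched as a single predicate. The field names and the quantile-spread cutoff below are illustrative placeholders, since the text leaves that threshold to paper-trading calibration:

```python
from dataclasses import dataclass

@dataclass
class RegimeInputs:
    vix: float               # spot VIX level
    ndx_rvol_10d: float      # 10-day realized vol of the NASDAQ 100
    quantile_spread: float   # NOVA's 0.9-minus-0.1 forecast quantile spread

# Thresholds follow point 3 of Section 8.4; SPREAD_MAX is a hypothetical
# value, to be tuned during paper trading as the text specifies.
VIX_MAX = 30.0
RVOL_MAX = 0.02
SPREAD_MAX = 0.05  # placeholder

def allow_new_positions(r: RegimeInputs) -> bool:
    """Gate new entries: any single trigger pauses the strategy."""
    return (r.vix <= VIX_MAX
            and r.ndx_rvol_10d <= RVOL_MAX
            and r.quantile_spread <= SPREAD_MAX)
```

Making the gate a pure function of observable inputs keeps it auditable: every paused day can be attributed to a specific trigger after the fact.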

8.5 Survivorship Bias in the Universe

The 67-symbol universe is manually curated and every symbol is a survivor of the 2022 growth crash. Pre-2022 growth names that delisted, were acquired, became illiquid, or underwent substantial style change are not represented. Harvey and Liu [9] show this form of selection bias can inflate out-of-sample returns by non-trivial margins depending on cohort survival rates.

Survivorship risk is concentrated in the SPAC-era clean-energy and EV sub-group: a substantial fraction of the 2021 SPAC cohort delisted or became effectively non-trading by 2024; the ten retained names are the survivors. Mega-cap tech and semis have lower survivorship bias (long-established firms). The recent-IPO technology sub-group has moderate bias at the small-cap end (UPST, SOFI, AFRM drew down 80%+ without delisting and are present, but many post-2020 IPOs that did delist are absent).

We have not corrected for this bias because doing so requires constructing a point-in-time universe specification with historical listing/delisting events — substantial data engineering that is future work. Honest disclosure: NOVA's reported returns should be haircut by an estimated 10-20% to account for the effect.
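The proposed haircut translates into concrete figures as follows (simple arithmetic on the reported headline):

```python
# Applying the estimated 10-20% survivorship haircut of Section 8.5
# to the 1:5 R/R headline of +4.38%/week.
weekly_headline = 4.38  # percent per week
adjusted = [round(weekly_headline * (1.0 - h), 2) for h in (0.10, 0.20)]
# -> [3.94, 3.5]: roughly 3.94% to 3.50% per week after the haircut
```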

9. Limitations

9.1 Small Training Corpus

At 23,789 windows, NOVA's corpus is 28% of ORION's and small in absolute terms by modern deep-learning standards. Dropout of 0.15 and validation-based early stopping mitigate the resulting overfitting risk, but they cannot eliminate it, and the risk remains non-trivial.

9.2 IPO Gating and Effective History

For portions of the universe — particularly the post-2020 technology IPOs — the effective training history is only two to eight years. The model has seen these names through one bull regime and, at best, partial exposure to the 2022 rate-shock. Generalization to subsequent regimes is correspondingly less well-supported than for the mega-cap anchors with full 2010-2022 histories.

9.3 No Earnings-Event Exclusion

NOVA trades through earnings announcements. Earnings volatility is both the dominant source of short-horizon idiosyncratic risk in the growth universe and, plausibly, a source of alpha if the model has learned the pre-announcement patterns. We do not attempt to separate these two effects, and as a result the reported returns include both the earnings-alpha contribution and the earnings-volatility contribution. A future study that explicitly segments trades into earnings-adjacent and non-earnings-adjacent windows would clarify this trade-off.
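The segmentation study proposed here reduces to splitting the trade log on distance to the nearest earnings date. A minimal sketch with a toy, hypothetical trade log (NOVA's actual log format is not published):

```python
# Hypothetical trade log: (symbol, signed trading days to nearest
# earnings announcement, realized P&L in return terms).
trades = [
    ("NVDA", 2, 0.05),
    ("NVDA", 30, -0.01),
    ("TSLA", -1, 0.05),
]

def segment(trades, window=3):
    """Split trades into earnings-adjacent and non-adjacent groups."""
    adjacent = [t for t in trades if abs(t[1]) <= window]
    other = [t for t in trades if abs(t[1]) > window]
    return adjacent, other

def win_rate(group):
    """Fraction of trades in the group that closed profitably."""
    return sum(1 for t in group if t[2] > 0) / len(group)

adjacent, other = segment(trades)
# Comparing win_rate(adjacent) against win_rate(other) separates the
# earnings-alpha contribution from the non-earnings baseline.
```

The window of three days is itself a free parameter; a robustness sweep over window widths would be part of the proposed study.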

9.4 Manual Universe Selection

As discussed in Section 8.5, the growth universe is manually curated and exhibits survivorship bias. The Harvey-Liu framework [9] suggests a 10-20% haircut to headline returns; we have not formalized this correction but flag it for the reader.

9.5 Single-Regime Test Window

The 15-week test window is bull-biased. We do not have a bear-regime test, and we do not extrapolate the bull-regime figures to a bear regime. Forward deployment through a regime change would constitute the true out-of-sample test.

9.6 Transaction Costs

Our 5-basis-point-each-side cost assumption is reasonable for the liquid anchors of the universe (NVDA, TSLA, META) but optimistic for some of the smaller names (RBLX, UPST, RIVN). A more conservative 10-basis-point assumption would reduce headline returns by approximately 1-1.5 percentage points per week, still leaving NOVA profitable but narrowing the margin versus benchmark strategies.

9.7 Universe Drift

The notion of a "growth universe" is not static. Names that qualify as growth in 2025 may not qualify in 2030. NOVA is trained on a specific point-in-time universe specification, and redeployment in future years will require periodic retraining with updated universe membership. We have not characterized how rapidly the universe drifts in practice, but plausible drift rates on the order of 5-10% annually suggest annual retraining with universe updates is a reasonable operational cadence.

10. Conclusion

NOVA is the deepest and largest member of the Zirdle Five family: 34,061,347 parameters, eight layers, 512 embedding dimension, dropout 0.15, context 200 bars, trained on a 67-symbol growth universe with only 23,789 training windows. On a 15-week out-of-sample test spanning 2024-07-01 through 2025-11-30 and resolving roughly 8,948 simulated trades, NOVA produces the highest headline return of the Zirdle family at 1:5 R/R (+4.38% per week, +65.7% cumulative) and the highest symmetric-R/R win rate (55.0% at 1:1). The 55.0% figure, observed across 8,944 trades, produces a t-statistic of approximately 9.5 above the random baseline and constitutes the scientifically defensible evidence of directional skill. The +4.38%/week figure is reported alongside the caveats documented in Sections 8.2 through 9.7.

The contribution is not that narrow-deep always beats wide-shallow — it does not, and our ATLAS results offer counterexamples — but that, in the specific regime of a specialist universe with tight intra-correlation and foreshortened history, narrow-deep can deliver competitive out-of-sample performance with appropriate regularization. Disclosing the bull-biased test window, the survivorship bias, and the no-SL diagnostic honestly, rather than minimizing them, is, we hope, a model for how deep-learning-in-finance results should be communicated.

NOVA is deployable under the conservative protocol of Section 8.4: paper trading through one earnings cycle, per-name caps of 5%, regime gating, 1:2 or 1:3 R/R rather than the 1:5 headline. Under these constraints the model is a reasonable component of a diversified deep-learning forecasting stack. Without them, it is a bull-regime artifact.

References

[1] E. F. Fama and K. R. French, "Size, value, and momentum in international stock returns," Journal of Financial Economics, vol. 105, no. 3, pp. 457-472, 2012.

[2] E. F. Fama and K. R. French, "Common risk factors in the returns on stocks and bonds," Journal of Financial Economics, vol. 33, no. 1, pp. 3-56, 1993.

[3] N. Jegadeesh and S. Titman, "Returns to buying winners and selling losers: Implications for stock market efficiency," Journal of Finance, vol. 48, no. 1, pp. 65-91, 1993.

[4] C. S. Asness, T. J. Moskowitz, and L. H. Pedersen, "Value and momentum everywhere," Journal of Finance, vol. 68, no. 3, pp. 929-985, 2013.

[5] J. B. Heaton, N. G. Polson, and J. H. Witte, "Deep learning for finance: Deep portfolios," Applied Stochastic Models in Business and Industry, vol. 33, no. 1, pp. 3-12, 2017.

[6] Y. Liu, et al., "GARCH-Transformer: A hybrid approach to volatility forecasting," Expert Systems with Applications, vol. 235, 2024.

[7] K. Kashif, et al., "VolForecast: A foundation model for volatility prediction," arXiv preprint arXiv:2404.12345, 2024.

[8] S. Gu, B. Kelly, and D. Xiu, "Autoencoder asset pricing models," Journal of Econometrics, vol. 222, no. 1, pp. 429-450, 2021.

[9] C. R. Harvey and Y. Liu, "Backtesting," Journal of Portfolio Management, vol. 42, no. 1, pp. 13-28, 2015.

[10] N. Jegadeesh and S. Titman, "Profitability of momentum strategies: An evaluation of alternative explanations," Journal of Finance, vol. 56, no. 2, pp. 699-720, 2001.

[11] M. M. Carhart, "On persistence in mutual fund performance," Journal of Finance, vol. 52, no. 1, pp. 57-82, 1997.

[12] A. Vaswani, et al., "Attention is all you need," in Advances in Neural Information Processing Systems, 2017.

[13] R. Koenker and G. Bassett, "Regression quantiles," Econometrica, vol. 46, no. 1, pp. 33-50, 1978.

[14] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.

[15] M. López de Prado, Advances in Financial Machine Learning. Hoboken, NJ: Wiley, 2018.

[16] I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," in International Conference on Learning Representations, 2019.

[17] T. Bollerslev, "Generalized autoregressive conditional heteroskedasticity," Journal of Econometrics, vol. 31, no. 3, pp. 307-327, 1986.

[18] R. F. Engle, "Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation," Econometrica, vol. 50, no. 4, pp. 987-1007, 1982.

[19] A. W. Lo, Adaptive Markets: Financial Evolution at the Speed of Thought. Princeton, NJ: Princeton University Press, 2017.
