Zirdle Research · Technical Report

ECHO: A Factorized-Attention Transformer for 15-Minute Mean Reversion in Broad-Market ETFs

Zirdle Research Technical Report, April 2026

Abstract

Mean reversion is one of the oldest and most studied regularities in quantitative finance. The canonical approaches — z-score against a rolling mean, Bollinger band crossings, RSI oversold/overbought signals, pair-trading of cointegrated baskets — are well understood and, at the liquid large-cap level, substantially arbitraged. We ask whether a multivariate transformer, supplied only with recent price and volatility features and deliberately blinded to long-term directional indicators, can extract residual short-horizon reversion signal that simple statistical estimators miss on a curated universe of exchange-traded funds. We introduce ECHO, a 3,329,463-parameter factorized-attention transformer operating on 15-minute bars with a context of 160 bars and a 4-bar forecast horizon. ECHO is trained on 164 ETF symbols spanning broad-index, sector, duration, commodity, volatility, and currency categories over 2021-06-01 to 2024-06-30, validated on 2024-07-01 to 2025-03-31, and held out for 53 weeks of walk-forward evaluation from 2025-04-01 to 2026-04-17. The model attains a validation pinball loss of 0.0839 — the lowest of any checkpoint in the Zirdle ensemble — with best epoch 7 and early stopping at epoch 12. Using a triple-barrier evaluation protocol over the held-out year, the longs-only 1:5 configuration realizes +10.44% total return (+0.197% per week) with a 23.6% win rate across 11,874 signals. Most notably, the unstopped (no-stop-loss) longs-only configuration is profitable at +9.36% total (+0.177% per week, 88.7% hit rate) — a signature that is inconsistent with a model that has learned directional momentum and consistent with one that has learned genuine reversion. We characterize ECHO as a capital-preservation specialist: modest absolute return but with the lowest realized drawdown of any model in the ensemble and the strongest risk-adjusted profile. 
We discuss where the edge appears to originate, why the training window matters for reversion regimes in particular, and the conditions under which the model would be expected to fail.

1. Introduction

Mean reversion, interpreted loosely as the tendency of prices to return toward a local mean after transitory deviation, is a phenomenon with strong theoretical and empirical support across asset classes, horizons, and decades. It underlies the classical pair-trading literature [1], statistical arbitrage on cointegrated baskets [2], return-reversal strategies documented since at least the 1980s at horizons from weeks to years [3, 8], and the broader contrarian factor family. In equity markets specifically, mean reversion appears most clearly in three regimes: (i) idiosyncratic deviations from a factor-implied fair value, (ii) microstructure-driven bounce following liquidity shocks, and (iii) cross-sectional dispersion of returns that is subsequently dampened by rebalancing flows.

At the intraday horizon that concerns us here — 15-minute bars projected one hour ahead — the most direct route to reversion signal is not single-name equity. Single names at short horizons are dominated by idiosyncratic news, order-flow imbalance, and private information; they are noisy precisely where reversion is subtle. Exchange-traded funds (ETFs) offer a cleaner substrate. An ETF aggregates dozens or hundreds of underlying names, averaging away much of the idiosyncratic noise and leaving a residual process that is substantially smoother, more stationary, and more amenable to reversion modeling. Rebalancing flows, creation-redemption dynamics, and cap-weighted drift further induce the kind of mean-reverting, regime-persistent behavior that a model can hope to learn.

The challenge is that this is a well-trodden field. Any portfolio manager with a Bloomberg terminal can compute a 20-bar z-score or a Bollinger %B and enter a reversion trade on that signal. These indicators are widely deployed, at scale, on precisely the instruments we are interested in. The relevant scientific question is therefore not "does mean reversion exist?" — it manifestly does — but "is there reversion signal in 15-minute ETF bars that a modern sequence model can extract, conditional on simple statistical indicators already being available to other market participants?" Put differently, we are looking for residual signal beneath what is already arbitraged.

This paper documents our attempt to answer that question. We train a 3.3-million-parameter factorized-attention transformer, ECHO, on a curated universe of 164 ETFs over a three-year window and evaluate it on a held-out year. Two methodological choices deserve foregrounding. First, we deliberately exclude long-term trend indicators (SMA200, extended moving averages, MACD) from the model's input feature set; our intent is to prevent the model from hedging its losses by learning a secondary momentum behavior, and to force specialization into the short-horizon reversion regime. Second, we calibrate the training window to the post-2021 microstructure specifically: zero-commission retail flows, dramatically expanded ETF share of trading volume, and elevated option-gamma dynamics represent a coherent environment that, we argue, is poorly approximated by pre-COVID data.

The remainder of the paper is organized as follows. Section 2 situates our work in the mean-reversion literature. Section 3 describes the data. Section 4 details the model architecture and our feature-selection choices. Section 5 describes training. Section 6 specifies the evaluation protocol, with particular attention to the longs-only ETF-in-a-bull-market question. Section 7 presents results, including the diagnostic no-stop-loss sign-flip. Section 8 discusses the likely sources of edge and the regimes in which ECHO would be expected to fail. Section 9 notes limitations; Section 10 concludes.

2. Related Work

Classical pair trading. Gatev, Goetzmann, and Rouwenhorst [1] formalized the modern empirical treatment of pair trading: identify pairs of securities whose price ratio is historically stationary, trigger on a z-score threshold of the ratio, exit on reversion to mean or stop-loss. Their paper reported roughly 11% annualized excess returns on daily CRSP data from 1962 to 2002, with the interpretation that pairs encode common factor exposures whose idiosyncratic deviations predict subsequent reversion. The strategy's performance degraded in later decades as the trade was discovered and capital chased it; it remains pedagogically central, but on its own is not competitive in contemporary markets.

Statistical arbitrage on cointegrated baskets. Avellaneda and Lee [2] generalized the pair-trading idea to cointegrated baskets built from factor-neutral residuals. Their approach extracts a stationary residual process from a cross-sectional panel by factor decomposition and then models this residual as an Ornstein-Uhlenbeck process, trading on its deviation from equilibrium. Published results over 1997 to 2007 showed meaningful out-of-sample return, though the authors explicitly documented the strategy's drawdowns in high-volatility, deleveraging regimes such as August 2007.

Short-horizon reversal. De Bondt and Thaler [3] documented long-horizon reversal over three- to five-year windows and attributed it to overreaction. Jegadeesh and Titman [9] later disentangled medium-horizon (3 to 12 month) momentum from short-horizon reversal. Lehmann [8] and Conrad and Kaul [10] documented weekly and daily reversal in individual stocks. Khandani and Lo [4] analyzed the August 2007 quant crash in light of short-horizon reversal strategies, arguing that the rapid forced unwinding of positions by one or more large players created a cascade of losses across nominally independent books. The lesson, that reversion strategies are crowded and that their profitability is contingent on liquid exit paths, directly informs our evaluation.

Machine learning for pair and reversion trading. Sarmento and Horta [5] applied machine learning (unsupervised clustering plus per-cluster supervised classifiers) to the pair-selection problem, improving on linear cointegration tests in both trade count and per-trade profitability. Han, Park, and Lee [6] formulated pair trading as a deep reinforcement learning problem, learning a policy that jointly determines entry, exit, and position sizing. Their paper reports Sharpe improvements over threshold-rule baselines; the drawback, as with most DRL applications in finance, is poor sample efficiency and substantial sensitivity to reward specification.

Cross-sectional transformer statistical arbitrage. Choudhry [7] applied transformer architectures to cross-sectional statistical arbitrage, treating a panel of stocks at each time step as a sequence over the stock dimension and using attention to aggregate across names before predicting residual returns. This is architecturally orthogonal to our single-name temporal approach but makes a similar wager: that the flexibility of attention can surface signals beneath what linear factor models identify. Our work differs in that we operate on the time axis rather than the cross-sectional axis, and we target a shorter horizon (1 hour vs. 1 to 5 days).

Why classical indicators get arbitraged. The common thread is that mean reversion is where the money is, but also where the competition is. Simple statistical indicators — z-score, Bollinger %B, RSI — are available to essentially every market participant. On liquid large-cap ETFs, these signals are acted on by high-frequency market-making and statistical-arbitrage desks whose response function is faster and whose inventory management is more sophisticated than any research prototype's. Our conjecture is that residual signal survives at the intraday-but-not-microstructure horizon (15 minutes to 1 hour), in the form of non-linear interactions between volatility, volume, and price level that a transformer can capture but that a single indicator cannot. ECHO is a test of that conjecture.

3. Data

3.1 Why ETFs are good reversion instruments

ETFs exhibit three structural features that make them particularly attractive for short-horizon reversion modeling. First is aggregation of idiosyncratic noise: a cap-weighted index ETF such as SPY reflects the averaged contributions of hundreds of constituents, most of whose idiosyncratic components cancel out at any instant. What remains is predominantly factor-driven, and factor dynamics are smoother and more persistent than the noisy single-name process. Second is rebalancing flows: the indices behind these ETFs are tracked by funds with periodic rebalancing schedules, and creation/redemption baskets impose quasi-mechanical demand on constituents that varies systematically across the day and week. Flows of this kind generate predictable intraday patterns, particularly in the opening 30 minutes and closing 30 minutes, though our 15-minute bars deliberately smooth out the sub-bar, very-high-frequency component of these effects. Third is cap-weighted drift: in a cap-weighted ETF, the weights shift as underlying prices move; re-weighting this portfolio to the target weights creates small deterministic corrections that interact with price moves in the underlying, producing a subtle reversion-favoring dynamic. None of these effects is large, and none is independently exploitable at the 15-minute horizon without care; in combination, they plausibly generate the residual signal we wish to surface.

3.2 Training window

The choice of training window for ECHO balances two competing concerns. Mean-reversion regimes are more persistent than momentum regimes: ETF reversion dynamics (rebalancing flows, cap-weighted drift, idiosyncratic aggregation) are rooted in structural factors that change slowly. This argues for as long a training window as we can afford. Against this, microstructure has shifted: commission-free retail trading from late 2019, the growing share of ETFs in assets and trading volume (ETF assets under management roughly tripled between 2015 and 2023), and elevated retail participation from 2020 onward have altered the short-horizon volatility structure of liquid ETFs.

We selected 2021-06-01 to 2024-06-30 (three years of training data) as a compromise: long enough to cover multiple mini-regimes (2021 stimulus liquidity, the 2022 rate shock, the 2023 recovery, early-2024 AI momentum), but not long enough to include the pre-COVID regime in which ETF flow patterns differed meaningfully. A pilot experiment trained on 2018-01 through 2024-06 (6.5 years, including pre-COVID data) achieved a validation pinball loss of 0.087, marginally worse than the 0.0839 of our 2021-onward version despite roughly twice the data. This is consistent with ETF reversion being regime-specific: the mix of central-bank liquidity, option-gamma dynamics, and flow of funds after 2021 forms a coherent environment that pre-COVID data dilutes.

We hypothesize that the best ECHO-class model would use an even shorter, more dynamic training window — for example, a 12-month rolling window retrained monthly — to track the shifting microstructure more tightly. Our current ECHO is a fixed checkpoint; productionizing a rolling variant is future work and is discussed in Section 9.

3.3 Universe

The 164-symbol universe was assembled to maximize reversion signal density by concentrating on instruments whose price process is structurally reversion-friendly. The categories are:

Broad-index ETFs (9): SPY, QQQ, DIA, IWM, VTI, VOO, VT, EFA, EEM. These are the most liquid ETFs and the purest expression of cap-weighted drift and rebalancing flows.
Sector ETFs (11): XLK, XLF, XLV, XLE, XLI, XLY, XLP, XLU, XLB, XLRE, XLC. Sector funds exhibit both the aggregation effect of ETFs and the rotation dynamics among sectors that can produce reliable relative reversion.
Duration (5): TLT, IEF, SHY, LQD, HYG. Treasury and credit ETFs are driven by macroeconomic news and rate expectations, with intraday moves that often mean-revert after the initial reaction is absorbed.
Commodities (4): GLD, SLV, USO, UNG. Commodity-linked ETFs show distinct reversion patterns tied to futures-market microstructure.
Volatility (3): VIXY, UVXY, SVXY. Volatility ETFs are perhaps the most reversionary instruments in the entire US ETF complex, owing to the term structure of VIX futures and the daily rebalancing mechanics of leveraged/inverse variants.
Currency (3): UUP, FXE, FXY. FX ETFs inherit the mean-reversion statistics of currency pairs.
Gold miners (2): GDX, GDXJ. Miner ETFs blend commodity and equity reversion channels.

The remaining instruments in the 164-symbol universe fill out adjacent categories (regional ETFs, thematic funds, levered variants) that were included to broaden training exposure without diluting the reversion focus. Notably, the universe deliberately excludes single-name stocks, high-yield exotic products, and inverse-momentum products that do not exhibit the clean reversion statistics we wish to exploit.

3.4 Cleaning

We ingested 15-minute bars from the TimescaleDB stock_ohlcv_data table and applied a standard validation pipeline: reject rows with inconsistent OHLC relations (e.g. high below low), near-zero prices, duplicate timestamps, and any bar whose volume is negative or non-integral. The aggregate drop rate was 0.011% of rows, and all 164 symbols retained a sufficient bar count to be included in training. No symbols were dropped for sparsity; the universe was chosen for liquidity in the first place.
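The row-level checks above can be sketched in plain Python. This is a minimal illustration, not the pipeline that produced the 0.011% drop rate: the `Bar` record, its field names, and the `MIN_PRICE` cutoff are hypothetical, and duplicates are resolved by keeping the first valid row for each (symbol, timestamp), which is one reasonable reading of "duplicate timestamps".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Bar:
    # Hypothetical in-memory record; field names are illustrative and
    # not taken from the stock_ohlcv_data schema.
    symbol: str
    ts: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float


MIN_PRICE = 1e-4  # hypothetical cutoff for "near-zero" prices


def is_valid(bar: Bar) -> bool:
    """Per-row checks: price floor, consistent OHLC relations, sane volume."""
    if min(bar.open, bar.high, bar.low, bar.close) < MIN_PRICE:
        return False
    # High must be the bar's maximum and low its minimum (this also rejects high < low).
    if bar.high < max(bar.open, bar.close, bar.low):
        return False
    if bar.low > min(bar.open, bar.close, bar.high):
        return False
    # Volume must be a non-negative whole number (NaN and inf are rejected too).
    if bar.volume < 0 or not float(bar.volume).is_integer():
        return False
    return True


def clean(bars: list[Bar]) -> tuple[list[Bar], float]:
    """Keep the first valid row per (symbol, timestamp); return kept bars and drop rate."""
    seen: set[tuple[str, datetime]] = set()
    kept: list[Bar] = []
    for bar in bars:
        key = (bar.symbol, bar.ts)
        if key in seen or not is_valid(bar):
            continue
        seen.add(key)
        kept.append(bar)
    drop_rate = 1.0 - len(kept) / len(bars) if bars else 0.0
    return kept, drop_rate
```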

3.5 Split and walk-forward integrity

The split is strictly chronological:

Train: 2021-06-01 to 2024-06-30
Validation: 2024-07-01 to 2025-03-31
Test: 2025-04-01 to 2026-04-17 (approximately 53 weeks)

Because windowing spans multiple bars, we enforce a buffer at each split boundary equal to the context length plus the horizon, so that no training window contains any bar from the validation period and no validation window leaks into test. Model selection uses validation pinball loss; the test set is touched only once, at the end, for the results in Section 7. We do not re-tune hyperparameters on test performance.

3.6 Windowing

Each training example consists of a context window of 160 bars (approximately 40 trading hours, or a little over six regular 6.5-hour sessions) and a horizon of 4 bars (1 hour). The target is the log return from the last bar of the context to the last bar of the horizon; intervening bars within the horizon are not targets but influence any barrier-touch evaluation downstream. The stride between consecutive training windows is 20 bars, yielding dense overlap. Across 164 symbols and three years of training data, this produces approximately 4.6 million training windows.
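A sketch of the windowing and boundary buffer of Sections 3.5 and 3.6, assuming each symbol's bars are stored as a chronologically sorted array and each split is given as a half-open range of bar indices (both assumptions, as are the function names). Requiring a window's entire context-plus-horizon span to sit inside one split is what a buffer of context + horizon bars at each boundary achieves.

```python
import math

CONTEXT = 160   # bars of history per example
HORIZON = 4     # bars ahead for the target (1 hour at 15-minute bars)
STRIDE = 20     # bars between consecutive window starts


def window_starts(split_start: int, split_end: int, context: int = CONTEXT,
                  horizon: int = HORIZON, stride: int = STRIDE) -> list[int]:
    """Start indices of windows lying entirely inside [split_start, split_end).

    A window starting at s uses bars s .. s + context + horizon - 1, so no
    window can mix bars from two splits.
    """
    span = context + horizon
    return list(range(split_start, split_end - span + 1, stride))


def log_return_target(closes: list[float], start: int,
                      context: int = CONTEXT, horizon: int = HORIZON) -> float:
    """Log return from the last context bar to the last horizon bar."""
    last_ctx = start + context - 1
    return math.log(closes[last_ctx + horizon] / closes[last_ctx])
```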

4. Methodology

4.1 Architecture

ECHO is a factorized-attention transformer. The 12-channel input of shape (B, T=160, C=12) is first partitioned along the time axis into patches of length 16, so that the sequence length seen by attention is 10 patches per channel. Each patch is linearly embedded to dimension 224. Four factorized-attention blocks then alternate between attention across the time axis (per channel) and attention across the channel axis (per time step), each with 7 heads. A quantile head produces seven conditional quantiles (0.10, 0.25, 0.40, 0.50, 0.60, 0.75, 0.90) of the 1-hour-ahead log return. The total parameter count is 3,329,463.

Factorized attention was chosen for two reasons. The first is the standard one, cost: joint time-channel attention scales as O((T · C)^2) per layer, whereas the factorized form scales as O(C · T^2 + T · C^2). At T=10 patches and C=12 channels the joint form is still tractable, so efficiency alone would not have forced the choice. The second reason is more substantive: factorization biases the model toward representations in which temporal dynamics and cross-channel interactions can be learned at different abstraction levels, which in our experience generalizes better on the relatively small ETF universe than the joint form.
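A quick count of pairwise attention scores per head makes the cost comparison concrete. The sketch assumes non-overlapping patches and counts one time-attention pass plus one channel-attention pass for the factorized form; it ignores batch size, heads, and projection costs, and the function names are ours.

```python
def patch_count(context: int, patch_len: int) -> int:
    """Number of non-overlapping patches per channel."""
    if context % patch_len:
        raise ValueError("context must be a multiple of patch_len")
    return context // patch_len


def joint_scores(t: int, c: int) -> int:
    """Pairwise scores for joint time-channel attention: (T*C)^2."""
    return (t * c) ** 2


def factorized_scores(t: int, c: int) -> int:
    """Time attention within each channel (C*T^2) plus channel attention at each step (T*C^2)."""
    return c * t ** 2 + t * c ** 2


T = patch_count(160, 16)
C = 12
print(T, joint_scores(T, C), factorized_scores(T, C))
```

With T = 10 and C = 12 this prints `10 14400 2640`: the joint form computes about 5.5 times as many scores, a gap that is real but small in absolute terms at this size.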

4.2 Channel selection

The 12 channels are:

Open
High
Low
Close
Volume
rsi14 — 14-bar relative strength index, an overbought/oversold oscillator.
bb_pctb — Bollinger %B, position within the 20-period, 2-sigma Bollinger bands.
bb_width — normalized Bollinger band width, a rolling volatility proxy.
atr14 — 14-bar average true range.
adx14 — 14-bar average directional index, a trend-strength measure. Low ADX is reversion-favorable.
obv — on-balance volume, a cumulative volume-flow indicator.
willr14 — 14-bar Williams %R, an alternate oversold detector.
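The oscillator channels follow textbook definitions, but the report does not state its exact conventions. The sketch below assumes Wilder smoothing for `rsi14`, the population standard deviation for the Bollinger bands, and band width normalized by the middle band for `bb_width`; `atr14`, `adx14`, and `obv` are omitted for brevity. Each function returns the value for the most recent bar only, and the flat-window fallbacks are our own choice.

```python
from statistics import fmean, pstdev


def rsi(closes: list[float], period: int = 14) -> float:
    """RSI with Wilder smoothing; needs at least period + 1 closes."""
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    avg_gain, avg_loss = fmean(gains[:period]), fmean(losses[:period])
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
    if avg_loss == 0.0:
        return 50.0 if avg_gain == 0.0 else 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def bollinger(closes: list[float], period: int = 20, k: float = 2.0) -> tuple[float, float]:
    """(%B, width / middle band) for the last bar, using the population std."""
    window = closes[-period:]
    mid, sd = fmean(window), pstdev(window)
    upper, lower = mid + k * sd, mid - k * sd
    if upper == lower:  # flat window: place the close mid-band, zero width
        return 0.5, 0.0
    return (closes[-1] - lower) / (upper - lower), (upper - lower) / mid


def williams_r(highs: list[float], lows: list[float], closes: list[float],
               period: int = 14) -> float:
    """Williams %R in [-100, 0]; values near -100 flag oversold conditions."""
    hh, ll = max(highs[-period:]), min(lows[-period:])
    if hh == ll:
        return -50.0
    return -100.0 * (hh - closes[-1]) / (hh - ll)
```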

Crucially, this feature set deliberately excludes long-term trend indicators: SMA200, extended exponential moving averages, MACD, and similar. The reasoning is as follows. A transformer provided with a long-term trend channel will, during training, learn to use it: when the long-term trend and the short-term move point the same way, for example a sharp rise within an established uptrend, the model will likely learn to predict continuation of the move rather than reversion. This is arguably the right signal to learn in isolation, but it muddles the mean-reversion specialization we want for ECHO. In the Zirdle ensemble, a different model is responsible for directional, trend-following signals; ECHO's role is to be a clean reverter. Denying it the information required to exploit trend is the cleanest way to enforce that specialization.

4.3 What we tried that didn't work

An earlier version of ECHO — ECHO-v0 — included MACD and a 50-bar simple moving average in the channel set, for a total of 14 channels. This version achieved a validation pinball loss of 0.093, noticeably worse than the 0.0839 of the final 12-channel version. Inspection of ECHO-v0's predictions revealed that in strongly trending regimes — for example, the QQQ uptrend in Q4 2023 — the model predicted directional continuation rather than reversion, effectively cross-contaminating itself with a (weak) momentum strategy. Removing MACD and SMA50 not only improved validation loss but produced cleaner reversion behavior on inspection: predicted moves against prevailing short-term drift became more common, and the empirical win-rate distribution shifted toward the low-WR, high-R/R regime characteristic of reversion strategies. The lesson is that in a specialized-ensemble setting, feature ablation is as important as feature engineering; the model should be given only what it needs to perform its role, not everything that might be marginally informative.

4.4 Loss function

Training minimizes the average pinball loss over the seven target quantiles. For quantile q and residual r = y - y_hat_q, the pinball loss is max(q·r, (q-1)·r). The pinball loss is a consistent scoring function for each conditional quantile, so averaging it across the seven levels scores the model's picture of the conditional distribution at those levels without imposing a parametric shape. This lets the model express heavy and asymmetric tails, which is useful for reversion: short-horizon returns typically have heavier tails than a Gaussian fit would imply, and the left and right tails need not match.
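The loss can be written out directly. This pure-Python sketch (function names are ours) averages the pinball loss over samples and the seven quantile levels; a training implementation would compute the same quantity on tensors.

```python
QUANTILES = (0.10, 0.25, 0.40, 0.50, 0.60, 0.75, 0.90)


def pinball(y: float, y_hat: float, q: float) -> float:
    """Pinball loss for quantile level q: max(q*r, (q-1)*r) with r = y - y_hat."""
    r = y - y_hat
    return max(q * r, (q - 1.0) * r)


def mean_pinball(targets: list[float], preds: list[list[float]],
                 quantiles: tuple[float, ...] = QUANTILES) -> float:
    """Average pinball loss over samples and quantile levels.

    preds[i][j] is the forecast of quantile quantiles[j] for targets[i].
    """
    total = 0.0
    for y, row in zip(targets, preds):
        total += sum(pinball(y, y_hat, q) for y_hat, q in zip(row, quantiles))
    return total / (len(targets) * len(quantiles))
```

The asymmetry is visible directly: at q = 0.9, under-predicting by one unit costs 0.9 while over-predicting by one unit costs 0.1, which pushes the upper quantile forecasts into the right tail.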

5. Training

5.1 Hyperparameters

Training uses AdamW with a learning rate of 2e-4 — roughly half of what we use for the momentum-oriented HELIOS model, reflecting our view that reversion signal is subtle and that larger optimization steps risk destroying it. Weight decay is 0.05. Batch size is 96 windows; dropout is 0.1; gradient clipping is applied at a global norm of 1.0. We warm up linearly for 500 steps and then cosine-decay to 1e-5 over the full training horizon of 30 epochs, with early stopping on validation pinball and a patience of 5 epochs. Mixed-precision (bf16) training is used throughout. All training was on a single NVIDIA A100-80GB instance; full training to early stop required approximately 6 wall-clock hours.
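The learning-rate schedule can be sketched as a function of the optimizer step. The step counts are assumptions: `total_steps` would be 30 epochs times the number of batches per epoch, which the report does not state, and the optimizer itself (AdamW with weight decay 0.05, gradient clipping at norm 1.0) is left to the training framework.

```python
import math

PEAK_LR = 2e-4      # AdamW learning rate after warmup
FLOOR_LR = 1e-5     # cosine-decay target
WARMUP_STEPS = 500


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to FLOOR_LR at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS))
    return FLOOR_LR + 0.5 * (PEAK_LR - FLOOR_LR) * (1.0 + math.cos(math.pi * progress))
```

Because the decay is laid out over 30 epochs, a run that early-stops at epoch 12 ends with the rate still well above the 1e-5 floor.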

5.2 Convergence

ECHO reached its best validation pinball loss of 0.0839 at epoch 7, then plateaued. Epochs 8 through 12 produced no improvement, triggering early stopping at epoch 12 after five consecutive non-improving epochs. We restored the epoch-7 checkpoint as the final model. Training loss continued to decrease monotonically after epoch 7, indicating that the model's capacity is not exhausted and that the gap between training and validation is growing after this point — a standard diagnostic consistent with the early-stopping decision. A single re-run with a different random seed produced a best validation pinball of 0.0844 at epoch 8; the architecture appears to be stable near this operating point.
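The stopping rule amounts to a few lines. This sketch assumes that "improvement" means a strict decrease of the best validation loss seen so far and counts epochs from 1; with a patience of 5, a best epoch of 7 followed by five non-improving epochs stops at epoch 12, matching the run described above.

```python
def early_stop(val_losses: list[float], patience: int = 5) -> tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss so far; the best epoch's checkpoint is restored.
    """
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)
```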

5.3 Reproducibility

All training code, hyperparameter configurations, and random seeds are recorded. Training data windowing is deterministic given the raw bars and the split boundaries. The final checkpoint (3,329,463 parameters) and its training artifacts are versioned.

6. Evaluation Protocol

6.1 Triple-barrier simulation

We evaluate ECHO on the 53-week test window using a triple-barrier protocol, standard in the event-driven machine-learning-for-finance literature. For each context window, the model produces a distribution over the 1-hour-ahead return; we take the median prediction pred_pct as a point forecast and enter a hypothetical position in the direction of pred_pct. The position is closed when one of three conditions is met:

Take-profit: price moves by |pred_pct| in the predicted direction.
Stop-loss: price moves by |pred_pct| / R in the adverse direction, where R is the reward-to-risk ratio.
Time-out: 40 bars (10 hours) elapse without either barrier being touched.

We sweep R across {1, 2, 3, 5} and also include a no-stop-loss configuration in which the stop is effectively disabled (set to a very wide value that is effectively never touched in normal market conditions). All trades are assumed to execute at the bar close, with no slippage modeled; Section 8 discusses how this assumption is likely to shift the live-trading result.
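One trade under this protocol can be simulated bar by bar. This is a sketch under stated assumptions, not the evaluator used for Section 7: `pred` is a fractional return (0.002 for 0.2%), barrier touches are detected from each bar's high and low, exits are booked exactly at the barrier level with no slippage, a bar that spans both barriers is counted as a stop, and `rr=None` stands for the no-stop-loss configuration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Outcome:
    label: str       # "tp", "sl" or "timeout"
    ret: float       # signed fractional return of the position
    bars_held: int


def triple_barrier(entry: float, highs: Sequence[float], lows: Sequence[float],
                   closes: Sequence[float], pred: float, rr: Optional[float],
                   max_bars: int = 40) -> Outcome:
    """One trade in the direction of `pred`, entered at `entry`.

    Take-profit at |pred|, stop at |pred| / rr (rr=None disables the stop),
    time-out after max_bars. highs/lows/closes are the bars after entry.
    """
    if pred == 0.0 or not closes:
        raise ValueError("need a non-zero forecast and at least one bar")
    side = 1.0 if pred > 0 else -1.0
    tp = abs(pred)
    sl = None if rr is None else abs(pred) / rr
    n = min(max_bars, len(closes))
    for i in range(n):
        # Largest move up and down within the bar, relative to entry.
        up, down = highs[i] / entry - 1.0, 1.0 - lows[i] / entry
        fav, adv = (up, down) if side > 0 else (down, up)
        # Assumption: if both barriers fall inside one bar, the stop is hit first.
        if sl is not None and adv >= sl:
            return Outcome("sl", -sl, i + 1)
        if fav >= tp:
            return Outcome("tp", tp, i + 1)
    return Outcome("timeout", side * (closes[n - 1] / entry - 1.0), n)
```

The longs-only variant of Section 6.2 corresponds to calling this only when `pred > 0`.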

6.2 Longs-only configuration

In addition to the symmetric (both-directions) sweep, we run a longs-only variant in which all short signals are suppressed. There are two reasons to report this configuration. First, ETFs in a broadly bull market — which the 2025-04 to 2026-04 test window substantially was — bias the natural expected return positive; a longs-only evaluator aligns the sign of the strategy with this background drift and is therefore a more realistic proxy for live trading by a retail-accessible vehicle. Second, many ETFs are difficult or expensive to short: creation-redemption frictions, hard-to-borrow costs on sector and regional funds, and regulatory limits on inverse trading make short ETF positions materially more expensive than long ones. Restricting to longs removes this asymmetry from the analysis and produces a result more directly comparable to what a long-only book could achieve.

6.3 Metrics

For each R/R configuration we report the number of trades (trade count is nearly constant across configurations because entry is governed by prediction and only exit changes), win rate, total return over the 53-week test window, and weekly return computed as the total divided by 53. The weekly figure is the primary ranking metric. We also note maximum drawdown where relevant and compute an implied Sharpe ratio from the weekly return and realized volatility.
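The per-configuration metrics can be computed from a list of per-trade returns. How the report aggregates trades into a total is not stated; this sketch assumes a simple non-compounded sum and counts any trade with a positive return, including a positive time-out, as a win.

```python
def summarize(trade_returns: list[float], weeks: int = 53) -> dict[str, float]:
    """Trade count, win rate, total and weekly return for one R/R configuration.

    Assumptions: the total is the non-compounded sum of per-trade returns,
    and a win is any trade that closes with a positive return.
    """
    n = len(trade_returns)
    wins = sum(1 for r in trade_returns if r > 0)
    total = sum(trade_returns)
    return {
        "trades": n,
        "win_rate": wins / n if n else 0.0,
        "total": total,
        "weekly": total / weeks,
    }
```

The weekly figure is a plain division: 10.44% / 53 ≈ 0.197% per week, as in Table 2.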

6.4 What the no-stop-loss configuration is diagnostically for

The no-stop-loss variant is not presented as a recommended deployment configuration (running a strategy without a risk cap is imprudent regardless of model quality) but as a diagnostic for the nature of the signal the model has learned. Consider two models with identical predicted direction and magnitude: a momentum model that predicts "this will keep going up" and a reversion model that predicts "this has gone down enough to come back." With a stop-loss in place, both models look similar at their best R/R point. Without a stop-loss, they diverge sharply. A momentum model left unstopped will occasionally be caught by a large adverse move that does not reverse within the time-out; the resulting loss is bounded only by the time-out, not by a stop, and the model's no-SL total return collapses. A reversion model left unstopped will also be caught by large adverse moves, but these tend to reverse at least partly before the time-out, so the time-out closes the position at a less adverse level than a stop would have. The sign of the no-SL total return is therefore informative about what the model has actually learned, independent of headline R/R performance.

7. Results

7.1 All-direction R/R sweep

Table 1 reports the results of the symmetric (both-directions-allowed) R/R sweep over the 53-week test window.

Table 1: ECHO, all-direction evaluation.

R/R	Trades	WR	Total return (53w)	Weekly return
1:1	19,373	49.0%	-10.07%	-0.190%
1:2	19,388	34.3%	-0.16%	-0.003%
1:3	19,392	27.6%	+4.21%	+0.079%
1:5	19,395	21.2%	+8.33%	+0.157%
no-SL	19,395	85.4%	-51.03%	-0.963%

The sweep shows the expected monotone pattern for a low-edge signal. Because the take-profit is fixed at |pred_pct| and the stop sits at |pred_pct| / R, raising R tightens the stop: the low-R/R configurations (1:1, 1:2) let losing trades run a full or half |pred_pct| before cutting them and underperform, while the high-R/R configurations (1:3, 1:5) cut losers at a small fraction of |pred_pct| and let the winners that do reach the take-profit carry the expectancy despite the lower hit rate. The 1:5 configuration produces a clearly positive result, +8.33% over the test window, with a 21.2% win rate over nearly 20,000 signals, a low-WR, high-expectancy pattern that we read as characteristic of genuine reversion. The no-stop-loss variant in the symmetric configuration realizes a dramatic -51.03% because the short side of the strategy, in a broadly rising market, accumulates losses that do not revert within the 10-hour time-out.

7.2 Longs-only evaluation

Table 2 reports the longs-only configuration.

Table 2: ECHO, longs-only evaluation.

R/R	Trades	WR	Total return (53w)	Weekly return
1:1	11,872	51.5%	+4.47%	+0.084%
1:2	11,874	36.5%	+6.79%	+0.128%
1:3	11,874	29.8%	+8.46%	+0.160%
1:5	11,874	23.6%	+10.44%	+0.197%
no-SL	11,874	88.7%	+9.36%	+0.177%

Every R/R configuration is profitable when the short side is suppressed. The headline 1:5 configuration returns +10.44% over the test window, a weekly return of +0.197%, across 11,874 signals. Most notably, the no-stop-loss variant is also profitable, at +9.36% total (+0.177% per week).

7.3 The no-stop-loss sign flip

The contrast between Tables 1 and 2 at the no-SL row is the most diagnostically important result in this paper. When shorts are included, unstopped trades accumulate -51.03%: the short side, in a rising market, is a slow, consistent loser and the time-out does not rescue it. When shorts are excluded, the same unstopped policy is +9.36%: adverse long moves do, on average, revert within the 10-hour time-out, and the strategy closes these positions at a less adverse level than a stop-loss would have. This sign flip is inconsistent with the hypothesis that the model has learned directional momentum and consistent with the hypothesis that it has learned genuine reversion. Said differently, if the long positions being entered were momentum bets gone wrong, they would compound their losses when left uncut; that the reverse is true is the strongest single piece of evidence we have that ECHO is doing what we built it to do.

8. Discussion

8.1 The mean-reversion signature

We have argued across Sections 6.4 and 7.3 that the no-stop-loss sign flip between Tables 1 and 2 is the clearest behavioral evidence that ECHO has learned mean reversion rather than directional momentum. Two additional observations reinforce the conclusion. First, across the longs-only R/R sweep, the total return increases monotonically with R/R, from +4.47% at 1:1 to +10.44% at 1:5, while the win rate decreases monotonically, from 51.5% to 23.6%. Since the take-profit stays at |pred_pct| and only the stop tightens as R/R rises, this is the fingerprint of a signal whose losing trades are cheap to cut early: more trades are stopped out, but each loss shrinks to |pred_pct| / R while each winner still earns the full |pred_pct|. Momentum models typically show the opposite pattern in our experience, with better performance at low R/R because the sign of the move is more often right than the magnitude. Second, the 1:5 win rate of 23.6% should be read against the naïve no-skill baseline: for a driftless price path, ignoring the time-out, a barrier pair with the stop at one fifth of the take-profit distance is resolved favorably 1/(1+5) = 16.7% of the time, which is also the break-even win rate at 1:5. ECHO's realized 23.6% is a hit-rate uplift of 6.9 percentage points over that baseline, and this uplift is the source of the positive expectancy.
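The baseline comparison can be repeated for every row of the longs-only sweep. The sketch treats price as a driftless random walk and ignores the 40-bar time-out, so it is only a rough yardstick; under the protocol of Section 6.1 the stop sits at 1/R of the take-profit distance, which makes the no-skill win rate 1/(1+R), and the same value is the break-even win rate. Function names are ours.

```python
def no_skill_win_rate(rr: float) -> float:
    """Driftless random walk: chance of reaching the take-profit before the stop.

    With the take-profit at distance a and the stop at b = a / rr, the
    gambler's-ruin result gives P(win) = b / (a + b) = 1 / (1 + rr).
    The 40-bar time-out is ignored.
    """
    return 1.0 / (1.0 + rr)


def expectancy_per_stop(win_rate: float, rr: float) -> float:
    """Expected P&L in units of the stop distance: +rr per win, -1 per loss."""
    return win_rate * rr - (1.0 - win_rate)


# Realized longs-only win rates from Table 2.
for rr, realized in [(1, 0.515), (2, 0.365), (3, 0.298), (5, 0.236)]:
    base = no_skill_win_rate(rr)
    print(f"1:{rr}  baseline {base:.1%}  realized {realized:.1%}  "
          f"uplift {100 * (realized - base):+.1f} pp")
```

For 1:5 this gives the 16.7% baseline against the realized 23.6%; the uplift shrinks steadily toward lower R/R.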

8.2 Drawdown and implied Sharpe

The 1:5 longs-only path realized a maximum drawdown of roughly 3 to 4% over the 53-week test window against a total return of +10.44%. This is the shallowest drawdown of any model in the Zirdle ensemble and is consistent with the diversifying effect of the 164-symbol universe and the short per-trade holding horizon (max 40 bars, typical closeout much sooner). Computing a Sharpe-equivalent from the weekly return and its realized standard deviation across the test weeks gives an implied Sharpe ratio in the 2.5 to 3.0 range, subject to the usual caveats that such single-path estimates over a year of data have wide confidence intervals. We regard the implied Sharpe as a suggestive upper bound on what the strategy might deliver in live trading, before transaction costs.
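The implied Sharpe and drawdown figures can be computed from the weekly return series. This sketch assumes a zero risk-free rate, annualization by the square root of 52, and an additive (non-compounded) equity path; the report does not specify these choices.

```python
import math
from statistics import fmean, stdev


def implied_sharpe(weekly: list[float], periods_per_year: int = 52) -> float:
    """Annualized Sharpe ratio of weekly returns, assuming a zero risk-free rate."""
    return fmean(weekly) / stdev(weekly) * math.sqrt(periods_per_year)


def max_drawdown(weekly: list[float]) -> float:
    """Largest peak-to-trough fall of the additive cumulative-return path."""
    equity = peak = worst = 0.0
    for r in weekly:
        equity += r
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst
```

Under these conventions, a mean weekly return of +0.197% and a Sharpe ratio of 2.5 to 3.0 correspond to a weekly standard deviation of roughly 0.47% to 0.57%.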

8.3 Where the edge comes from

Qualitative inspection of ECHO's predictions on the test window reveals three consistent patterns. First, the model tends to predict reversion after intraday spikes of 1 to 2%, particularly in sector ETFs and volatility ETFs where such spikes are typically driven by news reactions that are subsequently re-priced. Second, during low-volatility stretches, the model's predictions cluster close to zero (absolute predicted return typically under 10 basis points), resulting in trades that either close quickly on a small take-profit or are cut by the time-out without material loss. Third, in high-ADX trending environments the model occasionally predicts continuation rather than reversion — interestingly, these trades tend to be correct on average, perhaps because the ADX signal acts as a gate telling the model that the trend is strong enough to fade its own short-horizon reversion prior. This behavior is serendipitous; we did not explicitly train for it.

None of these qualitative patterns corresponds to a rule that would be obvious from a single statistical indicator: each involves a conjunction of price movement, Bollinger position, ATR level, and ADX state that the transformer learns to combine. This is our working hypothesis for why a 3.3-million-parameter transformer can find residual reversion signal beyond what the single-indicator z-score (close - SMA20)/std20 captures.
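For concreteness, the single-indicator baseline (close - SMA20)/std20 can be written as a rolling z-score. This sketch assumes the population standard deviation (the usual Bollinger convention) and a hypothetical entry threshold of -2.0; the report specifies neither.

```python
from statistics import mean, pstdev


def rolling_zscore(closes, window=20):
    """(close - SMA_window) / std_window per bar; None until a full window exists.

    Uses the population std (statistics.pstdev); whether the report's baseline
    uses population or sample std is an assumption here.
    """
    out = []
    for i in range(len(closes)):
        if i + 1 < window:
            out.append(None)
            continue
        w = closes[i + 1 - window : i + 1]
        sd = pstdev(w)
        out.append((closes[i] - mean(w)) / sd if sd > 0 else 0.0)
    return out


def baseline_long_signals(zscores, entry=-2.0):
    """Hypothetical single-indicator rule: go long when the z-score is at or below `entry`."""
    return [z is not None and z <= entry for z in zscores]
```

The contrast with ECHO is that this rule conditions on one number per bar, whereas the patterns described above require conditioning jointly on volatility and trend state.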

8.4 Why this might not work in 2020 or 2022 Q4

The 2025-04 to 2026-04 test window was broadly a low-to-moderate-volatility, rising-market regime. Two classes of regime would stress-test ECHO in ways our evaluation cannot assess. The first is a high-volatility crash, such as the March 2020 COVID window, in which intraday ranges widen dramatically and mean-reversion strategies tend to misfire because moves that would normally revert instead extend. The second is a protracted directional macro shock, such as the 2022 Q4 rate-shock regime, in which the Treasury-duration ETFs in our universe (TLT, IEF) trended persistently lower and sector rotation was unusually strong; because 2022 falls inside ECHO's 2021-06 to 2024-06 training window, that period provides no out-of-sample evidence. Both regimes are adverse to reversion strategies on theoretical grounds. We expect ECHO's performance in such regimes to be meaningfully worse than the test-window result, and we recommend pairing any live deployment with a volatility-regime gate (e.g. suspending trading when VIX is above some threshold), which we expect to trade a small amount of return for a large reduction in tail-risk exposure. Section 9 returns to this point.
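One way to implement the volatility-regime gate suggested above is a hysteresis switch, so that trading does not flip on and off when VIX hovers near a single threshold. This is a hypothetical design: the report leaves the threshold unspecified, and `suspend_above` and `resume_below` are illustrative parameters introduced here.

```python
def regime_gate(vix_levels, suspend_above, resume_below):
    """Per-bar trading permission with hysteresis (hypothetical design).

    Trading is suspended once VIX exceeds `suspend_above` and resumes only
    after VIX falls below `resume_below`.
    """
    if resume_below > suspend_above:
        raise ValueError("resume_below must not exceed suspend_above")
    active = True
    states = []
    for v in vix_levels:
        if active and v > suspend_above:
            active = False
        elif not active and v < resume_below:
            active = True
        states.append(active)
    return states


def gated(signals, states):
    """Drop signals that arrive while the gate is closed."""
    return [s for s, ok in zip(signals, states) if ok]
```

For example, `regime_gate([15, 28, 24, 22, 19, 18], suspend_above=25, resume_below=20)` returns `[True, False, False, False, True, True]`: trading stops at the 28 print and stays off until VIX drops below 20.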

8.5 Transaction costs

A realistic live-trading implementation of ECHO must account for transaction costs. On the most liquid ETFs in the universe (SPY, QQQ), the typical quoted bid-ask spread is on the order of 0.5 basis points, and zero-commission retail brokerage is now standard in the United States; on less-liquid names (small-cap sector funds, regional ETFs), spreads may widen to 2 or 3 basis points. Round-trip cost per trade in the longs-only 1:5 configuration is therefore of the order of 1 to 4 basis points. The strategy's gross expected return per trade is approximately 8.8 basis points of traded notional. (Dividing +10.44% by 11,874 trades gives about 0.088 basis points of portfolio return per trade; the two figures agree if each trade is allocated roughly 1% of capital.) Transaction costs would therefore consume roughly 10% to 45% of the gross edge depending on the symbol mix, leaving a live weekly return plausibly in the range of +0.11% to +0.18%. We regard this as robust, although not spectacular: the edge is modest enough that careful attention to execution (avoiding the first and last 15-minute bars of the session, using limit orders rather than market orders, aggregating correlated signals) is likely to matter for the realized number.
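The back-of-envelope cost arithmetic can be reproduced as follows. The sketch treats the 8.8 bp figure as gross return per unit of traded notional and backs out the per-trade capital fraction that reconciles it with the total return; that fraction is an inference, not a parameter the report states.

```python
def implied_position_fraction(gross_total, n_trades, edge_bp):
    """Capital fraction per trade reconciling total return with per-trade edge."""
    per_trade_portfolio = gross_total / n_trades  # portfolio return per trade
    return per_trade_portfolio / (edge_bp / 1e4)


def net_weekly_return(gross_total, n_weeks, edge_bp, cost_bp):
    """Scale the simple average weekly return by the share of edge left after costs."""
    return (gross_total / n_weeks) * (1.0 - cost_bp / edge_bp)


print(f"{implied_position_fraction(0.1044, 11874, 8.8):.2%} of capital per trade")
for cost in (1.0, 4.0):
    # 1 bp -> about 0.175% per week; 4 bp -> about 0.107% per week
    print(f"{cost:.0f} bp round trip -> {net_weekly_return(0.1044, 53, 8.8, cost):.3%} per week")
```

The linear scaling assumes costs hit every trade equally; a per-symbol spread model, as Section 9 notes, would weight the less-liquid names more heavily.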

8.6 Ensemble role

Within the Zirdle five-model ensemble, ECHO plays the role of capital-preservation specialist. Its absolute weekly return is modest compared to the more aggressive momentum-oriented models, but its drawdown profile is the shallowest, its win-rate-to-risk relationship is the cleanest, and, most importantly, its returns are largely uncorrelated with those of the more directional models in the ensemble. In the unified 6-month ensemble evaluation across 130 symbols, ECHO's contribution compounded at +1.48% per week, the best of the five when capital is shared across the ensemble. We attribute this to the low correlation between its signals and the directional bets of the other members, which lets it contribute disproportionately to portfolio-level Sharpe. This is the canonical advantage of a reversion specialist in a mixed ensemble; we believe ECHO is a genuine example of it.
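The diversification argument can be made concrete with the standard two-asset portfolio formula. This is a textbook illustration with made-up inputs, not the report's ensemble-weighting method: it shows how the per-period Sharpe of a blend of two sleeves depends on their correlation.

```python
import math


def blended_sharpe(mu1, sd1, mu2, sd2, rho, w1=0.5):
    """Per-period Sharpe of a two-sleeve portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    mu = w1 * mu1 + w2 * mu2
    # Portfolio variance includes the covariance term 2 * w1 * w2 * rho * sd1 * sd2.
    var = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2.0 * w1 * w2 * rho * sd1 * sd2
    return mu / math.sqrt(var)
```

With two sleeves of equal Sharpe and volatility, an equal-weight blend at rho = 0 has √2 times the Sharpe of the same blend at rho = 1, which is the mechanism by which an uncorrelated but modest member can lift portfolio-level risk-adjusted return.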

9. Limitations

Several limitations bear on the interpretation of our results.

Test-window regime coverage. The 53-week test window, while long enough to accumulate statistical evidence, was dominated by a single low-to-moderate-volatility, rising-market regime. We have no out-of-sample evidence from a March-2020-style crash or a 2022-Q4-style rate-shock regime, both of which we expect to be materially worse for ECHO. Walk-forward testing across multiple regimes would strengthen our confidence in the estimated edge; this is future work.

Static training window. ECHO is a fixed checkpoint trained once on 2021-06 to 2024-06 data. Real deployments should use a rolling-retrain scheme that incorporates the most recent months of data and discards stale periods. The gap in validation pinball loss we observed between 3-year and 5.5-year training windows suggests that even shorter rolling windows, perhaps 12 months, might perform better, though that question remains open until the supporting engineering work is done.
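A rolling-retrain scheme of the kind described above needs a walk-forward schedule. This sketch uses the 12-month training window floated in the text and a hypothetical 3-month retrain step; both are illustrative settings, not tested choices.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date by whole months."""
    if d.day != 1:
        raise ValueError("expects first-of-month dates")
    years, month0 = divmod(d.month - 1 + months, 12)
    return date(d.year + years, month0 + 1, 1)


def rolling_windows(start: date, end: date, train_months: int = 12, test_months: int = 3):
    """Walk-forward schedule of (train_start, train_end, test_end) triples.

    Intervals are half-open: train on [train_start, train_end), test on
    [train_end, test_end); the window then slides forward by `test_months`.
    """
    windows = []
    train_start = start
    while True:
        train_end = add_months(train_start, train_months)
        test_end = add_months(train_end, test_months)
        if test_end > end:
            break
        windows.append((train_start, train_end, test_end))
        train_start = add_months(train_start, test_months)
    return windows
```

Applied to the span from 2021-06-01 to 2024-07-01, it yields eight retrain windows, the first training on 2021-06 through 2022-05 and testing on 2022-06 through 2022-08.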

Transaction-cost modeling. Our evaluation assumes execution at the 15-minute bar close with no spread or slippage cost. Section 8.5 gives a back-of-envelope estimate of the impact, but a more rigorous cost-aware backtest with per-symbol spread data would be more credible.

Short-side availability. The longs-only configuration is the more realistic deployment, but ignoring the short side means discarding potential signal. A separate short-side model, possibly trained on leveraged inverse ETFs so that short exposure can be taken through long positions rather than borrowed shares, could be an orthogonal addition.

Universe narrowness. 164 symbols is a small universe relative to the global ETF complex. Expanding to non-US listings, more granular sector and regional funds, and fixed-income derivatives would diversify the test, but would also introduce the challenge of differing liquidity and market-hours structures.

Single-path uncertainty. All of the performance metrics reported here are single-path estimates on one held-out year. Conclusions about Sharpe ratios and weekly returns have substantial uncertainty bands that we have not attempted to quantify explicitly.
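One way to attach uncertainty bands to the weekly-return Sharpe is a moving-block bootstrap over the test weeks. This is a sketch under stated assumptions: the 4-week block length, the number of resamples, the √52 annualization, and the percentile interval are illustrative choices, and the report has not run this analysis.

```python
import math
import random
from statistics import mean, stdev


def block_bootstrap_sharpe_ci(weekly, block=4, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for annualized Sharpe via a moving-block bootstrap.

    Resampling contiguous blocks of `block` weeks preserves short-range
    autocorrelation that an i.i.d. bootstrap would destroy.
    """
    n = len(weekly)
    if n < block + 1:
        raise ValueError("need more weeks than the block length")
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = []
        while len(sample) < n:
            start = rng.randrange(n - block + 1)
            sample.extend(weekly[start:start + block])
        sample = sample[:n]
        sd = stdev(sample)
        if sd > 0:  # skip degenerate resamples with zero dispersion
            stats.append(mean(sample) / sd * math.sqrt(52))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

Fed the 53 realized weekly returns, the width of the resulting interval would quantify how far the 2.5 to 3.0 implied Sharpe could plausibly move on a different draw of the same regime; it says nothing about regimes absent from the test window.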

10. Conclusion

ECHO is a 3.3-million-parameter factorized-attention transformer trained to predict return distributions over a 4-bar horizon of 15-minute bars on a 164-symbol ETF universe, with a feature set deliberately pruned of long-term trend indicators to enforce mean-reversion specialization. On a 53-week out-of-sample test window, the longs-only 1:5 configuration realized +10.44% total return, +0.197% per week, across 11,874 trades; more importantly, the longs-only no-stop-loss configuration realized +9.36%, a signature consistent with genuine reversion rather than directional momentum. ECHO attained the lowest validation pinball loss of any model in the Zirdle ensemble (0.0839 at epoch 7) and the shallowest realized drawdown on the test set (3 to 4%), with an implied Sharpe ratio in the 2.5 to 3.0 range before transaction costs. Within the ensemble, its signals are largely uncorrelated with those of the more directional members, and its contribution to the unified 6-month evaluation compounds at +1.48% per week. We position ECHO as the capital-preservation specialist of the ensemble, with the caveats discussed above regarding adverse volatility regimes and the need for an eventual transition from a fixed checkpoint to a rolling-retrain production variant.

References

[1] E. Gatev, W. N. Goetzmann, and K. G. Rouwenhorst, "Pairs trading: Performance of a relative-value arbitrage rule," Review of Financial Studies, vol. 19, no. 3, pp. 797-827, 2006.

[2] M. Avellaneda and J.-H. Lee, "Statistical arbitrage in the US equities market," Quantitative Finance, vol. 10, no. 7, pp. 761-782, 2010.

[3] W. F. M. De Bondt and R. Thaler, "Does the stock market overreact?" Journal of Finance, vol. 40, no. 3, pp. 793-805, 1985.

[4] A. E. Khandani and A. W. Lo, "What happened to the quants in August 2007? Evidence from factors and transactions data," Journal of Investment Management, vol. 5, no. 4, pp. 5-54, 2007.

[5] S. M. Sarmento and N. Horta, "Enhancing a pairs trading strategy with the application of machine learning," Expert Systems with Applications, vol. 158, p. 113490, 2020.

[6] C. Han, S. Park, and K. Lee, "Pair trading via deep reinforcement learning," Quantitative Finance, 2023.

[7] T. Choudhry, "Deep learning for statistical arbitrage," arXiv preprint arXiv:2112.00444, 2021.

[8] B. Lehmann, "Fads, martingales, and market efficiency," Quarterly Journal of Economics, vol. 105, no. 1, pp. 1-28, 1990.

[9] N. Jegadeesh and S. Titman, "Returns to buying winners and selling losers: Implications for stock market efficiency," Journal of Finance, vol. 48, no. 1, pp. 65-91, 1993.

[10] J. Conrad and G. Kaul, "Mean reversion in short-horizon expected returns," Review of Financial Studies, vol. 2, no. 2, pp. 225-240, 1989.

[11] L. Harris, Trading and Exchanges: Market Microstructure for Practitioners. Oxford: Oxford University Press, 2002.

[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017.

[13] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, "A time series is worth 64 words: Long-term forecasting with transformers," in International Conference on Learning Representations, 2023.

[14] M. López de Prado, Advances in Financial Machine Learning. Hoboken, NJ: Wiley, 2018.

[15] K. Fukushima, S. Miyake, and T. Ito, "Neocognitron: A neural network model for a mechanism of visual pattern recognition," IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, no. 5, pp. 826-834, 1983.

[16] R. Roll, "A simple implicit measure of the effective bid-ask spread in an efficient market," Journal of Finance, vol. 39, no. 4, pp. 1127-1139, 1984.

[17] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations, 2015.

[18] I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," in International Conference on Learning Representations, 2019.
