Test of GEX/DEX/VEX/CHEX on 1,972 SPY days: raw GEX looks great, dies after VIX + ATM IV controls

TL;DR: I tested the four dealer-exposure Greeks (GEX, DEX, VEX, CHEX) against next-day SPY outcomes on 1,972 end-of-day snapshots from Apr 2018 to Apr 2026. Raw GEX looks strong (Spearman ρ = -0.36, p ≈ 10⁻⁶⁰). After controlling for VIX and ATM IV, it drops to ρ = -0.03 (not significant). DEX has no signal even raw. VEX is basically a VIX proxy. CHEX is borderline noise at EOD. Hypotheses were pre-registered before the stats.

What I did

For each of 1,972 SPY trading days I joined the same-session GEX, DEX, VEX, CHEX, gamma flip, VIX, and ATM IV with the next-day return, next-day realized vol, and next-day IV change.

Sort days into 5 equal groups ("quintiles") by the exposure I'm testing, check the next-day outcome.
Then residualize: regress the signal and the outcome on VIX (or VIX + ATM IV), take the residuals, re-run. If the signal has independent information, it still works. If it was a VIX proxy, it dies.
I wrote the hypotheses down in a file before running any stats. Saves me from convincing myself of whatever the data happened to show.
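A minimal sketch of the residualize-then-rerun step described above, on synthetic data. All names here (`residualize`, `spearman`, the fake `vix` / `atm_iv` series) are illustrative assumptions, not the code behind the post. The residuals are the usual OLS residuals with an intercept, computed by orthogonalising the centred controls; the synthetic signal is built to be a pure VIX proxy, so its raw ρ is large and its residual ρ collapses.

```python
import math
import random

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_out(v, basis):
    """Remove from v its component along basis."""
    coef = dot(v, basis) / dot(basis, basis)
    return [x - coef * b for x, b in zip(v, basis)]

def residualize(y, controls):
    """OLS residuals of y on an intercept plus the controls (Gram-Schmidt form)."""
    basis = []
    for c in controls:
        c = center(c)              # centring accounts for the intercept
        for b in basis:
            c = project_out(c, b)  # make this control orthogonal to earlier ones
        basis.append(c)
    r = center(y)
    for b in basis:
        r = project_out(r, b)
    return r

def ranks(v):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    out = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(a, b):
    """Spearman rho = Pearson correlation of the ranks."""
    ra, rb = center(ranks(a)), center(ranks(b))
    return dot(ra, rb) / math.sqrt(dot(ra, ra) * dot(rb, rb))

# Synthetic example: the "signal" is just a noisy, sign-flipped copy of VIX.
rng = random.Random(0)
n = 2000
vix = [rng.gauss(0, 1) for _ in range(n)]
atm_iv = [v + 0.3 * rng.gauss(0, 1) for v in vix]
signal = [-v + 0.5 * rng.gauss(0, 1) for v in vix]
outcome = [v + 0.5 * rng.gauss(0, 1) for v in vix]

raw_rho = spearman(signal, outcome)
resid_rho = spearman(residualize(signal, [vix, atm_iv]),
                     residualize(outcome, [vix, atm_iv]))
print(f"raw rho = {raw_rho:+.2f}, after controls = {resid_rho:+.2f}")
```

A signal with independent information would keep a meaningful ρ in the second number; a proxy does not.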

Quick glossary for non-quants:

Quintile = 5 equal groups. Q1 = lowest 20%, Q5 = highest 20%.
Spearman ρ = rank correlation, from -1 to +1. +1 = ranks line up perfectly. -1 = perfectly opposite. 0 = random.

The raw GEX result – this is what sells subscriptions

Next-day realized vol (%, annualized) by GEX quintile:

GEX quintile	n	Mean next-day RV
Q1 (most negative)	395	17.0%
Q2	394	18.6%
Q3 (neutral)	394	12.7%
Q4	394	9.2%
Q5 (most positive)	394	6.3%

Spearman ρ = -0.36, p ≈ 10⁻⁶⁰. On an 8-year sample that's statistically real. The Q1/Q2 inversion in the means (Q2 at 18.6% sits above Q1 at 17.0%) comes from COVID-era outliers; the medians are cleanly monotonic.
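A small sketch of the quintile sort, with toy numbers chosen to show how a single outlier can invert two bucket means while the medians stay monotonic. The function names and the toy data are assumptions for illustration only.

```python
from statistics import mean, median

def quintile_labels(values):
    """Label each value 1..5 by rank (Q1 = lowest 20%); ties keep input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels

def summarize_by_quintile(signal, outcome):
    """Return (quintile, n, mean outcome, median outcome) for Q1..Q5."""
    labels = quintile_labels(signal)
    rows = []
    for q in range(1, 6):
        ys = [y for lab, y in zip(labels, outcome) if lab == q]
        rows.append((q, len(ys), mean(ys), median(ys)))
    return rows

# Toy data: the outcome falls steadily as the signal rises...
signal = [float(s) for s in range(20)]
outcome = [20.0 - s for s in signal]
outcome[5] = 100.0  # ...except for one extreme day that lands in Q2

rows = summarize_by_quintile(signal, outcome)
for q, n, avg, med in rows:
    print(f"Q{q}  n={n}  mean={avg:6.2f}  median={med:5.2f}")
```

The same bucketing rule splits 1,971 rows as 395/394/394/394/394, which is the pattern in the n column above.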

The twist – what happens when you control for VIX and ATM IV

Signal → Outcome	Raw ρ	After VIX ctrl	After VIX + ATM IV
GEX → next-day RV	-0.36	-0.14	-0.03 (p=0.18)
DEX → next-day return	-0.03	+0.01	+0.02
VEX → next-day IV change	-0.16	-0.05	-0.01
CHEX → next-day return	-0.05	-0.01	-0.00

GEX survives a VIX-only control (weakened). Dies when ATM IV joins. The other three never had much to lose.
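As a sanity check on the p-value in the last column, here is a rough sketch that converts a Spearman ρ into a t-statistic with the standard large-sample transform and reads the two-sided p-value off a normal approximation. It uses the rounded ρ from the table and assumes n = 1,972, so it can only land near the reported figure; the function name is hypothetical.

```python
import math

def spearman_p_approx(rho, n):
    """t = rho * sqrt((n - 2) / (1 - rho^2)); two-sided p via normal approximation."""
    t = rho * math.sqrt((n - 2) / (1 - rho * rho))
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

t_stat, p_value = spearman_p_approx(-0.03, 1972)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
```

With these inputs the result comes out at roughly t = -1.33 and p = 0.18, in line with the table. For very large |t| (the raw ρ = -0.36 case) the normal tail is thinner than the t tail, so an exact t distribution is the better tool there.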

The real killer – double-sort heatmap

5×5 grid: rows = VIX quintile (V1 calmest, V5 most stressed), columns = GEX quintile. Cell = mean next-day realized vol (%).

VIX \ GEX	Q1	Q2	Q3	Q4	Q5
V1 (low)	8.0	7.3	6.6	5.2	5.1
V2	11.7	10.3	8.8	6.5	6.0
V3	12.0	12.1	12.2	9.6	8.6
V4	15.9	15.4	17.2	12.3	8.1
V5 (high)	20.6	24.9	37.7	21.7	15.9

Rows V1–V2 look textbook. V3–V4 are close but not strictly monotonic. V5 – the stressed regime where you actually want a signal – is a non-monotonic mess. Middle GEX has the highest RV in the entire grid.
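The row-by-row reading above can be checked mechanically. This sketch copies the grid values from the table and tests each VIX row for a strictly decreasing pattern across GEX quintiles; the variable names are just for illustration.

```python
# Mean next-day realized vol (%), copied from the double-sort table above.
GRID = {
    "V1": [8.0, 7.3, 6.6, 5.2, 5.1],
    "V2": [11.7, 10.3, 8.8, 6.5, 6.0],
    "V3": [12.0, 12.1, 12.2, 9.6, 8.6],
    "V4": [15.9, 15.4, 17.2, 12.3, 8.1],
    "V5": [20.6, 24.9, 37.7, 21.7, 15.9],
}

def strictly_decreasing(row):
    return all(a > b for a, b in zip(row, row[1:]))

# Does RV fall at every step from Q1 to Q5 within each VIX bucket?
monotone = {v: strictly_decreasing(row) for v, row in GRID.items()}

# Q1 minus Q5 spread per VIX bucket, in vol points.
spread = {v: round(row[0] - row[-1], 1) for v, row in GRID.items()}

# Largest cell in the whole grid as (VIX bucket, GEX quintile, RV).
peak = max(
    ((v, q + 1, rv) for v, row in GRID.items() for q, rv in enumerate(row)),
    key=lambda cell: cell[2],
)

for v in GRID:
    print(v, "monotone" if monotone[v] else "not monotone", f"Q1-Q5 = {spread[v]}")
print("peak cell:", peak)
```

Only V1 and V2 pass the strict test, and the peak cell is V5 / Q3 at 37.7.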

The GEX regime split confirms it: on the top-VIX-quartile subset of 493 days, the top-vs-bottom GEX next-day RV difference is -1.89 vol points, t = -0.78, p = 0.44. No signal at all in the regime people care about most.
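For readers who want to reproduce a top-vs-bottom comparison like this one, here is a sketch of a two-sample Welch t-statistic. Whether the post used Welch or a pooled-variance test is not stated, so treat the choice as an assumption; the group data below is a toy example, not the SPY sample. Note that the reported numbers imply a standard error of about 1.89 / 0.78 ≈ 2.4 vol points, larger than the difference itself.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Difference in means, its standard error, and the Welch t-statistic."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, se, diff / se

def two_sided_p_normal(t):
    """Two-sided p-value from a normal approximation (fine for large samples)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Toy groups standing in for "top GEX" and "bottom GEX" next-day RV.
top_gex = [1.0, 2.0, 3.0, 4.0, 5.0]
bottom_gex = [2.0, 3.0, 4.0, 5.0, 6.0]
diff, se, t_stat = welch_t(top_gex, bottom_gex)
print(f"diff = {diff:.2f}, se = {se:.2f}, t = {t_stat:.2f}")

# Plugging in the t from the post gives a p-value close to the reported one.
print(f"p for t = -0.78: {two_sided_p_normal(-0.78):.2f}")
```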

One-line verdicts

GEX – useful regime descriptor (positive = dealers absorb flow = pinning; negative = dealers amplify = wider tape), but no independent alpha over VIX + IV.
DEX – zero predictive content. Top-vs-bottom next-day return diff = 0.00%, p = 0.97.
VEX – 72% correlated with VIX, 76% with ATM IV. Strip those out, nothing remains.
CHEX – 54.9% sign-agreement with next-day return raw (p = 10⁻⁵). Dies under any rank control. EOD only: this study does not test the intraday last-hour-charm narrative; minute-level data is needed for that.
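The CHEX sign-agreement figure can be sanity-checked with a simple sign test. This sketch assumes a 50% null hit rate, n = 1,972 days, and a normal approximation to the binomial; the exact test behind the post is not stated, so the function name and setup are assumptions.

```python
import math

def sign_test_p(hits, n):
    """z-score and two-sided p-value for `hits` agreements out of n under a 50% null."""
    z = (hits - n / 2) / math.sqrt(n / 4)
    return z, math.erfc(abs(z) / math.sqrt(2))

n_days = 1972
hits = round(0.549 * n_days)  # 54.9% agreement, rounded to whole days
z, p = sign_test_p(hits, n_days)
print(f"hits = {hits}, z = {z:.2f}, p = {p:.1e}")
```

Under those assumptions the p-value lands on the order of 10⁻⁵, consistent with the raw figure quoted for CHEX.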

Collinearity matters too: DEX and VEX correlate with each other at -0.89. They are two sides of one coin.
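One common way to put a number on "two sides of one coin" is the participation ratio of a correlation matrix, (Σλ)² / Σλ², which can be computed without an eigen-solver because Σλ equals the trace and Σλ² equals the sum of squared entries for a symmetric matrix. This is a sketch of that definition only; whether the post's "effective dimensions" figure uses this measure is an assumption. The 2×2 input uses the DEX/VEX correlation quoted above.

```python
def participation_ratio(corr):
    """Effective number of dimensions of a symmetric correlation matrix."""
    k = len(corr)
    trace = sum(corr[i][i] for i in range(k))         # sum of eigenvalues
    frob_sq = sum(c * c for row in corr for c in row)  # sum of squared eigenvalues
    return trace * trace / frob_sq

# DEX vs VEX with the -0.89 correlation from the post.
dex_vex = [[1.0, -0.89],
           [-0.89, 1.0]]
print(f"DEX/VEX effective dimensions: {participation_ratio(dex_vex):.2f}")

# Reference points: independent signals score k, identical signals score 1.
independent = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
identical = [[1.0] * 4 for _ in range(4)]
print(participation_ratio(independent), participation_ratio(identical))
```

Running the same function on the full 4×4 exposure correlation matrix (available in the linked write-up, not reproduced here) would give the four-signal version of the number.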

What I'd actually do with this

Ignore GEX in high-VIX regimes. In the top VIX quartile, the signal is noise.
In calm regimes, GEX as a vol-label is fine – just know VIX would have labeled the regime the same way, and VIX is free to quote.
Stop counting GEX/DEX/VEX/CHEX as four independent signals. They span roughly 1.5 effective dimensions, and VIX+IV already covers most of those.
If you want real independent edge, the signal has to be orthogonal to VIX and IV. None of the four dealer-exposure Greeks is. Candidates worth running the same test on: VIX term structure (VIX/VIX3M), realized-minus-implied skew, order-flow imbalance, VRP residuals.

Honest caveats (these matter more than the headline)

EOD only. No intraday CHEX test.
SPY only. Single names will likely look worse (less dealer hedging, more idiosyncratic drift).
Linear OLS residualization – a nonlinear model could extract edge OLS misses.
Correlation, not PnL. A residual ρ of -0.14 (the GEX figure after the VIX-only control) does not automatically become a profitable strategy after costs, slippage, and execution lag.
2022 is the only real bear year. High-VIX regime inference rests on ~493 days.
Dealer-sign convention: positive = dealers net long that Greek. Vendors using the opposite sign will see everything flipped – conclusions identical.

Data and full write-up

Full article with regime splits, train/test (70/30), correlation matrix, per-exposure verdicts, limitations, and downloadable CSV artifacts: https://flashalpha.com/articles/gex-dex-vex-chex-8-year-backtest-spy-vix-control

https://i.redd.it/b75yb3baywwg1.png

Posted by FlashAlphaLab