NYSE Price Correlations Are Arbitrageable Over Hours and Predictable Over Years

Nikolai Pokryshkin

Moderator

Registered since: 2022-07-22 09:48:36

2024-07-16 19:06:10


1 Introduction
Only recently has tick-by-tick historical trading data across whole markets like the
New York Stock Exchange (NYSE) become available to all comers, at low cost and
outside of proprietary settings. The sites Finnhub.io and Polygon.io are among
current examples of such sources in a rapidly evolving landscape. Also only recently
has GPU software like PyTorch or TensorFlow made it easy to exploit
financial data sets at gigabyte and terabyte scales with desktop resources. Given
these trends, one expects to see a new wave of published large-scale statistical studies
of the behavior of markets. This paper is one such study.
We study the collective behavior of five years (2018–2022) of ∼1000 NYSE-listed
stocks with continuous trading data at one-minute resolution. The span of years is
chosen to include year 2020, during which the COVID-19 pandemic roiled markets.
Our interest is one with a long history, albeit somewhat neglected in recent years: to
characterize and quantify deviations of the market from its so-called stylized facts,
especially random-walk models, without introducing too much extra theoretical machinery;
and to quantify the predictive power (in several respects, to be defined
below) of the market’s correlational structure. That the market deviates from its
stylized facts is in no sense controversial. Our goal is new quantification on a
large, carefully controlled data set.
In §2 we summarize a set of these so-called facts and review the history of the
discovery that a transaction time, distinct from clock time, is needed. We then
describe the data set used in this paper and, as a technical point, discuss the relation
between returns computed as two-point instantaneous price differences and returns
computed as differences of time-averaged prices. In §3 we compute variograms (equivalent
in some but not all ways to autocorrelation functions) over the data, noting some apparently
mean-reverting long-term memory on timescales from minutes to days. In §4 we review
the venerable model of shot-noise (or, in the limit, Gaussian) processes with power-law
(Hurst exponent) variograms, also known as fractional Brownian motion. This
provides a convenient platform for discussing the advantage of variance analysis over
autocorrelation, and for demonstrating directly that profitable arbitrage is possible
in principle for such models (except for the case of a perfect random walk).
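A minimal sketch of a variogram and Hurst-exponent fit, assuming the variogram is the mean squared change in log price at lag τ (measured in samples) and that it follows a power law V(τ) ∝ τ^{2H}, as for fractional Brownian motion. The helper names `variogram` and `hurst_from_variogram` are hypothetical, the data are a simulated random walk rather than NYSE prices, and the paper's own estimator (for example, one built on time-averaged prices or in transaction time) may differ. For a perfect random walk the fit should give H close to 0.5; a value below 0.5 would indicate mean-reverting (anti-persistent) memory.

```python
import numpy as np


def variogram(log_price: np.ndarray, lags: np.ndarray) -> np.ndarray:
    """Hypothetical helper: mean squared log-price change at each lag (lags >= 1, in samples)."""
    return np.array([np.mean((log_price[lag:] - log_price[:-lag]) ** 2) for lag in lags])


def hurst_from_variogram(lags: np.ndarray, v: np.ndarray) -> float:
    """Hypothetical helper: fit V(tau) ~ C * tau**(2H) by least squares in log-log space."""
    slope, _intercept = np.polyfit(np.log(lags), np.log(v), 1)
    return float(slope / 2.0)


rng = np.random.default_rng(0)
# Simulated null model (not market data): a Gaussian random walk, for which H = 0.5.
log_price = np.cumsum(rng.normal(0.0, 1e-3, size=200_000))
lags = np.unique(np.logspace(0, 3, 20).astype(int))  # roughly log-spaced lags, 1..1000
v = variogram(log_price, lags)
H = hurst_from_variogram(lags, v)
print(f"estimated H = {H:.3f}")
```

Fitting in log-log space weights all lag scales roughly equally; with real data one would also restrict the lag range to where a single power law actually holds.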
In §5 we apply §4’s arbitrage trading strategy to the actual stock-market data
and find it to be profitable about as predicted by its observed Hurst exponent. This
serves to remove any lingering doubts that the long-memory, mean-reverting behavior
measured is genuine, not an artifact of (e.g.) a flawed mapping of trading time to
clock time.
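To see in miniature why anti-persistent increments are exploitable while a random walk is not, here is a toy contrarian rule: over each step, hold a position equal to minus the previous step's return. This is a generic illustration on simulated data, not the trading strategy of §4, whose details are not given in this excerpt. The increments are assumed to follow an MA(1) process x_t = e_t − θ e_{t−1} (chosen only for simplicity), for which E[x_{t−1} x_t] = −θσ², so the rule's expected profit per step is θσ² before transaction costs. The helper `contrarian_pnl` is hypothetical.

```python
import numpy as np


def contrarian_pnl(returns: np.ndarray) -> float:
    """Hypothetical helper: mean per-step P&L of holding -r[t-1] units over step t (no costs)."""
    position = -returns[:-1]  # bet against the previous step's move
    return float(np.mean(position * returns[1:]))


rng = np.random.default_rng(1)
n, theta = 100_000, 0.5
e = rng.normal(size=n + 1)  # unit-variance Gaussian shocks
random_walk_steps = e[1:]  # independent increments (perfect random walk)
mean_reverting_steps = e[1:] - theta * e[:-1]  # assumed MA(1): negative lag-1 autocorrelation

pnl_rw = contrarian_pnl(random_walk_steps)
pnl_mr = contrarian_pnl(mean_reverting_steps)
print(f"random walk: {pnl_rw:+.4f}   mean-reverting: {pnl_mr:+.4f}   (theory: {theta:+.4f})")
```

The random-walk profit fluctuates around zero, matching the exception for a perfect random walk noted above; real trading would also have to overcome costs that this sketch ignores.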
In §6 we turn to the cross-correlational structure of the ∼1000 NYSE stocks.
Measuring the correlation with one-hour returns over five years produces an interesting
“atlas” of correlation diagrams (shown in Supplementary Information). Measuring
correlation as a function of the time resolution of returns produces a seeming
anomaly, which we show to be quantitatively explainable by the same apparent long-term
memory as seen in the variograms.
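The sketch below shows one way to build returns at different resolutions from a one-minute price grid and compare pairwise correlations across resolutions. It assumes a gap-free one-minute grid of log prices (columns are stocks) generated by a simulated one-factor model with independent increments; the helper names and parameters are hypothetical, not the paper's. Under independent increments, covariances and variances both scale linearly with the aggregation interval, so the correlation should come out near 0.5 at both one-minute and one-hour resolution. A systematic dependence on resolution in real data therefore points to some departure from that assumption.

```python
import numpy as np


def returns_at_resolution(log_prices: np.ndarray, step: int) -> np.ndarray:
    """Hypothetical helper: non-overlapping log returns over `step` samples (rows = time)."""
    return np.diff(log_prices[::step], axis=0)


def mean_pairwise_correlation(returns: np.ndarray) -> float:
    """Hypothetical helper: average off-diagonal entry of the stock-by-stock correlation matrix."""
    c = np.corrcoef(returns, rowvar=False)
    mask = ~np.eye(c.shape[0], dtype=bool)
    return float(c[mask].mean())


rng = np.random.default_rng(2)
n_minutes, n_stocks = 100_000, 5
# Simulated one-factor model: each stock = common market move + its own noise,
# with equal variances, so the true correlation of increments is 0.5.
market = rng.normal(0.0, 1e-3, size=(n_minutes, 1))
idiosyncratic = rng.normal(0.0, 1e-3, size=(n_minutes, n_stocks))
log_prices = np.cumsum(market + idiosyncratic, axis=0)

mean_corr = {}
for step in (1, 60):  # one-minute and one-hour returns
    mean_corr[step] = mean_pairwise_correlation(returns_at_resolution(log_prices, step))
    print(f"{step:>2}-minute returns: mean pairwise correlation = {mean_corr[step]:.3f}")
```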
Section 7 looks at “leave-one-out” predictions, where we calculate what hourly
(or other) return should be expected of a stock, given the hourly returns of all other
stocks in the same hour. Then, in §8, we test an arbitrage strategy based on leave-one-out
predictions, with apparently robust positive results.
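A minimal leave-one-out sketch: predict one stock's hourly return from the same-hour returns of all other stocks, fitting coefficients on earlier hours and scoring on later ones. Ordinary least squares is an assumed, illustrative predictor, not necessarily the one defined in §7; the returns come from a simulated one-factor model, and `leave_one_out_predict` is a hypothetical helper.

```python
import numpy as np


def leave_one_out_predict(train: np.ndarray, test: np.ndarray, i: int) -> np.ndarray:
    """Hypothetical helper: predict column i of `test` from all other columns.

    Rows are hours, columns are stocks. Coefficients come from an OLS fit
    (with intercept) on `train`; OLS is an assumed, illustrative choice.
    """
    others = [j for j in range(train.shape[1]) if j != i]
    X_train = np.column_stack([np.ones(len(train)), train[:, others]])
    beta, *_ = np.linalg.lstsq(X_train, train[:, i], rcond=None)
    X_test = np.column_stack([np.ones(len(test)), test[:, others]])
    return X_test @ beta


rng = np.random.default_rng(3)
n_hours, n_stocks = 5_000, 20
# Simulated hourly returns: a shared market factor plus independent noise per stock.
market = rng.normal(0.0, 0.004, size=(n_hours, 1))
returns = market + rng.normal(0.0, 0.004, size=(n_hours, n_stocks))
train, test = returns[:4_000], returns[4_000:]  # fit on the past, score on later hours

pred = leave_one_out_predict(train, test, i=0)
skill = float(np.corrcoef(pred, test[:, 0])[0, 1])
print(f"out-of-sample correlation of prediction with actual: {skill:.3f}")
```

In this toy setting the 19 other stocks essentially estimate the shared factor, so the out-of-sample correlation should land near 1/√(2·(1 + 1/19)) ≈ 0.69; real cross-sectional structure is, of course, much richer than one factor.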
Section 9 is additional discussion.
2 Preliminaries
2.1 Stylized Facts
In economics, so-called stylized facts are empirical observations that reflect broad
principles without necessarily holding exactly in every case. Since the work of Fama
and the elaboration of the Efficient Market Hypothesis (EMH) in the 1960s,
recitations of stylized facts about the time history of market prices and returns
usually include these:
• Asset prices are nonstationary and do (some kind of) random walk.
• Sequential asset returns, i.e., price changes, are (close to) independent. More
formally, price evolution is (close to a) Markov process, therefore memoryless.
• In liquid markets, arbitrage opportunities are (almost) nonexistent. Or, equivalently,
the Markov process is a martingale.
• Because the market responds to the sum of innumerable small news effects, the
Central Limit Theorem should apply, and the time series are expected to be
Gaussian.
Taken together, these stylized facts imply a Brownian motion random walk (also
termed a Wiener process) as the null-hypothesis model for financial time series—at
any rate, the model to disprove with contradictory data.
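Under this Brownian null model, increments are independent and Gaussian, which can be checked with two simple statistics: the lag-1 autocorrelation of increments (near 0 under independence) and the excess kurtosis (0 for a Gaussian). The sketch below computes both for simulated Gaussian increments and, for contrast, for Laplace-distributed increments, an arbitrary fat-tailed stand-in with excess kurtosis 3 rather than a model of real returns. The helper names are hypothetical.

```python
import numpy as np


def lag1_autocorrelation(x: np.ndarray) -> float:
    """Hypothetical helper: sample correlation between consecutive increments."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def excess_kurtosis(x: np.ndarray) -> float:
    """Hypothetical helper: fourth standardized moment minus 3 (zero for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)


rng = np.random.default_rng(4)
n = 100_000
gaussian_steps = rng.normal(0.0, 1.0, size=n)  # increments of a discrete Wiener process
fat_tailed_steps = rng.laplace(0.0, 1.0, size=n)  # arbitrary fat-tailed stand-in

stats = {}
for name, steps in [("Gaussian", gaussian_steps), ("Laplace", fat_tailed_steps)]:
    stats[name] = (lag1_autocorrelation(steps), excess_kurtosis(steps))
    print(f"{name:>8}: lag-1 autocorr = {stats[name][0]:+.4f}, "
          f"excess kurtosis = {stats[name][1]:+.3f}")
```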
Historically, the last of these stylized facts was an immediate embarrassment,
because the distribution of many types of asset returns, sampled at equally spaced
times, is strongly non-Gaussian, with positive kurtosis and fat tails. Some exotic
solutions were proposed, for example Mandelbrot’s examination of so-called stable
