Quantitative Trading Theory (2) — Statistics Fundamentals

In this article we cover the fundamentals of Statistics for Quantitative Trading. The connections between the various topics covered and the motivation behind their developments will be discussed. In addition to books, research papers on robust estimation will be surveyed. It’s assumed that the reader has at least an elementary understanding of probability theory but not necessarily any statistical knowledge. As always, we begin with an overview of the topics discussed.

  1. Software Development Fundamentals
  2. Statistics Fundamentals
  3. Finance Fundamentals


Statistics Fundamentals


Building models based on noisy dependent observations sampled from constantly evolving distributions requires a statistical understanding beyond the rudimentary. Some of the key results of introductory statistical texts rely heavily on asymptotic properties realised only by large independent samples and are contingent on assumptions that are simply invalid for financial datasets. Moreover, the focus on efficient unbiased estimators fails to acknowledge the tradeoff being made between model bias and model variance (the “Bias-Variance Tradeoff”).

Nevertheless, the standard linear regression model established there forms a solid base from which to build. Structural tests to verify each of the underlying assumptions can be constructed, the statistical significance of conclusions can be assessed and confidence intervals around parameter estimates can be formed. By individually addressing each violated model assumption, a more realistic generalised regression model can be formulated that accounts for non-linear relationships, serially correlated residuals, endogeneity and heteroscedasticity.

Critical to the results surrounding the standard linear regression model and the generalised regression model is the Central Limit Theorem (CLT). The Central Limit Theorem presupposes independent observations, which is rarely accurate for real-world financial datasets of interest. Ergodic theory replaces the Central Limit Theorem’s independent observations assumption with one of stationarity. Fortunately, an abundance of real-world processes are stationary. Time Series Analysis studies such stationary processes using models such as ARIMA and GARCH that can be motivated by Wold’s Decomposition Theorem.
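As a rough illustration of why stationarity helps, here is a minimal numpy sketch (not taken from any of the books surveyed, with arbitrary parameter values) that simulates a stationary AR(1) process and shows the time average of a single dependent realisation settling down to the stationary mean, which is the kind of convergence ergodic theory provides in place of independence.

```python
import numpy as np

# Simulate a stationary AR(1) process: y_t = phi * y_{t-1} + eps_t, |phi| < 1.
# Ergodicity means the time average of a single long realisation converges
# to the stationary mean (zero here), even though observations are dependent.
rng = np.random.default_rng(0)
phi, n = 0.7, 100_000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

for m in (100, 1_000, 10_000, 100_000):
    print(f"time average over first {m:>7} observations: {y[:m].mean():+.4f}")
```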

Finally, we address the Bias-Variance tradeoff. The expected test error decomposes into the squared model bias, the model variance and an irreducible error term. By accepting an increase in model bias it's usually possible to achieve lower model variance and a lower overall expected test error. For example, ridge regression achieves this by penalising large parameter estimate values (after standardising the explanatory variables), which turns out to be equivalent to imposing zero-mean normal prior distributions on the parameter estimates. A consequence of ridge regression estimation is that the parameter estimates are shrunk towards zero relative to the ordinary least squares fit, particularly along the “directions” in which the explanatory variables vary the least. Using cross-validation, the expected test error can be estimated and minimised over a parameter space in situations where data is scarce.
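To make the shrinkage concrete, here is a small numpy sketch of ridge regression in its closed form; the penalty strength `lam` and the synthetic data are arbitrary choices for illustration. The ridge coefficients come out smaller in magnitude than the ordinary least squares coefficients, most visibly on the nearly collinear columns, i.e. along the directions where the explanatory variables vary the least.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # nearly collinear columns
beta_true = np.array([1.0, 1.0, 0.5, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

# Standardise the explanatory variables and centre y so no intercept is needed.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

lam = 10.0
beta_ols   = np.linalg.solve(Xs.T @ Xs, Xs.T @ yc)
beta_ridge = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

print("OLS:  ", np.round(beta_ols, 3))
print("Ridge:", np.round(beta_ridge, 3))  # shrunk towards zero relative to OLS
```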

We now begin our survey with a book that will build the foundation of our statistical knowledge.

Econometric Analysis

In Econometric Analysis, the author William Greene builds everything from the ground up and starts with a rigorous treatment of the linear regression model. The assumptions of the standard linear regression model are explicitly stated and defined. Tests are developed to verify these assumptions. The equivalence between the least squares fit and the maximum likelihood estimate is established. Moreover, the Gauss-Markov Theorem is proved, which demonstrates that under the standard assumptions, the least squares fit is the best linear unbiased estimator (BLUE).
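For readers who prefer to see the algebra in code, the following is a minimal numpy sketch (an illustration on synthetic data, not Greene's notation) of the least squares fit via the normal equations, together with the classical standard errors; under i.i.d. Gaussian errors the same estimate also maximises the likelihood.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with a constant term.
X = np.column_stack([np.ones(n), x1, x2])

# Least squares fit via the normal equations: beta_hat = (X'X)^{-1} X'y.
# Under i.i.d. Gaussian errors this coincides with the maximum likelihood estimate.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - X.shape[1])   # unbiased error variance estimate
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)          # classical covariance of beta_hat
print("beta_hat:  ", np.round(beta_hat, 3))
print("std errors:", np.round(np.sqrt(np.diag(cov_beta)), 3))
```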

Moving through Chapter One to Chapter Nine, each of the standard linear regression model assumptions is individually addressed. By revising the assumptions to account for more general scenarios, models that can deal with non-linear relationships, serially correlated residuals, heteroscedasticity and endogeneity are constructed. Asymptotic and finite-sample properties of various statistics are derived and used to construct statistical tests. In particular, the Wald test and F-statistic are introduced to assess the individual and joint statistical significance of explanatory variables.
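A hedged sketch of the F-test for joint significance, using synthetic data and scipy, might look as follows: the restricted model drops the explanatory variables under test, and the statistic compares the resulting increase in the residual sum of squares against the unrestricted fit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.8 * x1 + 0.0 * x2 + 0.0 * x3 + rng.normal(size=n)   # x2, x3 truly irrelevant

def rss(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    return e @ e

X_full = np.column_stack([np.ones(n), x1, x2, x3])   # unrestricted model
X_rest = np.column_stack([np.ones(n), x1])           # H0: coefficients on x2 and x3 are zero

q = X_full.shape[1] - X_rest.shape[1]                # number of restrictions
dof = n - X_full.shape[1]
F = ((rss(X_rest, y) - rss(X_full, y)) / q) / (rss(X_full, y) / dof)
p_value = stats.f.sf(F, q, dof)
print(f"F = {F:.3f}, p-value = {p_value:.3f}")       # large p-value: fail to reject H0
```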

In Chapter Three, the coefficient of determination (commonly known as R-squared and denoted R²) is studied as a measure of “goodness-of-fit”. Under the condition that a constant term is included in the regression, the equivalence between three different formulations of the coefficient of determination is established. Without this extra condition, it's possible for the coefficient of determination to be greater than one or even negative depending on the formula used. It's important to bear this in mind when using computational packages. The coefficient of determination is then adjusted to make it suitable for model selection among nested models.
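The three formulations can be checked numerically; the sketch below (illustrative only, on synthetic data) computes 1 − RSS/TSS, ESS/TSS and the squared correlation between y and the fitted values, which agree when a constant is included and diverge when it is dropped.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.normal(loc=3.0, size=n)
y = 5.0 + 0.5 * x + rng.normal(size=n)

def r2_three_ways(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    yhat = X @ beta
    tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
    rss = np.sum((y - yhat) ** 2)       # residual sum of squares
    ess = np.sum((yhat - y.mean()) ** 2)  # explained sum of squares
    return 1 - rss / tss, ess / tss, np.corrcoef(y, yhat)[0, 1] ** 2

X_with_const = np.column_stack([np.ones(n), x])
X_no_const = x.reshape(-1, 1)

print("with constant:   ", np.round(r2_three_ways(X_with_const, y), 4))  # all three agree
print("without constant:", np.round(r2_three_ways(X_no_const, y), 4))    # formulations diverge
```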

Towards the end of Chapter Four, the increased uncertainty in parameter estimates that arises from multicollinearity (correlations between explanatory variables) is studied. In addition, the concept of influential observations is introduced and measures such as Cook's distance and studentized residuals are established. Chapter Five investigates methods for testing the statistical significance of results, selecting the best model from a collection of non-nested models and also the process of building models. Chapter Six considers the use of binary variables, in addition to modelling and testing for structural breaks. When using binary variables it's important to ensure that they're not collectively linearly dependent; in that case one of the binary variables should be dropped.
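As an illustration of Cook's distance (a numpy sketch on synthetic data, not the book's code), the leverages from the hat matrix and the residuals combine into a single influence measure that flags a deliberately planted observation.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
x[0], y[0] = 4.0, -6.0                        # plant one influential observation

X = np.column_stack([np.ones(n), x])
p = X.shape[1]
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
h = np.diag(H)                                # leverages
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
s2 = e @ e / (n - p)

# Cook's distance: D_i = (e_i^2 / (p * s^2)) * h_ii / (1 - h_ii)^2
cooks_d = (e ** 2 / (p * s2)) * (h / (1 - h) ** 2)
print("largest Cook's distances:", np.round(np.sort(cooks_d)[-3:], 3))
print("most influential index:  ", int(np.argmax(cooks_d)))   # expected to be 0
```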

The linearity of the regression model is weakened in Chapter Seven. Chapter Eight introduces Instrumental Variables as a method to deal with violations of the exogeneity assumption (residuals uncorrelated with the explanatory variables) and concludes Part I of Econometric Analysis. In Part II, the Generalized Regression Model for handling heteroscedasticity is discussed in Chapter Nine, while the remaining chapters of Part II generalise the results to systems of equations and discrete random variables. Part III, comprising Chapters Twelve to Sixteen, considers estimation procedures and methodologies. Part IV, comprising Chapters Seventeen to Nineteen, deals with discrete random variables.
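A minimal two-stage least squares sketch, on synthetic data where the regressor is deliberately constructed to be correlated with the error term, shows the instrumental variables estimate recovering the true coefficient where ordinary least squares does not.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000
z = rng.normal(size=n)                       # instrument: correlated with x, not with u
u = rng.normal(size=n)                       # structural error
x = 0.8 * z + 0.6 * u + rng.normal(size=n)   # endogenous regressor (correlated with u)
y = 1.0 + 2.0 * x + u                        # true coefficient on x is 2.0

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

beta_ols = ols(X, y)                         # biased because x is endogenous

# Two-stage least squares: regress x on the instruments, then y on the fitted x.
x_hat = Z @ ols(Z, x)
beta_2sls = ols(np.column_stack([np.ones(n), x_hat]), y)

print("OLS estimate of slope: ", round(beta_ols[1], 3))   # pushed away from 2.0
print("2SLS estimate of slope:", round(beta_2sls[1], 3))  # close to 2.0
```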

The final two chapters, making up the entirety of Part V, lead us into the realm of Time Series Analysis. In these two chapters, the concept of stationarity and how it can be used to circumvent violations of the independent observations assumption is established. Moreover, serial correlations in the residuals are treated. Since it's difficult to test for strict stationarity directly, tests concerning the weaker property of covariance stationarity are generally carried out, such as the Augmented Dickey-Fuller (ADF) test and the KPSS test. Co-integration is the construction of stationary time series from linear combinations of series that aren't necessarily stationary.
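Both tests are available in the statsmodels package; the sketch below (assuming statsmodels is installed, with simulated series) runs the ADF and KPSS tests on a random walk and on white noise. Note the opposing null hypotheses: the ADF null is a unit root, while the KPSS null is (level) stationarity.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(7)
n = 1_000
random_walk = np.cumsum(rng.normal(size=n))   # non-stationary: contains a unit root
white_noise = rng.normal(size=n)              # covariance stationary

for name, series in [("random walk", random_walk), ("white noise", white_noise)]:
    adf_p = adfuller(series)[1]                               # H0: unit root (non-stationary)
    kpss_p = kpss(series, regression="c", nlags="auto")[1]    # H0: level stationarity
    print(f"{name:12s}  ADF p-value: {adf_p:.3f}   KPSS p-value: {kpss_p:.3f}")
```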

Analysis of Financial Time Series

Ruey Tsay’s Analysis of Financial Time Series begins where Econometric Analysis left off in the study of Time Series Analysis, albeit with a focus on application rather than technical rigour.

It starts by defining the time series of main interest, the returns series, and highlights properties of empirically observed returns that deviate from the standard linear regression model assumptions. The concepts of stationarity and autocorrelation are revisited as a relaxation of the standard assumptions. The ARCH model is introduced to capture the heteroscedastic tendency for large return movements to be immediately followed by further large movements (volatility clustering), and Markov switching models are used to capture transitions between distinct market regimes where returns behave differently. In the final chapter, Markov chain Monte Carlo methods are applied to tackle the common real-world problems of missing observations and invalid data points.
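Volatility clustering is easy to reproduce with a toy ARCH(1) simulation in numpy (illustrative parameter values, not drawn from the book): the raw returns are close to uncorrelated while the squared returns are clearly autocorrelated, which is exactly the feature the ARCH family is designed to model.

```python
import numpy as np

# Simulate an ARCH(1) process: r_t = sigma_t * z_t with
# sigma_t^2 = omega + alpha * r_{t-1}^2, so a large move raises the
# conditional variance of the next move (volatility clustering).
rng = np.random.default_rng(8)
omega, alpha, n = 0.1, 0.5, 2_000
r = np.zeros(n)
sigma2 = np.full(n, omega / (1 - alpha))     # start at the unconditional variance
for t in range(1, n):
    sigma2[t] = omega + alpha * r[t - 1] ** 2
    r[t] = np.sqrt(sigma2[t]) * rng.normal()

# Returns themselves are (nearly) uncorrelated, but squared returns are not.
def lag1_autocorr(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

print("lag-1 autocorrelation of returns:        ", round(lag1_autocorr(r), 3))
print("lag-1 autocorrelation of squared returns:", round(lag1_autocorr(r ** 2), 3))
```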

Robust Estimation and Outlier Detection

Below is a list of estimators robust to outliers and tests for detecting outliers.

+ Median Absolute Deviation (MAD)
+ Interquartile Range (IQR)
+ Rousseeuw-Croux alternatives to MAD
+ Tietjen-Moore Test
+ Generalised Extreme Studentized Deviate (GESD) Test
+ Minimum Covariance Determinant (MCD)
+ Markov chain Monte Carlo/Gibbs Sampling
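As a small worked example of the first of these, here is a hedged numpy sketch of MAD-based outlier flagging; the threshold of 3.5 is a common rule of thumb rather than anything prescribed by the references below.

```python
import numpy as np

def mad_outliers(x, threshold=3.5):
    """Flag outliers using the median absolute deviation (MAD).

    The factor 1.4826 makes the MAD consistent with the standard deviation
    for normally distributed data; points whose robust z-score exceeds the
    threshold are flagged.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    robust_z = (x - med) / mad
    return np.abs(robust_z) > threshold

rng = np.random.default_rng(9)
returns = rng.normal(scale=0.01, size=500)
returns[[10, 250]] = [0.15, -0.2]             # inject two bad data points
print("flagged indices:", np.where(mad_outliers(returns))[0])
```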

For further information explore the following linked articles.

+ Alternatives to Median Absolute Deviation
+ Minimum Covariance Determinant Summary
+ A Fast Algorithm for the Minimum Covariance Determinant Estimator

An Introduction to Statistical Learning & The Elements of Statistical Learning

The Bias-Variance tradeoff and the expected test error take centre stage in An Introduction to Statistical Learning (ISL) and The Elements of Statistical Learning (ESL). The linear regression model is cast into the context of supervised learning with an explicit objective of minimising the expectation of a specific loss function. For the linear regression model under squared-error loss, the expected test error decomposes into the squared model bias, the model variance and an irreducible error term. This emphasis on both the model bias and the model variance (and not just model bias) was missing from the previous books reviewed.

Regularisation adds a parameterised penalty function to the loss function. The result of minimising the adapted loss function is a fitted model with higher bias but a reduction in model variance. Cross-validation allows the expected test error to be estimated when there’s not enough data to effectively train and test a model. By varying the regularisation parameter, an optimal model that minimises the expected test error among a class of models can be obtained. Unfortunately, estimation of the expected test error conditional on the observed training set is generally not tractable.
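A minimal scikit-learn sketch of this procedure (the synthetic data, the grid of penalty values and the five-fold split are all arbitrary choices for illustration) estimates the test error by cross-validation for each ridge penalty and picks the value that minimises it.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(10)
n, p = 100, 20                                   # deliberately scarce data
X = rng.normal(size=(n, p))
beta = np.concatenate([np.ones(3), np.zeros(p - 3)])
y = X @ beta + rng.normal(size=n)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
alphas = np.logspace(-2, 3, 20)                  # grid of regularisation parameters
scores = [
    -cross_val_score(Ridge(alpha=a), X, y, cv=cv,
                     scoring="neg_mean_squared_error").mean()
    for a in alphas
]
best = alphas[int(np.argmin(scores))]
print(f"estimated test MSE minimised at alpha ~ {best:.2f}")
```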

Approaches for combining several models to form a single improved model are discussed. These techniques include boosting, bagging and bumping. When combining models it's usually possible to decrease the model variance; however, there's a price paid in terms of interpretability. In addition to these techniques, several models are covered, including Neural Networks, Random Forests, Splines and Support Vector Machines. The remaining chapters deal with the topic of unsupervised learning, where no explicit loss function is specified.
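For a flavour of how bagging reduces variance, the following scikit-learn sketch (synthetic data, default tree settings, parameters chosen arbitrarily) compares the cross-validated error of a single decision tree against an average of one hundred trees fitted to bootstrap resamples.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(11)
n = 500
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)

tree = DecisionTreeRegressor()                               # single high-variance learner
bagged = BaggingRegressor(n_estimators=100, random_state=0)  # bagged decision trees

for name, model in [("single tree", tree), ("bagged trees", bagged)]:
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: cross-validated MSE ~ {mse:.3f}")
```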

An Introduction to Statistical Learning (ISL) and The Elements of Statistical Learning (ESL) are written by overlapping authors, with ISL designed as a more accessible version of ESL. Both books provide a comprehensive treatment of statistical learning. ESL covers the material in greater technical detail and additionally discusses a few more advanced topics. The focus of both books is on developing an intuitive understanding rather than obsessing over technical details.

Free Book Downloads: An Introduction to Statistical Learning and The Elements of Statistical Learning

An Introduction to Statistical Learning and The Elements of Statistical Learning can both be downloaded for free from the authors' websites.