Statistics Preprints for Regression and Robust Statistics

Preprints

by David Olive
Preprint M-02-006
Copyright May 2003, Dec. 2020

Back to my home page home.html

Why the Rousseeuw Yohai Paradigm is One of the Largest and Longest Running Scientific Hoaxes in History hoax.pdf
- Talk on prediction regions and intervals predtalk.pdf
- The following preprints have been submitted.
- Welagedara, Haile, and Olive (2026), ARIMA Model Selection and Prediction Intervals tspi.pdf
- THE FOLLOWING PREPRINTS HAVE NOT YET BEEN SUBMITTED OR NEED TO BE RESUBMITTED.
- Jin and Olive (2026), Large Sample Theory for Some Ridge-Type Regression Estimators ridgetype.pdf
- The following preprint may be ready for submission by July 2026.
- Olive, D.J. and Johana Lemonge, S. (2026), OLS Testing with Predictors Scaled to Have Unit Sample Variance sols.pdf
- The following 3 preprints may be ready for submission by August 2027, but I may split the next paper into separate papers for one response variable and for more than one response variable, and not publish the following 3 papers.
- Olive, D.J., Alsaudi, M.S., and Pathiranage, K.G. (2026), High Dimensional Dimension Reduction hddr.pdf
- Olive and Alsaudi (2026), High Dimensional Binary Regression and Classification hdbreg.pdf
- Olive (2026), Testing Multivariate Linear Regression with Univariate OPLS Estimators hdmreg.pdf
- I will incorporate some of the ideas of the following preprint into the 3 papers above, so I won't publish it directly. Not nearly ready.
- Olive (2026), Qualms about Dimension Reduction Theory: Central Subspace, Envelopes, Variable Selection qualms.pdf
- The following preprint had too many ideas to be published in a major journal but part of it resulted in the paper Olive (2018) below.
- Highest Density Region Prediction hdrpred.pdf
- The Abid, Quaye, and Olive (2025) paper combines the following two preprints.
- Abid and Olive (2025), Some Simple High Dimensional One and Two Sample Tests hd1samp.pdf
- Olive and Quaye (2025), Testing Poisson Regression and Related Models with the One Component Partial Least Squares Estimator hdpois.pdf
- This preprint shows how to visualize several important survival regression models in the background of the data.
- Plots for Survival Regression sreg.pdf
- 1D Regression onedreg.pdf
- Graphical Aids for Regression. gaid.pdf
- A Simple Plot for Model Assessment simp.pdf
- THE FOLLOWING PREPRINT HAS BEEN INCORPORATED IN
- Olive, D.J. (2017), Linear Regression and
- Olive, D.J. (2017), Robust Multivariate Analysis, two Springer texts,
- and in the Olive (2018) paper Applications of Hyperellipsoidal Prediction Regions (below).
- This preprint shows the equivalence between a prediction region and a confidence region that can easily be bootstrapped. This method can be used for hypothesis testing, for robust statistics, and after variable selection. See the Pelawa Watagoda and Olive (2021ab) published papers below and the Rathnayake and Olive (2023) preprint for better theory.
- Bootstrapping Hypothesis Tests and Confidence Regions vselboot.pdf
- THE FOLLOWING 4 PREPRINTS WERE INCORPORATED IN
- Olive, D.J. (2017), Robust Multivariate Analysis, Springer, NY.
- This paper gives the first easily computed estimators of multivariate location and dispersion that have been shown to be sqrt(n) consistent and highly outlier resistant.
- Olive, D.J., and Hawkins, D.M. (2010), Robust Multivariate Location and Dispersion rmld.pdf
- This preprint shows how to improve low breakdown consistent regression estimators and outlier resistant estimators that do not have theory. The resulting estimator is the first easily computed regression estimator that has been shown to be sqrt(n) consistent and high breakdown. The response plot is very useful for detecting outliers.
- Olive, D.J., and Hawkins, D.M. (2011), Practical High Breakdown Regression hbreg.pdf
- Olive, D.J. (2013), Robust Multivariate Linear Regression robmreg.pdf
- Olive, D.J. (2014), Robust Principal Component Analysis rpca.pdf
- THE FOLLOWING TWO PREPRINTS HAVE BEEN CITED BY OTHER AUTHORS, BUT WERE REVISED AND PUBLISHED.
- Chang, J., and Olive, D.J. (2007), Resistant Dimension Reduction resdr.pdf
- was revised and published as Chang and Olive (2010).
- Applications of a Robust Dispersion Estimator rcovm.pdf
- was revised and published as Zhang, Olive, and Ye (2012).
- THE FOLLOWING SIX PREPRINTS HAVE BEEN CITED BY OTHER AUTHORS.
- This paper shows that the bootstrap is not first order accurate unless the number of bootstrap samples B is proportional to the sample size n. For second order accuracy, B needs to be proportional to n^2. This was published in Olive, D.J. (2014), Statistical Theory and Inference, Springer, New York, NY, ch. 9.
- Olive, D.J. (2011), The Number of Samples for Resampling Algorithms resamp.pdf
- This preprint provides some of the most important theory in the field of robust statistics. The paper shows that a simple modification to the most used but inconsistent algorithms for robust statistics results in easily computed sqrt(n) consistent highly outlier resistant estimators. It was converted to the Robust Multivariate Location and Dispersion and Practical High Breakdown Regression preprints above. The material is in Olive, D.J. (2017), Robust Multivariate Analysis, Springer, NY.
- Olive, D.J., and Hawkins, D.M. (2008), High Breakdown Multivariate Estimators hbrs.pdf
- The material in the following preprint is in Olive, D.J. (2017), Robust Multivariate Analysis, Springer, NY.
- Olive, D.J., and Hawkins, D.M. (2007), Robustifying Robust Estimators, preprint available from ppconc.pdf
- For location scale families, estimators based on the median and mad have optimal robustness properties. Use He's cross checking technique to make an asymptotically efficient estimator.
- Olive, D.J. (2006), Robust Estimators for Transformed Location-Scale Families. robloc.pdf
- The material in the following preprint is in Olive, D.J. (2017), Robust Multivariate Analysis, Springer, NY.
- Olive, D.J. (2005), A Simple Confidence Interval for the Median, preprint available from ppmedci.pdf
- The June 2008 ROBUST STATISTICS NOTES are below. PLEASE CITE THIS WORK IF YOU USE IT. Much of this work is in
- Olive, D.J. (2017), Robust Multivariate Analysis, Springer, NY.
- Olive, D.J. (2008), Applied Robust Statistics, preprint available from (http://parker.ad.siu.edu/Olive/run.pdf). robnotes.pdf
- Web page with data sets and programs to go with the course notes. robust.html
- MORE ONLINE COURSE NOTES.
- The next preprint simplifies large sample theory for elastic net, ridge regression and lasso. A new variable selection estimator with simple theory that is easy to bootstrap is given. Theory for 3 bootstrap confidence regions is given, and the coverage should be near the nominal for the new estimator. Need to update ch. 4 variable selection material as in Rathnayake and Olive (2023).
- Jan. 2023: Webpage for second draft of Olive, D.J. (2023), Prediction and Statistical Learning.
- http://parker.ad.siu.edu/Olive/slearnbk.htm
- Need to incorporate Rathnayake and Olive (2023) variable selection material into Chapter 10.
- Jan. 2023: Webpage for first draft of Math 584 notes Olive, D.J. (2023), Theory for Linear Models.
- http://parker.ad.siu.edu/Olive/linmodbk.htm
- Need to update the variable selection material as in Rathnayake and Olive (2023).
- Jan. 2023: Webpage for first draft of Math 473 notes Olive, D.J. (2023), Survival Analysis. http://parker.ad.siu.edu/Olive/survbk.htm
- Jan. 2023: Webpage for first draft of Olive, D.J. (2023), Large Sample Theory. http://parker.ad.siu.edu/Olive/lsampbk.htm
- Need to update ch. 10 variable selection material as in Rathnayake and Olive (2023).
- Jan. 2023: Webpage for first draft of Olive, D.J. (2023), Robust Statistics. http://parker.ad.siu.edu/Olive/robbook.htm
- WEBPAGES AND COURSE NOTES TO GO WITH THREE PUBLISHED BOOKS
- TWO COMPETITORS FOR Casella and Berger (2002), Statistical Inference:
- Olive, D.J. (2008), A Course in Statistical Theory, preprint available from (http://parker.ad.siu.edu/Olive/infer.htm). infer.htm
- Olive, D.J. (2014), Statistical Theory and Inference, Springer, New York, NY.
- The Springer eBook is available on SpringerLink, Springer's online platform. http://dx.doi.org/10.1007/978-3-319-04972-4
- TWO COMPETITORS FOR Kutner, Nachtsheim, Neter, and Li (2005), Applied Linear Statistical Models:
- Olive, D.J. (2010), Multiple Linear and 1D Regression Models, preprint available from (http://parker.ad.siu.edu/Olive/regbk.htm). regbk.htm
- Olive, D.J. (2017a), Linear Regression, Springer, New York, NY.
- The Springer eBook is available on SpringerLink, Springer's online platform. http://dx.doi.org/10.1007/978-3-319-55252-1
- A COMPETITOR FOR Johnson and Wichern (2007), Applied Multivariate Analysis:
- Olive, D.J. (2017b), Robust Multivariate Analysis, Springer, New York, NY.
- The Springer eBook is available on SpringerLink, Springer's online platform. https://link.springer.com/book/10.1007%2F978-3-319-68253-2
- Jan. 2013 1st draft of Robust Multivariate Analysis:
- http://parker.ad.siu.edu/Olive/multbk.htm
- Here are some rejected Letters to the Editor. Erratum should have been published.
- This slightly revised letter was sent to the Journal of Computational and Graphical Statistics about the latest Fake-MCD estimator of Hubert, Rousseeuw and Verdonk (2012). It pointed out that DetMCD is not the MCD estimator, that DetMCD has no theory, and that it will be a massive undertaking to modify the theory for concentration estimators in Olive and Hawkins (2010) to show whether DetMCD has any good properties.
- Fake MCD fakemcd.pdf
- This letter was sent to the Annals of Statistics regarding the Bali, Boente, Tyler and Wang (2011) bait and switch paper.
- Fake Projection Estimator fakeproj.pdf
- This Letter was sent to The Annals of Statistics regarding the Salibian-Barrera and Yohai (2008) bait and switch paper. After a rejection, it was revised and sent to the American Statistician as a paper, but rejected.
- The Breakdown of Breakdown bdbd.pdf
- THE NEXT 11 DOCUMENTS MAY BE OF MILD INTEREST, BUT WILL PROBABLY NEVER BE PUBLISHED.
- The following 5 preprints have been incorporated into the published paper Olive (2013) "Plots for Generalized Additive Models."
- Response Transformations for Models with Additive Errors rtrans.pdf
- Response Plots and Related Plots for Regression rplot.pdf
- Response Plots for Linear Models lm.pdf
- Response Plots for Experimental Design rploted.pdf
- Plots for Binomial and Poisson Regression gfit.pdf
- Comments on Breakdown bkdn.pdf
- Abuhassan, H. and Olive, D.J. (2008), Inference for the Pareto, Half Normal and Related Distributions. std.pdf
- (long version of) Robustifying Robust Estimators lconc.pdf
- Prediction intervals in the presence of outliers pi.pdf
- This 1996 result grew into a 2002 JASA discussion paper. dense.pdf
- This 1997 result on partitioning may be of mild interest. part.pdf
- THIS IS MY PhD DISSERTATION: Olive, D.J. (1998), Applied Robust Statistics, Ph.D. Thesis, University of Minnesota. It shows my 1998 ideas on Robust Statistics. The figures are missing and the page numbers differ from the original dissertation. arsdiss.pdf
- THE FOLLOWING ARE PREPRINTS OF PUBLISHED OR ACCEPTED PAPERS.
- Olive, D.J., Alshammari, A.A., Pathiranage, K.G., and Hettige, L.A.W. (2026), Testing with the One Component Partial Least Squares and the Marginal Maximum Likelihood Estimators, Communications in Statistics: Theory and Methods, 55, 1492-1507. hdwls.pdf
- Olive, D.J. and Johana Lemonge, S. (2026), Estimating Covariances and Goodness of Fit Plots for Accelerated Failure Time Models, Axioms, 15, 15. (vol. 15, article 15) survcov.pdf
- The following preprint has a lot of theory for the high dimensional one and two sample tests, as well as a high dimensional test for testing Ho: beta = 0 for GLMs et cetera.
- Abid, Quaye, and Olive (2025), A High Dimensional Omnibus Regression Test, Stats, 8, 107. hdomni.pdf
- Olive, D.J. (2025), Some Useful Techniques for High Dimensional Statistics, Stats, 8, 60. hdtechn.pdf
- The following preprint greatly increases the scope of data splitting for regression, and finds the large sample theory for OPLS.
- Olive, D.J., and Zhang, L. (2025), One Component Partial Least Squares, High Dimensional Regression, Data Splitting, and the Multitude of Models, Communications in Statistics: Theory and Methods, 54, 130-145. opls.pdf
- The following preprint gives the large sample theory for some ARMA model selection estimators. The preprint also shows how to use bootstrap confidence regions for hypothesis testing.
- Haile and Olive (2024), Bootstrapping ARMA Time Series Models after Model Selection, Communications in Statistics: Theory and Methods, 53, 8255-8270. tsboot.pdf
- Welagedara, W.A.D.M. and Olive, D.J. (2024), Calibrating and Visualizing Some Bootstrap Confidence Regions, Axioms, 13(10), 659.
- This paper shows how to get better cutoffs for many common tests, and gives a new weighted least squares method.
- Rajapaksha, K.W.G.D.H. and Olive, D.J. (2024), Wald Type Tests with the Wrong Dispersion Matrix, Communications in Statistics: Theory and Methods, 53, 2236-2251. waldtype.pdf
- The following preprint gives a data splitting prediction region and shows how to predict the random walk.
- Haile, Zhang, and Olive (2024), Predicting Random Walks and a Data Splitting Prediction Region, Stats, 7(1), 23-33. rwalkpi.pdf
- This paper gives the large sample theory for many variable selection estimators for several important regression models. A new estimator that does not have selection bias is given. The preprint also shows how to use bootstrap confidence regions for hypothesis testing for the usual and new variable selection estimators.
- Rathnayake, R.C. and Olive, D.J. (2023), Bootstrapping Some GLM and Survival Regression Variable Selection Estimators, Communications in Statistics: Theory and Methods, 52, 2625-2645. bootglm.pdf
- R code: Rcodebootglm.pdf
- This paper shows how to get prediction intervals for a large class of parametric regression models such as GLMs, GAMs, and survival regression models. The PIs can work after variable selection and if the number of predictors is larger than the sample size.
- Olive, D.J., Rathnayake, R.C., and Haile, M.G. (2022), Prediction Intervals for GLMs, GAMs, and Some Survival Regression Models, Communications in Statistics: Theory and Methods, 51, 8012-8026. pigam.pdf R code: Rcodepigam.pdf
- This paper gives prediction intervals that can be useful when the sample size is less than the number of variables. These prediction intervals are useful for comparing shrinkage estimators like forward selection and lasso. Large sample theory for lasso, the elastic net, and ridge regression is simplified. New large sample theory for many OLS variable selection estimators is given. The theory shows that lasso variable selection is sqrt(n) consistent when lasso is consistent.
- Pelawa Watagoda, L.C.R. and Olive, D.J. (2021b), Comparing Six Shrinkage Estimators With Large Sample Theory and Asymptotically Optimal Prediction Intervals, Statistical Papers, 62, 2407-2431. picomp.pdf
- This paper gives theory for three useful bootstrap confidence regions. We use betahatImin0 to denote the variable selection estimator, but the paper uses both the usual estimator betahatVS and a new estimator betahatMIX, and it would be clearer if context were not needed to decide which estimator betahatImin0 denotes. The large sample theory for betahatMIX is derived, and is only asymptotically equivalent to that of betahatVS under strong regularity conditions. See the above paper and Rathnayake and Olive (2020).
- Pelawa Watagoda, L.C.R. and Olive, D.J. (2021a), Bootstrapping Multiple Linear Regression After Variable Selection, Statistical Papers, 62, 681-700. piboottest.pdf
- This paper shows how to bootstrap analogs of the one way MANOVA model where we do not assume equal covariance matrices.
- Rupasinghe Arachchige Don, H.S., and Olive, D.J. (2019), Bootstrapping Analogs of the One Way MANOVA Test, Communications in Statistics: Theory and Methods, 48, 5546-5558. manova.pdf
- This paper shows that applying the Olive (2013b) nonparametric prediction region to a bootstrap sample can result in a confidence region, and applying the prediction region to Yhat_f + e_i, where the e_i are residual vectors, results in a nonparametric prediction region for a future response vector Y_f for multivariate regression.
- Olive, D.J. (2018), Applications of Hyperellipsoidal Prediction Regions, Statistical Papers, 59, 913-931. hpred.pdf
- Olive, D.J., Pelawa Watagoda, L.C.R., and Rupasinghe Arachchige Don, H.S. (2015), Visualizing and Testing the Multivariate Linear Regression Model, International Journal of Statistics and Probability, 4, 126-137. vtmreg.pdf
- This paper gives response plots, plots for response transformations and plots for detecting overdispersion for GAMs and GLMs.
- Olive, D.J. (2013a), Plots for Generalized Additive Models, Communications in Statistics: Theory and Methods, 42, 2610-2628. gam.pdf R/Splus code: gamcode.txt
- Olive, D.J. (2013b), Asymptotically Optimal Regression Prediction Intervals and Prediction Regions for Multivariate Data, International Journal of Statistics and Probability, 2, 90-100. apred.pdf
- This paper describes the sqrt(n) consistent highly outlier resistant FCH, RFCH and RMVN estimators and gives an application for canonical correlation analysis.
- Zhang, J., Olive, D.J., and Ye, P. (2012), Robust Covariance Matrix Estimation with Canonical Correlation Analysis, International Journal of Statistics and Probability, 1, 119-136. rcca.pdf
- This paper shows that OLS partial F tests, originally meant for multiple linear regression, are useful for exploratory purposes for a much larger class of models, including generalized linear models and single index models.
- Chang, J. and Olive, D.J. (2010), OLS for 1D Regression Models, Communications in Statistics: Theory and Methods, 39, 1869-1882. sindx.pdf
- Olive, D.J. and Hawkins, D.M. (2007), Behavior of Elemental Sets in Regression, Statistics and Probability Letters, 77, 621-624. elem.pdf
- This paper shows how to construct asymptotically optimal prediction intervals for regression models of the form Y = m(x) + e. The errors need to be iid unimodal and emphasis is on linear regression.
- Olive, D.J. (2007), Prediction Intervals for Regression Models, Computational Statistics and Data Analysis, 51, 3115-3122. spi.pdf
- This paper shows that the variable selection software originally meant for multiple linear regression gives useful results for a much larger class of models, including generalized linear models and single index models, if the Mallows' Cp criterion is used. For models I with k predictors, the screen Cp(I) < 2k is much more effective than the screen Cp(I) < k. Use response plots to show that the final submodel is similar to the original full model.
- Olive, D.J. and Hawkins, D.M. (2005), Variable Selection for 1D Regression Models, Technometrics, 47, 43-50. varsel.pdf
- Olive, D.J. (2005), Two Simple Resistant Regression Estimators, Computational Statistics and Data Analysis, 49, 809-819. mba.pdf
- The MBA estimator is not as good as the FCH estimator in "High Breakdown Robust Estimators," but was the first easily computed estimator of multivariate location and dispersion shown (in 2004) to be sqrt(n) consistent and highly outlier resistant. See "Robustifying Robust Estimators" or "Applied Robust Statistics" for proofs.
- Olive, D.J. (2004a), A Resistant Estimator of Multivariate Location and Dispersion, Computational Statistics and Data Analysis, 46, 99-102. rcov.pdf
- The following paper suggests ways to robustify regression techniques for single index models and sliced inverse regression.
- Olive, D.J. (2004b), Visualizing 1D Regression, in Theory and Applications of Recent Robust Methods, edited by M. Hubert, G. Pison, A. Struyf and S. Van Aelst, Series: Statistics for Industry and Technology, Birkhauser, Basel, 221-233. vreg.pdf
- Olive, D.J., and Hawkins, D.M. (2003), Robust Regression with High Coverage, Statistics and Probability Letters, 63, 259-266. hcov.pdf
- The following paper provides a simultaneous diagnostic for whether the data follows a multivariate normal distribution or some other elliptically contoured distribution. It also provides a nice way to estimate and visualize single index models.
- Olive, D.J. (2002), Applications of Robust Distances for Regression, Technometrics, 44, 64-71. rdist.pdf
- The following paper gives extremely important theoretical results. It shows that software implementations for estimators of robust regression and robust multivariate location and dispersion tend to be inconsistent with zero breakdown value. The commonly used elemental basic resampling algorithm draws K elemental sets. Each elemental fit is inconsistent, so the final estimator is inconsistent, regardless of how the algorithm chooses the elemental fit. The CM, GS, LMS, LQD, LTS, maximum depth, MCD, MVE, one step GM and GR, projection, S, tau, t type, and many other robust estimators are of little applied interest because they are impractical to compute. The "Robustifying Robust Estimators" paper shows how to modify some algorithms so that the resulting regression estimators are easily computed sqrt(n) consistent high breakdown estimators and the resulting multivariate location and dispersion estimators are sqrt(n) consistent with high outlier resistance.
- Hawkins, D.M., and Olive, D.J. (2002), Inconsistency of Resampling Algorithms for High Breakdown Regression Estimators and a New Algorithm (with discussion), Journal of the American Statistical Association, 97, 136-148. incon.pdf
- This paper gives a graphical method for estimating response transformations that can be used to complement or replace the numerical Box-Cox method.
- Cook, R.D., and Olive, D.J. (2001), A Note on Visualizing Response Transformations, Technometrics, 43, 443-449. resp.pdf
- Olive, D.J. (2001), High Breakdown Analogs of the Trimmed Mean, Statistics and Probability Letters, 51, 87-92. rloc.pdf
- Hawkins, D.M., and Olive, D.J. (1999a), Improved Feasible Solution Algorithms for High Breakdown Estimation, Computational Statistics and Data Analysis, 30, 1-11. ifsa.pdf
- Hawkins, D.M., and Olive, D. (1999b), Applications and Algorithms for Least Trimmed Sum of Absolute Deviations Regression, Computational Statistics and Data Analysis, 32, 119-134. lta.pdf
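
The annotation for Olive (2006), Robust Estimators for Transformed Location-Scale Families, refers to estimators based on the median and mad. As a minimal sketch only, the two statistics can be computed as follows. The function name `median_and_mad` and the toy sample are hypothetical, and the sketch does not implement He's cross checking technique.

```python
import statistics


def median_and_mad(sample):
    """Return (median, MAD), where MAD is the median of |x_i - median|."""
    med = statistics.median(sample)
    mad = statistics.median(abs(x - med) for x in sample)
    return med, mad


# Hypothetical toy sample with one gross outlier (100.0).
data = [2.0, 3.0, 4.0, 5.0, 100.0]
med, mad = median_and_mad(data)
print(med, mad)
```

The single outlier moves neither statistic far from the bulk of the data, which is the kind of resistance the annotation alludes to.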
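
The annotation for Olive and Hawkins (2005), Variable Selection for 1D Regression Models, mentions the screen Cp(I) < 2k. The sketch below illustrates that screen under stated assumptions: it uses the textbook formula Cp(I) = SSE(I)/MSE + 2k - n with MSE taken from the full model, it counts the constant as one of the k terms, and all function names and the toy data are hypothetical rather than taken from the paper.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def sse(X, y, cols):
    """Residual sum of squares of the OLS fit of y on the chosen columns."""
    Z = [[row[j] for j in cols] for row in X]
    k = len(cols)
    XtX = [[sum(r[a] * r[b] for r in Z) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(Z, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in Z]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def mallows_cp(X, y, cols):
    """Cp(I) = SSE(I)/MSE_full + 2k - n, with k = number of columns in I."""
    n, p = len(y), len(X[0])
    mse_full = sse(X, y, range(p)) / (n - p)
    return sse(X, y, cols) / mse_full + 2 * len(cols) - n


def passes_screen(X, y, cols, factor=2):
    """True when Cp(I) < factor * k (factor=2 gives the Cp(I) < 2k screen)."""
    return mallows_cp(X, y, cols) < factor * len(cols)


# Hypothetical toy data: a constant column plus two predictors.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, -0.15, 0.05]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
y = [1 + 2 * a + e for a, e in zip(x1, noise)]

for cols in ([0, 1, 2], [0, 1], [0, 2]):
    print(cols, round(mallows_cp(X, y, cols), 2), passes_screen(X, y, cols))
```

Setting `factor=1` gives the stricter Cp(I) < k screen for comparison. The normal equations are solved directly only to keep the sketch free of dependencies.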
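
The annotation for Hawkins and Olive (2002) describes the elemental basic resampling algorithm, which draws K elemental sets and keeps the fit that optimizes a criterion. A minimal sketch for simple linear regression follows, assuming an elemental set of two cases and the median of squared residuals as the criterion. The function name `elemental_lms`, the seed, and the toy data are hypothetical. The sketch illustrates the algorithm that the paper criticizes, not the new algorithm that it proposes.

```python
import random
import statistics


def elemental_lms(x, y, K, seed=0):
    """Draw K two-point elemental sets; keep the exact fit with the
    smallest median squared residual. Returns (crit, intercept, slope)."""
    rng = random.Random(seed)
    n = len(x)
    best = None
    for _ in range(K):
        i, j = rng.sample(range(n), 2)
        if x[i] == x[j]:
            continue  # degenerate elemental set: no unique line
        slope = (y[i] - y[j]) / (x[i] - x[j])
        intercept = y[i] - slope * x[i]
        crit = statistics.median(
            (yk - intercept - slope * xk) ** 2 for xk, yk in zip(x, y)
        )
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    return best


# Hypothetical toy data: seven cases on y = 1 + 2x and three outliers.
x = [float(v) for v in range(10)]
y = [1 + 2 * v for v in x[:7]] + [40.0, -5.0, 60.0]

crit, intercept, slope = elemental_lms(x, y, K=50)
print(crit, intercept, slope)
```

With noise-free clean cases, a single clean elemental set recovers the line exactly. The paper's point is that with noisy data and a fixed K, each elemental fit uses only a handful of cases, so the selected fit does not improve as n grows.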

Comments: Webmaster