

## Course information

This is a lecture course covering various topics in statistical analysis (see below). I assume students have a modest background in statistics, and we build on it by discussing a number of topics. The goal is to give students a better feel for statistics and to leave them much less intimidated by methods of statistical analysis.

Course Objectives: We will introduce statistical distributions and the computation of statistical power for various designs, matrix algebra useful for statistics and the general linear model, maximum likelihood estimation and testing, Bayesian statistics, and various resampling and randomization methods. The focus is on obtaining a general understanding of these statistical tools rather than on which computer programs to use. Thus, the course will be somewhat more theoretical than applied, but the student will leave with a much broader understanding than from a course concerned with running various statistical packages.

Math/Stats background required: Some knowledge of calculus and a previous stats course (one that introduced covariance, regression, and ANOVA) are desirable.

Computer Programs: While the course focuses on basic statistical concepts, we will also introduce the R computing language. R is one of the most powerful and flexible statistical programs, with a very large (and growing) library. Bad news: it is a little hard to get started with. Good news: it is FREE! (R is essentially S+, for those of you who have heard of it.) More details are given below.

Class textbooks/reading: There is no formal textbook for the class, although there will be extensive readings for most lectures (posted as pdf files below).

You might also wish to buy one or more of the following textbooks on using R:

• An Introduction to R (revised and updated). W. N. Venables, D. M. Smith, etc. 146 pages. (\$14.00 on Amazon)
  • A very short paperback that essentially lists all of the commands in R. A nice quick reference, but little detail beyond this.
• Introductory Statistics with R. P. Dalgaard.
  • Nice review of basic statistical applications using R. Paperback, 267 pages. (\$35.00 on Amazon)
• Using R for Introductory Statistics. J. Verzani.
  • More extensive than Dalgaard, with good discussions on programming. Hardback, 414 pages. (\$41.00 on Amazon)
• Statistical Computing: An Introduction to Data Analysis using S-Plus (S+ is essentially R). M. J. Crawley.
  • A much more comprehensive introduction to R/S+ with lots of examples. Hardback, 760 pages. (\$87.00 on Amazon)

### Meeting time and place: Tuesday and Thursday, 9:30 a.m. - 10:45 a.m., LSS 340

Instructor: Bruce Walsh

• office: BSW 322
• phone: 621-1915
• Office hours -- by appointment
• e-mail (jbwalsh@u.arizona.edu)

## The R Statistical Programming Language

UA R users group website

• Alpha Unix (OSF/Tru64)
• Linux
• MacOS (System 8.6 to 9.1 and MacOS X)
• MacOS X (Darwin/X11)
• Windows (95 and later)

An Introduction to R (Walsh notes)

1. R as a basic statistical calculator for obtaining p values and plotting probability distributions (6 page pdf file).
2. Power Calculations in R (4 page pdf file).
3. Matrix Calculations in R (3 page pdf file).
4. Bootstrap and jackknife in R (5 page pdf file).
5. The Metropolis-Hastings Sampler in R (4 page pdf file).
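As a rough illustration of the first three topics in these notes, the sketch below uses only functions from base R and the standard `stats` package. The particular test statistics, sample sizes, and the matrix `A` are made-up values chosen for illustration; they are not taken from the notes themselves.

```r
# --- R as a statistical calculator ---
# Upper-tail p value for a standard normal test statistic z = 1.96
# (about 0.025)
p_z <- pnorm(1.96, lower.tail = FALSE)

# Two-sided p value for a t statistic of 2.1 with 15 degrees of freedom
p_t <- 2 * pt(abs(2.1), df = 15, lower.tail = FALSE)

# Critical value: upper 5% point of a chi-square distribution with 3 df
crit_chisq <- qchisq(0.95, df = 3)

# --- Power calculation ---
# Power of a two-sided, two-sample t test with 20 observations per group
# when the true difference in means is one standard deviation
pw <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)

# --- Matrix calculations ---
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)  # hypothetical 2 x 2 matrix
A_inv <- solve(A)           # matrix inverse
identity_check <- A %*% A_inv  # matrix product; should be the identity
eig <- eigen(A)             # eigenvalues and eigenvectors

print(c(p_z = p_z, p_t = p_t, crit_chisq = crit_chisq, power = pw$power))
print(eig$values)
```

Typing `?pnorm`, `?power.t.test`, or `?solve` at the R prompt opens the help page for each function.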

PDF files of the official R manuals

• An Introduction to R (approx. 100 pages, 650kB), based on the former "Notes on R", gives an introduction to the language and how to use R for doing statistical analysis and graphics.
• Quick reference card (1 page, 60kB)
• A draft of the R language definition (approx. 60 pages, 380kB), which documents the language per se: the objects it works on and the details of the expression evaluation process, both of which are useful to know when programming R functions.
• Writing R Extensions (approx. 70 pages, 450kB) covers how to create your own packages, write R help files, and the foreign language (C, C++, Fortran, ...) interfaces.
• R Data Import/Export (approx. 30 pages, 270kB) describes the import and export facilities available either in R itself or via packages which are available from CRAN.
• R Installation and Administration (approx. 30 pages, 200kB)
• The R Reference Index (approx. 2145 pages, 10.5MB) contains all help files of the R base packages in printable form.

## Lecture schedule

(VERY tentative, topics may be added/deleted per wishes of class)

| Date | Day | Lect. # | Topic | Handouts | Problem Sets |
|---|---|---|---|---|---|
| 12 Jan | Thursday | 1 | Overview: Probabilities and Probability Distributions | Univariate Distributions | |
| 17 Jan | Tuesday | 2 | Overview: Univariate distributions | | PS 1 |
| 19 Jan | Thursday | 3 | Overview: Bivariate distributions | Bivariate Distributions | |
| 24 Jan | Tuesday | 4 | Normal, t, Chi-square distributions | (1): Distributions of functions of normals, PS 1 Solutions | |
| 26 Jan | Thursday | 5 | F distributions | | PS 2 |
| 31 Jan | Tuesday | 6 | Power of tests 1: Normals | (1): Power, PS 2 Solutions | |
| 2 Feb | Thursday | 7 | Power of tests 2: Fixed Effects ANOVAs | | |
| 7 Feb | Tuesday | 8 | Power of tests 3: Random Effects ANOVAs | | PS 3 |
| 9 Feb | Thursday | | No class, Walsh at NIH | Central Limit theorem problem | PS 4 |
| 14 Feb | Tuesday | 9 | Matrix algebra 1: addition, multiplication | (1): Intro to Matrix Algebra and linear models | PS 3 due |
| 16 Feb | Thursday | 10 | Matrix algebra 2: Inversion and the Multivariate Normal | | PS 4 due |
| 21 Feb | Tuesday | 11 | Matrix algebra 3: The Multivariate Normal | | |
| 23 Feb | Thursday | 12 | Matrix algebra 3: The Multivariate Normal | | PS 5 |
| 28 Feb | Tuesday | 13 | General linear model (GLM) 1: OLS | General linear Model | PS 5 due |
| 2 March | Thursday | 14 | GLM 2: Generalized inverses, systems of equations | Generalized inverses | PS 6 |
| 7 March | Tuesday | 15 | GLM 3: Geometry of matrices, PC | Matrix Eigenstructure | PS 7; PS 6 due |
| 9 March | Thursday | 16 | | | PS 7 due |
| 14 March | Tuesday | | Spring Break | | |
| 16 March | Thursday | | Spring Break | | |
| 21 March | Tuesday | 17 | | | |
| 23 March | Thursday | 18 | | | PS 8 due |
| 28 March | Tuesday | 19 | Maximum Likelihood estimation, Likelihood ratio tests | MLEs | PS 9 |
| 30 March | Thursday | | No class, Walsh at UCSF | | |
| 4 April | Tuesday | 20 | Generalized Linear models | Generalized Linear models | PS 9 due |
| 6 April | Thursday | | No class, Walsh seminar at University of Florida | | |
| 11 April | Tuesday | 21 | Resampling methods 1: Randomization and the Jackknife | Resampling methods | |
| 13 April | Thursday | 22 | Resampling methods 2: The Bootstrap | Bootstrap and Jackknife in R | PS 10 |
| 18 April | Tuesday | 23 | Multiple comparisons 1: Sequential Bonferroni corrections and the False Discovery Rate | Multiple comparisons | |
| 20 April | Thursday | 24 | Multiple comparisons 2: The False Discovery Rate | | PS 10 due |
| 25 April | Tuesday | 25 | Bayesian methods: Introduction | Bayesian methods | |
| 27 April | Thursday | 26 | Bayesian methods: Advanced topics | | |
| 2 May | Tuesday | 27 | MCMC methods | MCMC and Gibbs Sampler; The Metropolis-Hastings Sampler in R | |

## Problem Sets

| Problem set | Topic | Due date | Solutions |
|---|---|---|---|
| 1 | Regressions, covariances | 24 Jan | PS 1 Solutions |
| 2 | Confidence Intervals | 31 Jan | PS 2 Solutions |
| 3 | Power with z and t tests | 14 Feb | PS 3 Solutions |
| 4 | Power with F tests | 16 Feb | PS 4 Solutions |
| 5 | Basic Matrices, MVN | 28 Feb | PS 5 Solutions |
| 6 | Intro to GLM | 7 March | PS 6 Solutions |
| 7 | Generalized Inverses | 9 March | PS 7 Solutions |
| 8 | More GLM fun | 23 March | PS 8 Solutions |
| 9 | Matrix Eigenstructure | 4 April | PS 9 Solutions |
| 10 | Resampling Approaches | 20 April | PS 10 Solutions |
| 11 | MCMC | | PS 11 Solutions |

Data for PS 10:

data <- c(8.26, 6.33, 10.4, 5.27, 5.35, 5.61, 6.12, 6.19, 5.2, 7.01, 8.74, 7.78 , 7.02, 6, 6.5, 5.8, 5.12, 7.41, 6.52, 6.21, 12.28, 5.6, 5.38, 6.6, 8.74)
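Since PS 10 covers resampling approaches, here is a minimal sketch of how a bootstrap and a jackknife standard error might be computed for these data in base R. The choice of the sample mean as the statistic, the number of bootstrap replicates `B`, the seed, and the variable names are all assumptions made for illustration; the problem set itself may ask for something different.

```r
# PS 10 data, copied from above (renamed ps10 to avoid masking base::data)
ps10 <- c(8.26, 6.33, 10.4, 5.27, 5.35, 5.61, 6.12, 6.19, 5.2, 7.01,
          8.74, 7.78, 7.02, 6, 6.5, 5.8, 5.12, 7.41, 6.52, 6.21,
          12.28, 5.6, 5.38, 6.6, 8.74)
n <- length(ps10)

set.seed(1)   # arbitrary seed, only so the run is reproducible
B <- 2000     # assumed number of bootstrap replicates

# Bootstrap: resample n values with replacement and recompute the mean
boot_means <- replicate(B, mean(sample(ps10, size = n, replace = TRUE)))
boot_se <- sd(boot_means)                          # bootstrap standard error
boot_ci <- quantile(boot_means, c(0.025, 0.975))   # percentile 95% interval

# Jackknife: recompute the mean leaving out one observation at a time
jack_means <- sapply(seq_len(n), function(i) mean(ps10[-i]))
jack_se <- sqrt((n - 1) / n * sum((jack_means - mean(jack_means))^2))

print(c(mean = mean(ps10), boot_se = boot_se, jack_se = jack_se))
print(boot_ci)
```

For the sample mean, the jackknife standard error reduces exactly to the usual `sd(ps10) / sqrt(n)`, which makes a convenient check. To study a different statistic, such as the median, replace `mean` in both resampling steps.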

## Selected Statistics References

1. Randomization, Bootstrap and Monte Carlo Methods in Biology (2nd ed). Bryan F. J. Manly (1997).

2. Bayesian Hierarchical Modeling. David Draper. You can download a postscript file of the draft version from Draper's website.

3. Generalized, Linear, and Mixed Models. Charles E. McCulloch and Shayle R. Searle. (2001).

4. Categorical Data Analysis, (2nd Ed.). Alan Agresti. (2002).

5. Multivariate Statistics: A Practical Approach. Bernhard Flury and Hans Riedwyl. (1988).

6. Applied Nonparametric Statistical Methods. P. Sprent. (1989).

7. Experiments: Planning, Analysis, and Parameter Design Optimization. C. F. Jeff Wu and Michael Hamada. (2000)

8. Statistical Analysis with Missing Data. Roderick J. A. Little and Donald B. Rubin. (2002).

9. Bayesian Statistics: An Introduction (2nd ed). Peter M. Lee (1997).

10. Applying Generalized Linear Models. James K. Lindsey (1997).

11. Tools for Statistical Inference: Methods for exploration of posterior distributions and likelihood functions (3rd ed). Martin Tanner (1996).

12. Statistical Principles in Experimental Design (3rd ed). B. J. Winer, Donald R. Brown, and Kenneth M. Michels (1991).

13. Intuitive Biostatistics. Harvey Motulsky.

14. Statistics as Principled Argument. Robert Abelson.

15. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Dani Gamerman (1997).

16. The Ecological Detective. Ray Hilborn and Marc Mangel (1997).

17. Mathematical and Statistical Methods for Genetic Analysis. Kenneth Lange (1997).

18. Statistical Data Analysis. Glen Cowan (1998).

19. Design and Analysis of Ecological Experiments. Samuel Scheiner and Jessica Gurevitch, Eds (1993).

20. Regression Modeling Strategies. Frank E. Harrell, Jr. (2001).