Statistical computing: Application of computer science to statistics
Computational statistics: Design of algorithms for implementing statistical methods on computers
This course: more of the former
STAT545: more of the latter
Broadly: to learn programming for Statistics/Data Science
Our focus will be on
“The Art of R Programming: A Tour of Statistical Software Design”, Norman Matloff
“R for Data Science”, Garrett Grolemund and Hadley Wickham (sold on Amazon, but also available free)
Also useful:
(Approximately) weekly assignments
Will involve reading, writing and programming
Are vital to doing well in the exams
Late homework will not be accepted
One (worst) homework will be dropped
You may discuss problems with other students, but must:
Central to modern statistics/data analysis. We want:
Programming involves:
A programming language and environment for statistics
A GNU project available as Free software.
(“Think free as in free speech, not free beer”: Richard Stallman)
You can (and should):
You will:
Based on Bell Labs’ S language by John Chambers
Started by Ihaka and Gentleman at the Univ. of Auckland (“R: A Language for Data Analysis and Graphics”, 1996)
A high-level interpreted language with convenient features for loading, manipulating and plotting data
A huge collection of user-contributed packages to perform a wide variety of tasks
Widely used in academia, and increasingly popular in industry
Starting R begins a new session
R presents you with a command prompt or console
Can interact with R through the console:
1 + 3
x <- rgamma(3,2,1); x # Generate Gamma(2,1) variables
x <- rnorm(1000)
plot(x+(1:1000)/100)
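The console commands above use positional arguments. A minimal sketch of the same steps with the arguments named may make them easier to read. The call to set.seed and the variable names gamma_draws, noise and noisy_trend are additions for illustration, not part of the original slide.

```r
set.seed(1)  # assumption: fix the seed so the draws can be reproduced

# rgamma(3, 2, 1) means n = 3 draws with shape = 2 and rate = 1
gamma_draws <- rgamma(3, shape = 2, rate = 1)

# 1000 standard normal draws added to a slow linear trend from 0.01 to 10
noise <- rnorm(1000)
noisy_trend <- noise + (1:1000) / 100
```

Calling plot(noisy_trend) afterwards draws the same kind of figure as the last console command.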
RStudio provides a more convenient Integrated Development Environment (IDE) to interact with R
Layout includes
Convenient user interface: point-and-click, autocomplete, help etc.
You should install RStudio Desktop (available at rstudio.org )
(press Alt-Shift-K for a list of all shortcuts)
While we often use R interactively, it is useful to do this through scripts
Ultimately, R is a full-fledged programming language for statistical computing: Treat it as such!
Filenames should end with .R (e.g. denoise.R)
Scripts should have explanatory comments
Variables should have informative names
Scripts should be indented appropriately
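As an illustration of these guidelines, here is a sketch of what a short script such as denoise.R might contain. The contents are hypothetical: the moving_average function, the simulated signal and all variable names are assumptions made for this example.

```r
# denoise.R (hypothetical contents): smooth a noisy signal with a moving average.

# Return the centered moving average of `signal` over `window_size` points.
# Positions without a full window are returned as NA.
moving_average <- function(signal, window_size = 5) {
  weights <- rep(1 / window_size, window_size)
  smoothed <- stats::filter(signal, weights, sides = 2)
  as.numeric(smoothed)
}

set.seed(42)  # make the simulated data reproducible
time_index <- 1:200
true_signal <- sin(time_index / 20)
noisy_signal <- true_signal + rnorm(length(time_index), sd = 0.3)

smoothed_signal <- moving_average(noisy_signal, window_size = 9)
```

The comments say what each step is for, the names describe what the variables hold, and the function body is indented consistently.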
See R style-guides from:
We will look at a few useful R packages (e.g. ggplot2, plyr)
The next part of the course aims to:
Take the idea of reproducible code to reproducible documents
Instead of working with R commands, work with an entire report
Report includes description of your problem, data and algorithm as well as embedded code and results
You can automatically “compile” the report, which will rerun your code, regenerate your results and form a new report
Allows collaborators to regenerate report on their computer
This is how we will be submitting homeworks
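The slides do not name a tool at this point. As one possible illustration of the "compile" step, the sketch below uses Sweave, which ships with base R in the utils package. Tools such as knitr and R Markdown follow the same pattern of text with embedded code chunks. The report contents and file names are made up for the example.

```r
# Source of a tiny report: LaTeX text with one embedded R code chunk.
report_source <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<simulate, echo=TRUE>>=",
  "set.seed(1)",
  "x <- rnorm(100)",
  "@",
  "The sample mean is \\Sexpr{round(mean(x), 3)}.",
  "\\end{document}"
)

rnw_file <- tempfile(fileext = ".Rnw")  # hypothetical report source file
tex_file <- tempfile(fileext = ".tex")  # regenerated report
writeLines(report_source, rnw_file)

# "Compile" the report: rerun the chunk and substitute its results into the text.
utils::Sweave(rnw_file, output = tex_file, quiet = TRUE)

compiled_report <- readLines(tex_file)
```

Anyone holding the source file can rerun the last two steps and obtain the same report, with the number in the sentence recomputed from the code rather than typed by hand.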
Another nice system for dynamic notebooks is Jupyter Notebook
Formerly called IPython notebooks; it is still Python-based, but now supports more languages:
I made these slides using Jupyter
You can try installing it if you're prepared to deal with setting up Python and Python libraries