October 25, 2018
source: Elith et al. 2008
Thematically, the tutorial sessions do not run in parallel with the seminar in Potsdam.
– Christmas and New Year's Eve break –
Possible topics include:
You'll learn modern R to do:
This is not a statistics course.
→ we'll focus on data preparation, exploratory data analysis (EDA) and visualization
For statistics with R see:
This is not an in-depth programming course.
→ we'll write short scripts and learn some fundamental concepts of programming
For programming with R see:
increasingly popular, esp. in the science community
source: StackOverflow
"Base R" or "R Core": Core functions of the R language without additional packages
with(airquality, sapply(split(Ozone, Month), mean, na.rm = TRUE))
source: tidyverse.org
tidyverse: set of packages that share the same "design philosophy, grammar, and data structures"
airquality %>% group_by(Month) %>% summarize(m_oz = mean(Ozone, na.rm = TRUE))
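The two snippets above compute the same thing: the mean Ozone value per month. The following is a minimal sketch for checking this yourself. It assumes the built-in airquality data set and runs the tidyverse part only if the dplyr package is installed; the variable names base_means and tidy_means are made up for illustration.

```r
# Base R: split Ozone by Month, then average each group, ignoring NAs
base_means <- with(airquality, sapply(split(Ozone, Month), mean, na.rm = TRUE))
print(base_means)

# tidyverse (dplyr) version, run only if dplyr is installed (assumption)
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  tidy_means <- airquality %>%
    group_by(Month) %>%
    summarize(m_oz = mean(Ozone, na.rm = TRUE))
  print(tidy_means)
  # Same numbers, different container: named vector vs. data frame (tibble)
  print(all.equal(unname(base_means), tidy_means$m_oz))
}
```

The base R version returns a named numeric vector, while the dplyr version returns a table with one row per month.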
R's functionality can be extended by packages which are available in the Comprehensive R Archive Network (CRAN).
Popular packages include:
Only for WZB staff:
For those who can't / don't want to install RStudio on their computer there's an option to use RStudio via the browser:
Use your WZB login there.
Some general rules:
1. Each line is a statement ("command"); several statements are evaluated from top to bottom.
c <- a + b
d <- sqrt(c)
Exception: If an expression is not closed (see parenthesis rule below), it can span several lines:
a * (b +
     c + d)
This is the same as a * (b + c + d).
2. Spaces are generally ignored.
These are all equivalent:
a+b
a + b
a   +   b
Use spaces and indents to make your code more readable.
3. Expressions must be closed.
There are different special characters that mark the beginning and end of something, e.g. the beginning and end of a character string or an expression:
"hello world"
a * (b + c)
x[1]
More complex statements contain nested expressions. Nested expressions are evaluated from inner to outer.
y[c(1, 3)]
For each opened parenthesis, quotation mark, etc. there must be a closing counterpart in the correct order. This would be wrong:
y[c(1, 3])
## Error: unexpected ']'
4. Commas and dots
Commas split things: mainly arguments (parameters) of functions.
log(x, 5)
→ passes the arguments x and 5 to compute the base-5 logarithm of x.
Commas cannot be used to group digits in large numbers:
population <- 3,350,000
## Error: unexpected ',' in "population <- 3,"
A dot is used as decimal point:
3.1415
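The four rules above can be tried out in a short script. This is only an illustrative sketch: the values assigned to a, b, c, d and y are made up, and the result names (multi_line, picked, etc.) are hypothetical.

```r
# Example values (made up for illustration)
a <- 2
b <- 3
c <- 4   # allowed as a variable name, although c() is also a base R function
d <- 5

# Rule 1: an unclosed parenthesis lets an expression continue on the next line
multi_line <- a * (b +
                   c + d)
single_line <- a * (b + c + d)

# Rule 2: spacing does not change the result
no_spaces <- a+b
with_spaces <- a   +   b

# Rule 3: nested expressions are evaluated from the inside out
y <- c(10, 20, 30)
picked <- y[c(1, 3)]   # c(1, 3) is built first, then used as the index

# Rule 4: commas separate arguments, dots mark decimals
log_base5 <- log(125, 5)   # base-5 logarithm of 125
population <- 3350000      # no digit-grouping commas; 3.35e6 is equivalent

print(identical(multi_line, single_line))
print(identical(no_spaces, with_spaces))
print(picked)
```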
Alternative: run this command in the Console:
install.packages("<PACKAGE_NAME>")
then, to load a package:
library(<PACKAGE_NAME>)
(without quotation marks!)
install.packages("tidyverse")
library(tidyverse)
If you forget to load a package, you will be confronted with errors like these:
qplot()
## Error in qplot() : could not find function "qplot"
diamonds
## Error: object 'diamonds' not found
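To avoid such errors, it can help to check whether a package is available before using it. The sketch below is one possible way to do this; is_installed is a hypothetical helper (not part of R), and "notARealPackage123" is a made-up name used only to show the negative case.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

print(is_installed("stats"))               # "stats" ships with every R installation
print(is_installed("notARealPackage123"))  # made-up name, so not installed

# Install only when missing (commented out so this sketch has no side effects)
# if (!is_installed("tidyverse")) install.packages("tidyverse")

# A single function can also be called without library() via package::function
med <- stats::median(c(1, 5, 9))
print(med)
```

The package::function notation is handy when you need only one function from a package or want to make clear where a function comes from.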
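getting the current working directory: getwd()
setting the working path: setwd("<PATH>")
absolute paths start with / (MacOS / Unix) or C:\ (Windows)
example: getwd() returns "/Users/NoName/Documents"
→ the file /Users/NoName/Documents/MyProject/data.csv can then be read with the relative path read.csv("MyProject/data.csv")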
/Users/NoName/Documents/MyProject
?/Users/NoName/Research
?_
instead)
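The difference between relative and absolute paths can be tried out safely in a temporary folder. This is a sketch under assumptions: the folder layout (a "MyProject" subfolder with a data.csv file) mirrors the example above but is created in a temporary directory, and all variable names are made up.

```r
# Build a throwaway folder structure: <temp>/MyProject/data.csv
base_dir <- tempfile("Documents")
dir.create(file.path(base_dir, "MyProject"), recursive = TRUE)
write.csv(data.frame(x = 1:3),
          file.path(base_dir, "MyProject", "data.csv"),
          row.names = FALSE)

old_wd <- getwd()   # remember the current working directory
setwd(base_dir)     # make the temporary folder the working directory

# Relative path: resolved against the working directory
dat_rel <- read.csv("MyProject/data.csv")

# Absolute path: works regardless of the working directory
dat_abs <- read.csv(file.path(base_dir, "MyProject", "data.csv"))

setwd(old_wd)       # restore the previous working directory

print(identical(dat_rel, dat_abs))
```

file.path() joins path components with the correct separator, so the same script can run on Windows, MacOS and Unix.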
Using R's internal help system:
help(<SYMBOL>) / shortcut: ?<SYMBOL>
<SYMBOL> can be anything: a function, a package, a data set
e.g. ?getwd or ?mean
example(<SYMBOL>)
example(mean)
## mean> x <- c(0:10, 50)
## mean> xm <- mean(x)
## mean> c(xm, mean(x, trim = 0.10))
## [1] 8.75 5.50
- list all available functions containing a keyword: apropos("<SEARCH>")
apropos('matrix')
## [1] "anyDuplicated.matrix" "as.data.frame.matrix" ...
## [4] "as.matrix" "as.matrix.data.frame" ...
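Since apropos() returns an ordinary character vector, its result can be stored and inspected like any other object. The following sketch assumes a default R session (with the utils package attached); the pattern "^read\\." is just an example of passing a regular expression instead of a plain keyword.

```r
# All visible objects whose name contains "matrix"
matrix_hits <- apropos("matrix")
print(head(matrix_hits))
print(length(matrix_hits))

# apropos() also accepts a regular expression:
# here, names that start with "read."
read_funs <- apropos("^read\\.")
print(read_funs)

# help() and example() are best tried interactively at the console:
# ?mean
# example(mean)
```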
Vignettes provide a short introduction to a specific package, function or topic. Not all packages offer a vignette.
vignette() shows all available vignettes
vignette('<TOPIC>') opens a vignette for a specific topic (e.g. vignette('dplyr') → introduction to the dplyr package in the help viewer)
(web search: "cran <PACKAGE>")
many packages have their own websites / online documentation, especially the tidyverse packages (tidyverse.org)
but it's worth the effort!
you need to be exact
BUT: it's better not to try to learn more than one programming language at once
If you encounter an error:
Web search query patterns:
"r <PACKAGE> <PROBLEM>"
"r <PROBLEM>"
Reduce error messages to the general problem:
Example 1:
summarize(airquality, m_oz = mean(SolarR))
## Error in summarise_impl(.data, dots): Evaluation error:
## object 'SolarR' not found.
→ possible search query: "r dplyr summarize object not found"
Example 2:
mean(airquality$Ozone)
## [1] NA
→ possible search query: "r mean always returns NA"
Example 3:
Sometimes, error messages provide hints:
filter(airquality, Month = 7)
## Error: `Month` (`Month = 7`) must not be named, do you need `==`?
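The three problems above can be reproduced and fixed with base R alone. This is a hedged sketch rather than the dplyr code from the examples: tryCatch() is used here only to capture the error text, subset() stands in for dplyr's filter(), and all variable names are made up.

```r
# Example 1: a misspelled column name. Capture the message instead of stopping.
err_text <- tryCatch(
  mean(SolarR),
  error = function(e) conditionMessage(e)
)
print(err_text)
print(names(airquality))   # shows that the column is actually called Solar.R

# Example 2: mean() returns NA because Ozone contains missing values
ozone_mean_raw <- mean(airquality$Ozone)
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)   # drop NAs before averaging
print(ozone_mean_raw)
print(ozone_mean)

# Example 3: comparisons need ==, a single = means assignment / argument naming
# (base R subset() used here in place of dplyr's filter())
july <- subset(airquality, Month == 7)
print(nrow(july))
```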
source: attackofthecute.com
(use data(cats) to load the data)
Type cats at the console – What generally happens when you simply use an object's name as command?
Try head and tail.
Try the View function (use the function from the console and also check out the small table icon in the "Environment" tab in the top right pane)
Use qplot from the ggplot2 package (incl. in tidyverse)
Look up qplot in R's help system
Put Bwt (body weight) on the x-axis and Hwt (heart weight) on the y-axis
qplot(<VARIABLE ON X>, <VARIABLE ON Y>, data = <DATASET>)