Either use RStudio’s package manager (“Packages” tab in the lower right pane) or issue the following command in the console:
install.packages('MASS')
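If you want to avoid reinstalling a package that is already present, one possible pattern is to check for it first with requireNamespace(). This is only a sketch and assumes internet access and a configured CRAN mirror in case the installation is actually needed:

```r
# Install MASS only if it is not already available on this machine.
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}
```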
Let’s load the packages:
library(MASS)
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.7
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::select() masks MASS::select()
Loading these packages will produce some messages on the console. What do you think they mean? (If you don’t know, just make a wild guess!)
The first section lists which (sub-)packages of the “meta-package” tidyverse were loaded. The second section, “Conflicts”, lists functions from other packages that are masked by tidyverse functions of the same name. For example, the last line above tells us that the package dplyr from the tidyverse contains a function select that conflicts with the identically named function from the package MASS.
So if we used the function select() now (you don’t need to know what it does yet), which one would be used: the one from dplyr or the one from MASS? This depends on the order in which the packages were attached, i.e. the order of the library(...) commands above. The most recently attached package takes precedence, so in this case select() from dplyr would be used. This is called masking.
You can still access the masked function by using a namespace prefix: prepend the package name and two colons to the function name, like this: MASS::select(). This states explicitly which package’s function you want to use in case of a conflict.
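To see masking in action, the following sketch attaches MASS first and dplyr second, then asks R which package each version of select comes from. It assumes both packages are installed; the variable names found_in and masked_from are made up for illustration.

```r
suppressPackageStartupMessages({
  library(MASS)
  library(dplyr)  # attached last, so its select() masks MASS::select()
})

# Which package does the unqualified name resolve to?
found_in <- environmentName(environment(select))
print(found_in)

# The namespace prefix still reaches the masked function.
masked_from <- environmentName(environment(MASS::select))
print(masked_from)
```

Swapping the order of the two library() calls in a fresh session would make the unqualified select refer to the MASS version instead.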
Use data(cats) to load the data:
data(cats)
?cats
or:
help(cats)
The variables in the dataset are: Sex (sex of the cat, F or M), Bwt (body weight in kg) and Hwt (heart weight in g).
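Besides the help page, the variables can also be inspected directly from the console. A minimal sketch using base R functions; var_names and sex_levels are illustrative names only:

```r
data(cats, package = "MASS")  # load the dataset without attaching MASS

str(cats)                     # compact overview: dimensions, column types, first values

var_names <- names(cats)      # column names
sex_levels <- levels(cats$Sex)  # categories of the factor column
print(var_names)
print(sex_levels)
```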
Simply issue the command cats at the console. What generally happens when you use an object’s name as a command?
cats
## Sex Bwt Hwt
## 1 F 2.0 7.0
## 2 F 2.0 7.4
## 3 F 2.0 9.5
## 4 F 2.1 7.2
## 5 F 2.1 7.3
## 6 F 2.1 7.6
## 7 F 2.1 8.1
## 8 F 2.1 8.2
## 9 F 2.1 8.3
## 10 F 2.1 8.5
## [ reached getOption("max.print") -- omitted 134 rows ]
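Typing an object’s name at the top level of the console makes R print it automatically, which for ordinary objects gives the same result as calling print() explicitly. A small sketch with a made-up vector x (not part of the exercise data):

```r
x <- c(2.0, 2.1, 2.1)  # hypothetical example values

x         # auto-printing: the name alone is evaluated and printed
print(x)  # explicit call, same output

# Capture the printed text to compare it programmatically.
printed <- capture.output(print(x))
```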
Using the functions head and tail. Inform yourself beforehand what these functions do (again, using the help system).
head(x, n) and tail(x, n) return only the first or last n rows of a dataset x. By default, n is 6.
head(cats)
## Sex Bwt Hwt
## 1 F 2.0 7.0
## 2 F 2.0 7.4
## 3 F 2.0 9.5
## 4 F 2.1 7.2
## 5 F 2.1 7.3
## 6 F 2.1 7.6
tail(cats)
## Sex Bwt Hwt
## 139 M 3.6 15.0
## 140 M 3.7 11.0
## 141 M 3.8 14.8
## 142 M 3.8 16.8
## 143 M 3.9 14.4
## 144 M 3.9 20.5
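The n argument can be set explicitly to show more or fewer rows. A short sketch; first_three and last_two are illustrative names:

```r
data(cats, package = "MASS")

first_three <- head(cats, n = 3)  # first 3 rows
last_two <- tail(cats, n = 2)     # last 2 rows, original row names are kept

print(first_three)
print(last_two)
```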
Using RStudio’s View function (call the function from the console and also check out the small table icon in the “Environment” tab in the top right pane)
This will open a new tab in the main window pane with a data viewer.
View(cats)
Use qplot from the ggplot2 package (included in tidyverse).
Plot Bwt (body weight) on the x-axis and Hwt (heart weight) on the y-axis.
qplot is short for “quick plot”. The following specifies a scatter plot, where the first argument is the variable on the x-axis, the second is the variable on the y-axis, and the named argument data specifies the data frame to plot.
qplot(Bwt, Hwt, data = cats)
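In newer ggplot2 releases (3.4.0 and later), qplot() is deprecated and may emit a warning. The same scatter plot can be written with the full ggplot() interface; the following is a sketch of an equivalent call, not the code used in this exercise:

```r
library(ggplot2)
data(cats, package = "MASS")

# Map body weight to x and heart weight to y, then draw one point per cat.
p <- ggplot(cats, aes(x = Bwt, y = Hwt)) +
  geom_point()

print(p)
```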