Tasks

1. Install the package MASS (test on Windows)

Either use RStudio’s package manager (“Packages” tab in the lower right pane) or issue the following command on the console:

install.packages('MASS')
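If you rerun a script often, you may not want to reinstall the package every time. The following is a minimal sketch of one way to install MASS only when it is missing; it assumes a CRAN mirror is configured in case the installation is actually triggered.

```r
# Install MASS only when it is not already available on this machine.
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}

# Show which version is installed.
packageVersion("MASS")
```

requireNamespace() only checks whether the package can be loaded; it does not attach it, so library(MASS) is still needed afterwards.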

2. Load the packages MASS and tidyverse

Let’s load the packages:

library(MASS)
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.7
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ dplyr::select() masks MASS::select()

Loading these packages will produce some messages on the console. What do you think they mean? (If you don’t know, just make a wild guess!)

The first section lists which (sub-)packages of the “meta-package” tidyverse were loaded. The second section, “Conflicts”, lists which functions are masked by tidyverse packages. This means that functions from other packages happen to have the same name. For example, the last line above tells us that dplyr, a package from the tidyverse, contains a function select which conflicts with the identically named function from the package MASS.

So if we call the function select() now (you don’t need to know what it does for now), which one will be used: the one from dplyr or the one from MASS? This depends on the order in which the packages were loaded, i.e. the order of the above library(...) commands. The most recently loaded package takes precedence, so in this case, select() from dplyr would be used. This is called masking.

You can still access the masked function by using a namespace prefix. This means you prepend the package name and double colons to the function name like this: MASS::select() to specify which function from which package you want to use in case of a conflict.
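To check for yourself which version of a masked function is picked up, you can ask R where a function comes from. The sketch below loads only dplyr instead of the whole tidyverse (an assumption made to keep the example small); the masking behaviour for select() is the same.

```r
library(MASS)
library(dplyr)  # loaded after MASS, so its select() masks MASS::select()

# The unqualified name resolves to the most recently loaded package.
environmentName(environment(select))        # "dplyr"

# The namespace prefix picks a specific package regardless of load order.
environmentName(environment(MASS::select))  # "MASS"
environmentName(environment(dplyr::select)) # "dplyr"
```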

3. Load the built-in dataset “cats” provided by the package MASS (Hint: Run data(cats) to load the data)

data(cats)

4. Find out about the data using R’s help system – What are the variables in the dataset?

?cats

or:

help(cats)

The variables in the dataset are:

  • Sex: factor with levels “F” and “M”.
  • Bwt: body weight in kg.
  • Hwt: heart weight in g.
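Besides the help page, the variables can also be inspected directly on the loaded object. A short sketch using base R functions only:

```r
# Load the dataset without attaching the whole MASS package.
data(cats, package = "MASS")

names(cats)       # column names: "Sex" "Bwt" "Hwt"
str(cats)         # compact overview of each column's type and first values
levels(cats$Sex)  # factor levels: "F" "M"
dim(cats)         # number of rows and columns: 144 3
```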

5. View the data using 4 different perspectives

Simply issue the command cats at the console – What generally happens when you simply use an object’s name as a command?

cats
##     Sex Bwt  Hwt
## 1     F 2.0  7.0
## 2     F 2.0  7.4
## 3     F 2.0  9.5
## 4     F 2.1  7.2
## 5     F 2.1  7.3
## 6     F 2.1  7.6
## 7     F 2.1  8.1
## 8     F 2.1  8.2
## 9     F 2.1  8.3
## 10    F 2.1  8.5
##  [ reached getOption("max.print") -- omitted 134 rows ]
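To answer the question above: when an object’s name is entered at the top level of the console, R evaluates it and automatically prints the result, much as if print() had been called on it. The following sketch illustrates this with a small subset of the data (the name cats_small is only used for illustration):

```r
data(cats, package = "MASS")
cats_small <- head(cats, 3)  # hypothetical helper object for this example

cats_small             # auto-printed because it is a top-level expression
print(cats_small)      # explicit printing, same output
invisible(cats_small)  # evaluated, but nothing is printed
```

The “omitted … rows” note in the output above comes from the max.print option, which limits how many entries R prints at once.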

Using the functions head and tail. First find out what these functions do (again, using the help system).

head(x, n) and tail(x, n) return only the first or last n rows of a dataset x. By default, n is 6.

head(cats)
##   Sex Bwt Hwt
## 1   F 2.0 7.0
## 2   F 2.0 7.4
## 3   F 2.0 9.5
## 4   F 2.1 7.2
## 5   F 2.1 7.3
## 6   F 2.1 7.6
tail(cats)
##     Sex Bwt  Hwt
## 139   M 3.6 15.0
## 140   M 3.7 11.0
## 141   M 3.8 14.8
## 142   M 3.8 16.8
## 143   M 3.9 14.4
## 144   M 3.9 20.5
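The number of rows can be changed with the argument n. A brief sketch, including the less well-known case of a negative n:

```r
data(cats, package = "MASS")

head(cats, n = 3)     # first three rows
tail(cats, n = 2)     # last two rows (rows 143 and 144)
head(cats, n = -141)  # everything except the last 141 rows, i.e. rows 1 to 3
```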

Using RStudio’s View function (use the function from the console and also check out the small table icon in the “Environment” tab in the top right pane)

This will open a new tab in the main window pane with a data viewer.

View(cats)

6. Construct a scatter plot of the data using qplot from the ggplot2 package (incl. in tidyverse)

Plot Bwt (body weight) on the x-axis and Hwt (heart weight) on the y-axis

qplot is short for “quick plot”. The following specifies a scatter plot, where the first parameter is the variable on the x-axis, the second is the variable on the y-axis and the third parameter specifies the data to plot.

qplot(Bwt, Hwt, data = cats)
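Note that qplot() has been deprecated in newer ggplot2 releases (since version 3.4.0, as far as I know), so you may see a warning with a current installation. A sketch of the same scatter plot written with the full ggplot() interface; the axis labels are an addition based on the units given in the help page:

```r
library(ggplot2)
data(cats, package = "MASS")

# Map body weight to x and heart weight to y, then draw one point per cat.
p <- ggplot(cats, aes(x = Bwt, y = Hwt)) +
  geom_point() +
  labs(x = "Body weight (kg)", y = "Heart weight (g)")

print(p)
```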