Tasks

1. Consider the command mean(head(sort(airquality$Ozone), 10)). Describe in one sentence what the effect of this command is.

This command calculates the mean of the 10 lowest non-missing values in the Ozone column of the airquality data set (sort() drops NA values by default).
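To see what each part of the nested call contributes, it can be split into separate steps. This is only an illustrative sketch; the object names ozone_sorted, ozone_lowest and ozone_lowest_mean are made up for this example.

```r
# Step 1: sort in ascending order; sort() drops NA values by default
ozone_sorted <- sort(airquality$Ozone)

# Step 2: keep the first 10 elements, i.e. the 10 lowest values
ozone_lowest <- head(ozone_sorted, 10)

# Step 3: average these 10 values
ozone_lowest_mean <- mean(ozone_lowest)
ozone_lowest_mean
```

The result is the same as that of the one-line command, because R evaluates nested calls from the inside out.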

2. Use the same command as above, but reverse the ordering. How do you do that and what’s the effect of it?

mean(head(sort(airquality$Ozone, decreasing = TRUE), 10))
## [1] 116.6

This calculates the mean of the highest 10 values in the Ozone vector of the data set airquality.

3. Calculate the standard deviation of the variable Solar.R in the built-in data set airquality. Use the function sd() for this and set it to ignore NA values in the input vector.

sd(airquality$Solar.R, na.rm = TRUE)
## [1] 90.05842
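To check why na.rm = TRUE is needed here, one could compare the result with and without it. The following is a small sketch; the object names solar, n_missing, sd_with_na and sd_without_na are chosen only for illustration.

```r
solar <- airquality$Solar.R

n_missing <- sum(is.na(solar))             # how many values are missing
sd_with_na <- sd(solar)                    # NAs are kept, so the result is NA
sd_without_na <- sd(solar, na.rm = TRUE)   # NAs are removed before calculating

n_missing
sd_with_na
sd_without_na
```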

4. rnorm() is a function to generate random numbers from a normal distribution. Find out which parameters it has using R’s built-in help function. Now generate 100 random numbers from a normal distribution with mean 30 and standard deviation 2. Calculate the mean of these numbers. How much does the mean of your generated numbers differ from the specified mean of 30?

Showing the help page of rnorm():

?rnorm

Note: The following results will be different from your results, because the random number generator will output different values for you. What’s important is the code, not the output.
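If you want results that can be reproduced, you can fix the state of the random number generator with set.seed() before drawing the numbers. This is an optional sketch that is not part of the task; the seed value 42 and the object names first_draw and second_draw are arbitrary choices for illustration.

```r
set.seed(42)                                   # arbitrary seed value
first_draw <- rnorm(100, mean = 30, sd = 2)

set.seed(42)                                   # reset to the same seed
second_draw <- rnorm(100, mean = 30, sd = 2)

# with the same seed, both draws contain exactly the same numbers
identical(first_draw, second_draw)
```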

(nums <- rnorm(100, mean = 30, sd = 2))   # you don't have to use named arguments
##  [1] 28.01128 27.29290 30.18439 31.83224 27.66160 30.84426 31.21104
##  [8] 28.89143 24.28387 31.46757 29.85354 30.04740 32.47488 30.65084
## [15] 29.97073 24.57966 27.21027 32.28486 31.46926 29.43507 27.26833
## [22] 30.84233 29.96708 27.80542 25.21708 25.14895 28.31660 31.72991
## [29] 26.55496 30.65358
##  [ reached getOption("max.print") -- omitted 70 entries ]
(mean_nums <- mean(nums))
## [1] 29.71554
abs(30 - mean_nums)
## [1] 0.2844564

This is the absolute difference between the mean of the generated numbers and the mean of 30 that was passed to rnorm().

5. Use the 100 random numbers generated in the previous task and count how many of them are greater than or equal to 30.

sum(nums >= 30)
## [1] 51
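Counting with sum() works because the comparison returns a logical vector, and TRUE is treated as 1 and FALSE as 0 when summed. The sketch below spells this out and also derives the proportion with mean(); the seed and the object names example_nums, is_large, n_large and share_large are assumptions made for this example, so the numbers will differ from the output above.

```r
set.seed(1)                                     # arbitrary seed for this example
example_nums <- rnorm(100, mean = 30, sd = 2)

is_large <- example_nums >= 30                  # logical vector of TRUE / FALSE
n_large <- sum(is_large)                        # TRUE counts as 1, FALSE as 0
share_large <- mean(is_large)                   # proportion of TRUE values

n_large
share_large
```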

6. Load the package MASS and its data set cats as in the previous session. Answer the following questions.

library(MASS)
data(cats)

head(cats)
##   Sex Bwt Hwt
## 1   F 2.0 7.0
## 2   F 2.0 7.4
## 3   F 2.0 9.5
## 4   F 2.1 7.2
## 5   F 2.1 7.3
## 6   F 2.1 7.6

1. How many female and how many male cats are there in the data set? Store the results in two objects n_female and n_male.

(n_female <- sum(cats$Sex == 'F'))
## [1] 47
(n_male <- sum(cats$Sex == 'M'))
## [1] 97

2. How many female cats have a body weight of at least 2.5 kg? What proportion of all female cats is that?

(n_bigcats_f <- sum(cats$Sex == 'F' & cats$Bwt >= 2.5))
## [1] 13
n_bigcats_f / n_female
## [1] 0.2765957

3. How many male cats have a body weight of at least 2.5 kg? What proportion of all male cats is that?

(n_bigcats_m <- sum(cats$Sex == 'M' & cats$Bwt >= 2.5))
## [1] 80
n_bigcats_m / n_male
## [1] 0.8247423

4. How many female cats have a body weight of at least 2.5 kg or a heart weight of at least 10 g? What proportion of all female cats is that?

(n_bigcats_hwt_f <- sum(cats$Sex == 'F' & (cats$Bwt >= 2.5 | cats$Hwt >= 10)))
## [1] 20
n_bigcats_hwt_f / n_female
## [1] 0.4255319

5. How many male cats have a body weight of at least 2.5 kg or a heart weight of at least 10 g? What proportion of all male cats is that?

(n_bigcats_hwt_m <- sum(cats$Sex == 'M' & (cats$Bwt >= 2.5 | cats$Hwt >= 10)))
## [1] 82
n_bigcats_hwt_m / n_male
## [1] 0.8453608
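The counts and proportions above were calculated separately for each sex. As a possible shortcut, table() and tapply() can produce the results for both groups at once. This is a sketch of an alternative, not the approach asked for in the task; the object names counts_by_sex and share_heavy_by_sex are made up for illustration.

```r
library(MASS)   # provides the cats data set
data(cats)

# number of cats per sex
counts_by_sex <- table(cats$Sex)

# proportion of cats with a body weight of at least 2.5 kg, per sex:
# the mean of a logical vector is the share of TRUE values
share_heavy_by_sex <- tapply(cats$Bwt >= 2.5, cats$Sex, mean)

counts_by_sex
share_heavy_by_sex
```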

8. What’s wrong with the following lines of code:

Example 1:

sum(airquality$Month = 5)
## Error

The author of this code probably wanted to use the equality operator == to count how many rows in the airquality data set have a Month set to 5. So the correct code would be:

sum(airquality$Month == 5)
## [1] 31

Example 2:

smoker <- c(TRUE, NA, FALSE, TRUE, FALSE)
sum(smoker, na.rm <- TRUE)
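## [1] NA

Note that this code does not produce an error! It is syntactically correct. However, the author of this code probably wanted to set the na.rm parameter to TRUE but used the wrong operator for it. <- is the object assignment operator, so R creates an object named na.rm with the value TRUE and passes that value to sum() as one more number to add. The NA in smoker is not removed and the result is NA. = must be used to pass named arguments to a function: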

smoker <- c(TRUE, NA, FALSE, TRUE, FALSE)
sum(smoker, na.rm = TRUE)
## [1] 2

Example 3:

age <- c(20, NA, 19, 51, 20)
mean(age, na.rm == TRUE)
## Error

Again, the author of this code probably wanted to set the na.rm parameter to TRUE but used the wrong operator for it. == is the equality comparison operator, but = must be used to pass arguments to a function:

age <- c(20, NA, 19, 51, 20)
mean(age, na.rm = TRUE)
## [1] 27.5

Example 4:

country <- factor(c('USA', 'GB', 'GB', 'DE', 'USA'))
country_is_usa <- (country = 'USA')
country_is_usa
## [1] "USA"

Note that this code does not produce an error! It is syntactically correct. However, the author of this code probably wanted to make an equality comparison to find the elements of country that equal the value 'USA'. Unfortunately, the wrong operator was used. = is the alternative object assignment operator (like <- but don’t use it!), so the line overwrites country with the string 'USA' and stores that same string in country_is_usa. == is the equality comparison operator that must be used instead:

country <- factor(c('USA', 'GB', 'GB', 'DE', 'USA'))
country_is_usa <- (country == 'USA')   # you can omit the parentheses
country_is_usa
## [1]  TRUE FALSE FALSE FALSE  TRUE
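Once the comparison yields a logical vector, it can be used like the logical vectors in the earlier tasks, for example to count the matches or to find their positions. A brief sketch, with n_usa and usa_positions as illustrative object names:

```r
country <- factor(c('USA', 'GB', 'GB', 'DE', 'USA'))
country_is_usa <- country == 'USA'

n_usa <- sum(country_is_usa)             # number of elements equal to 'USA'
usa_positions <- which(country_is_usa)   # positions of these elements

n_usa
usa_positions
```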

Example 5:

age <- c(20, NA, 19, 51, 20)
median(age, rm.na = TRUE)
## [1] NA

Note that this code does not produce an error! It is syntactically correct. However, the author of this code probably wanted to remove NAs before calculating the median. Unfortunately, the argument to remove NAs was not correctly specified. Instead of rm.na it should be named na.rm.
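Following the pattern of the previous examples, the corrected call could look like the sketch below. The object names median_wrong and median_right are only used here to put both versions side by side.

```r
age <- c(20, NA, 19, 51, 20)

median_wrong <- median(age, rm.na = TRUE)   # misspelled argument name: NA is not removed
median_right <- median(age, na.rm = TRUE)   # correct argument name: NA is removed

median_wrong
median_right
```

With the NA removed, the median is calculated from the remaining values 19, 20, 20 and 51.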