mean(head(sort(airquality$Ozone), 10)). Describe in one sentence what the effect of this command is.
This command calculates the mean of the lowest 10 values in the Ozone vector of the data set airquality.
mean(head(sort(airquality$Ozone, decreasing = TRUE), 10))
## [1] 116.6
This calculates the mean of the highest 10 values in the Ozone vector of the data set airquality.
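The two commands above differ only in the sort order. A minimal sketch on a small made-up vector (the name x and its values are illustrative assumptions, not part of the exercise) shows each step, including the fact that sort() drops NA values by default:

```r
# Hypothetical toy vector with one missing value
x <- c(7, NA, 3, 9, 1, 5)

sorted_x <- sort(x)                               # ascending order; NA is dropped by default
lowest3 <- head(sorted_x, 3)                      # the three smallest values: 1, 3, 5
highest3 <- head(sort(x, decreasing = TRUE), 3)   # the three largest values: 9, 7, 5

mean(lowest3)    # 3
mean(highest3)   # 7
```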
Solar.R in the built-in data set airquality. Use the function sd() for this and set it to ignore NA values in the input vector.
sd(airquality$Solar.R, na.rm = TRUE)
## [1] 90.05842
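To see why na.rm = TRUE is needed here, the following sketch compares both calls on the same column. It assumes only the built-in airquality data set; sd_default and sd_clean are illustrative names.

```r
# Solar.R contains missing values, so the default call returns NA
sd_default <- sd(airquality$Solar.R)
sd_default

# Number of missing values that na.rm = TRUE will skip
sum(is.na(airquality$Solar.R))

# With na.rm = TRUE the NA values are dropped before the calculation
sd_clean <- sd(airquality$Solar.R, na.rm = TRUE)
sd_clean
```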
rnorm() is a function to generate random numbers from a normal distribution. Find out which parameters it has using R’s built-in help function. Now generate 100 random numbers from a normal distribution with mean 30 and standard deviation 2. Calculate the mean of these numbers. How much does the mean of your generated numbers differ from the mean 30?
Showing the help page of rnorm():
?rnorm
Note: The following results will be different from your results, because the random number generator will output different values for you. What’s important is the code, not the output.
(nums <- rnorm(100, mean = 30, sd = 2)) # you don't have to use named arguments
## [1] 28.01128 27.29290 30.18439 31.83224 27.66160 30.84426 31.21104
## [8] 28.89143 24.28387 31.46757 29.85354 30.04740 32.47488 30.65084
## [15] 29.97073 24.57966 27.21027 32.28486 31.46926 29.43507 27.26833
## [22] 30.84233 29.96708 27.80542 25.21708 25.14895 28.31660 31.72991
## [29] 26.55496 30.65358
## [ reached getOption("max.print") -- omitted 70 entries ]
(mean_nums <- mean(nums))
## [1] 29.71554
abs(30 - mean_nums)
## [1] 0.2844564
This is the difference between the mean of the generated numbers and the specified mean of the random number generator.
sum(nums >= 30)
## [1] 51
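If you want results that can be repeated exactly, one option is to fix the state of the random number generator with set.seed() before calling rnorm(). The sketch below is an optional addition to the exercise; the seed value 123 is an arbitrary choice and nums_a and nums_b are illustrative names.

```r
set.seed(123)                            # arbitrary seed, any integer works
nums_a <- rnorm(100, mean = 30, sd = 2)

set.seed(123)                            # same seed again
nums_b <- rnorm(100, mean = 30, sd = 2)

identical(nums_a, nums_b)                # TRUE: both runs produce the same numbers
abs(30 - mean(nums_a))                   # deviation of the sample mean from 30
sum(nums_a >= 30)                        # how many values are at least 30
```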
MASS and its data set cats as in the previous session. Answer the following questions.
library(MASS)
data(cats)
head(cats)
## Sex Bwt Hwt
## 1 F 2.0 7.0
## 2 F 2.0 7.4
## 3 F 2.0 9.5
## 4 F 2.1 7.2
## 5 F 2.1 7.3
## 6 F 2.1 7.6
n_female and n_male.
(n_female <- sum(cats$Sex == 'F'))
## [1] 47
(n_male <- sum(cats$Sex == 'M'))
## [1] 97
(n_bigcats_f <- sum(cats$Sex == 'F' & cats$Bwt >= 2.5))
## [1] 13
n_bigcats_f / n_female
## [1] 0.2765957
(n_bigcats_m <- sum(cats$Sex == 'M' & cats$Bwt >= 2.5))
## [1] 80
n_bigcats_m / n_male
## [1] 0.8247423
(n_bigcats_hwt_f <- sum(cats$Sex == 'F' & (cats$Bwt >= 2.5 | cats$Hwt >= 10)))
## [1] 20
n_bigcats_hwt_f / n_female
## [1] 0.4255319
(n_bigcats_hwt_m <- sum(cats$Sex == 'M' & (cats$Bwt >= 2.5 | cats$Hwt >= 10)))
## [1] 82
n_bigcats_hwt_m / n_male
## [1] 0.8453608
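Since TRUE counts as 1 and FALSE as 0, the mean of a logical vector is the proportion of TRUE values, which gives the same kind of ratio in one step. The sketch uses a small made-up data frame (toy_cats is hypothetical, not the real cats data) so that the numbers are easy to check by hand.

```r
# Hypothetical mini data set with the same column names as cats
toy_cats <- data.frame(
  Sex = c('F', 'F', 'F', 'F', 'M', 'M'),
  Bwt = c(2.0, 2.6, 2.4, 3.0, 2.5, 2.2)
)

is_female <- toy_cats$Sex == 'F'

# Count-based version, as in the solution above
prop_counts <- sum(is_female & toy_cats$Bwt >= 2.5) / sum(is_female)

# Shortcut: proportion of TRUE values among the female rows
prop_mean <- mean(toy_cats$Bwt[is_female] >= 2.5)

prop_counts   # 0.5: 2 of 4 females weigh at least 2.5
prop_mean     # 0.5: same value
```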
Example 1:
sum(airquality$Month = 5)
## Error
The author of this code probably wanted to use the equality operator == to count how many rows in the airquality data set have a Month set to 5. So the correct code would be:
sum(airquality$Month == 5)
## [1] 31
Example 2:
smoker <- c(TRUE, NA, FALSE, TRUE, FALSE)
sum(smoker, na.rm <- TRUE)
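## [1] NA
Note that this code does not produce an error, but the NA is not ignored. The author of this code probably wanted to set the na.rm parameter to TRUE but used the wrong operator for it. <- is the object assignment operator, so the call creates a variable named na.rm and passes its value TRUE to sum() as one more value to add up. = must be used to pass arguments to a function: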
smoker <- c(TRUE, NA, FALSE, TRUE, FALSE)
sum(smoker, na.rm = TRUE)
## [1] 2
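The difference between the two operators inside a call can be checked directly. In this sketch, wrong and right are illustrative variable names:

```r
smoker <- c(TRUE, NA, FALSE, TRUE, FALSE)

# '<-' assigns TRUE to a new variable called na.rm and then passes that
# value to sum() as one more value to add up; the NA is not removed
wrong <- sum(smoker, na.rm <- TRUE)
wrong     # NA
na.rm     # TRUE: the assignment left a variable behind

# '=' names the argument, so sum() skips the NA
right <- sum(smoker, na.rm = TRUE)
right     # 2
```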
Example 3:
age <- c(20, NA, 19, 51, 20)
mean(age, na.rm == TRUE)
## Error
Again, the author of this code probably wanted to set the na.rm parameter to TRUE but used the wrong operator for it. == is the equality comparison operator, but = must be used to pass arguments to a function:
age <- c(20, NA, 19, 51, 20)
mean(age, na.rm = TRUE)
## [1] 27.5
Example 4:
country <- factor(c('USA', 'GB', 'GB', 'DE', 'USA'))
country_is_usa <- (country = 'USA')
country_is_usa
## [1] "USA"
Note that this code does not produce an error! It is syntactically correct. However, the author of this code probably wanted to make an equality comparison to get the elements of country that equal the value 'USA'. Unfortunately, the wrong operator was used. = is the alternative object assignment operator (like <- but don’t use it!) and == is the equality comparison operator that must be used instead:
country <- factor(c('USA', 'GB', 'GB', 'DE', 'USA'))
country_is_usa <- (country == 'USA') # you can omit the parentheses
country_is_usa
## [1] TRUE FALSE FALSE FALSE TRUE
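A side effect of the version with = is easy to miss: because = assigns, the original factor is overwritten. A minimal sketch (overwritten is an illustrative name), followed by two common uses of the correct logical vector:

```r
country <- factor(c('USA', 'GB', 'GB', 'DE', 'USA'))
overwritten <- (country = 'USA')   # assigns 'USA' to country
country                            # now just the string "USA"

# Restore the factor and compare properly
country <- factor(c('USA', 'GB', 'GB', 'DE', 'USA'))
country_is_usa <- country == 'USA'
sum(country_is_usa)     # number of matching elements: 2
which(country_is_usa)   # their positions: 1 and 5
```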
Example 5:
age <- c(20, NA, 19, 51, 20)
median(age, rm.na = TRUE)
## [1] NA
Note that this code does not produce an error! It is syntactically correct. However, the author of this code probably wanted to remove NAs before calculating the median. Unfortunately, the argument to remove NAs was not correctly specified. Instead of rm.na it should be named na.rm.
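The misspelled argument goes unnoticed because, in current R versions, median() accepts further arguments through ... and ignores names it does not know. The corrected call looks like this:

```r
age <- c(20, NA, 19, 51, 20)

median(age, rm.na = TRUE)   # NA: rm.na is silently ignored
median(age, na.rm = TRUE)   # 20: the NA is removed first
```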