There are two predefined objects in R that contain all letters from A-Z and a-z, respectively:
LETTERS
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
Using numeric indexing (not subsetting with logical expressions), try to generate the following output using either `letters` or `LETTERS`:
# 1. The single letter `"e"`
letters[5]
## [1] "e"
# 2. All letters but `"e"`
letters[-5]
## [1] "a" "b" "c" "d" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r"
## [18] "s" "t" "u" "v" "w" "x" "y" "z"
# 3. The last five letters in the alphabet (`"v" "w" "x" "y" "z"`)
letters[22:26]
## [1] "v" "w" "x" "y" "z"
# 4. The 23rd, 26th and second capital letters (in that order), forming `"W" "Z" "B"`
LETTERS[c(23, 26, 2)]
## [1] "W" "Z" "B"
# 5. Every second letter starting from 1 (`"a" "c" "e" "g" "i" "k" "m" "o" "q" "s" "u" "w" "y"`)
letters[seq(1, 26, by = 2)]
## [1] "a" "c" "e" "g" "i" "k" "m" "o" "q" "s" "u" "w" "y"
# 6. All but the first five letters: `"f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"`
letters[-(1:5)]
## [1] "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v"
## [18] "w" "x" "y" "z"
# 7. Create an object `myletters` as a copy of `letters` (`myletters <- letters`).
# Assign the first five capital letters (from `LETTERS`) to the first five letters
# of `myletters` so that `myletters` will then contain:
# `"A" "B" "C" "D" "E" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"`
myletters <- letters
myletters[1:5] <- LETTERS[1:5]
myletters
## [1] "A" "B" "C" "D" "E" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
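The solutions above hard-code positions such as `22:26`. As a small sketch that goes beyond what the exercise asks for, the same kind of result can be computed from the length of the vector, so the code would still work for a vector of a different length. The variable names (`n`, `last_five`, `last_five_tail`, `every_second`) are only illustrative.

```r
n <- length(letters)                 # number of elements, 26 for the alphabet

# last five letters, with the positions computed from the vector length
last_five <- letters[(n - 4):n]

# tail() returns the last elements without any explicit index arithmetic
last_five_tail <- tail(letters, 5)

# every second letter, letting seq() stop at the vector length
every_second <- letters[seq(1, n, by = 2)]

identical(last_five, last_five_tail)
## [1] TRUE
```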
retweets <- c(1, 3, 2, 2, 3, 4, 3, 2, 8, 2)
likes <- c(6, 10, 9, 6, 3, 6, 6, 7, 6, 15)
users <- factor(c('WZB_Berlin', 'JWI_Berlin', 'JWI_Berlin', 'gesis_org', 'WZB_Berlin', 'WZB_Berlin', 'WZB_Berlin', 'gesis_org', 'JWI_Berlin', 'WZB_Berlin'))
located_in_berlin <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
Assume that the elements in the vectors are aligned, i.e. the first element in `retweets` corresponds to the first element in `likes` and `users` etc. (as if they were combined in a data frame). Solve all tasks by using logical expressions / logical vectors.
Create subsets of `retweets` and `likes` that contain only data from the user `WZB_Berlin`.
retweets[users == 'WZB_Berlin']
## [1] 1 3 4 3 2
likes[users == 'WZB_Berlin']
## [1] 6 3 6 6 15
Create a subset of `users` that contains only elements where `located_in_berlin` is `FALSE` or `users` equals `"WZB_Berlin"` (this should return a vector containing only `"gesis_org"` and `"WZB_Berlin"`).
users[!located_in_berlin | users == 'WZB_Berlin']
## [1] WZB_Berlin gesis_org WZB_Berlin WZB_Berlin WZB_Berlin gesis_org
## [7] WZB_Berlin
## Levels: gesis_org JWI_Berlin WZB_Berlin
# this is also correct:
users[located_in_berlin == FALSE | users == 'WZB_Berlin']
## [1] WZB_Berlin gesis_org WZB_Berlin WZB_Berlin WZB_Berlin gesis_org
## [7] WZB_Berlin
## Levels: gesis_org JWI_Berlin WZB_Berlin
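Note that the output above still lists `JWI_Berlin` under `Levels`, although no such element remains in the subset: subsetting a factor keeps all of its levels. If that is unwanted, base R's `droplevels()` removes the unused ones. A minimal sketch, repeating the two vectors so that it runs on its own (`users_subset` and `users_clean` are illustrative names):

```r
users <- factor(c('WZB_Berlin', 'JWI_Berlin', 'JWI_Berlin', 'gesis_org', 'WZB_Berlin',
                  'WZB_Berlin', 'WZB_Berlin', 'gesis_org', 'JWI_Berlin', 'WZB_Berlin'))
located_in_berlin <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)

users_subset <- users[!located_in_berlin | users == 'WZB_Berlin']
nlevels(users_subset)                # the unused level JWI_Berlin is still counted
## [1] 3

users_clean <- droplevels(users_subset)   # drop levels that no element uses
nlevels(users_clean)
## [1] 2
```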
Create subsets of `retweets`, `likes` and `users` that meet the criteria of at least three retweets and at least six likes.
To save us some typing, I create a logical vector first:
(criteria <- retweets >= 3 & likes >= 6)
## [1] FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE
And now I can use it with all the vectors:
retweets[criteria]
## [1] 3 4 3 8
likes[criteria]
## [1] 10 6 6 6
users[criteria]
## [1] JWI_Berlin WZB_Berlin WZB_Berlin JWI_Berlin
## Levels: gesis_org JWI_Berlin WZB_Berlin
Calculate the median of `retweets`. Now form a subset of `retweets`, `users` and `located_in_berlin` where retweets are higher than the median.
This is the median for `retweets`:
(med_retw <- median(retweets))
## [1] 2.5
Again, we create a logical vector first:
(retw_above_median <- retweets > med_retw)
## [1] FALSE TRUE FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE
retweets[retw_above_median]
## [1] 3 3 4 3 8
users[retw_above_median]
## [1] JWI_Berlin WZB_Berlin WZB_Berlin WZB_Berlin JWI_Berlin
## Levels: gesis_org JWI_Berlin WZB_Berlin
located_in_berlin[retw_above_median]
## [1] TRUE TRUE TRUE TRUE TRUE
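The tasks treat the vectors as if they were columns of a data frame. As a sketch of what that would look like, the same logical vectors can be used as row selectors once the data is actually combined. The data frame name `tweets` is an assumption made for this illustration and does not appear in the exercise.

```r
# combine the aligned vectors into one data frame (hypothetical name: tweets)
tweets <- data.frame(
  retweets = c(1, 3, 2, 2, 3, 4, 3, 2, 8, 2),
  likes = c(6, 10, 9, 6, 3, 6, 6, 7, 6, 15),
  users = factor(c('WZB_Berlin', 'JWI_Berlin', 'JWI_Berlin', 'gesis_org', 'WZB_Berlin',
                   'WZB_Berlin', 'WZB_Berlin', 'gesis_org', 'JWI_Berlin', 'WZB_Berlin')),
  located_in_berlin = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# rows with at least three retweets and at least six likes, all columns
criteria <- tweets$retweets >= 3 & tweets$likes >= 6
tweets[criteria, ]

# rows with more retweets than the median, restricted to three columns
above_median <- tweets$retweets > median(tweets$retweets)
tweets[above_median, c('retweets', 'users', 'located_in_berlin')]
```

The logical vector goes before the comma because it selects rows. One expression then subsets all columns at once, instead of repeating the index for each vector.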
Create a script file in RStudio that does the following:
1. Load `segindex_sample.csv` (from the accompanying resources file `04rbasics3-resources.zip`, available on the course website) into a data frame. Set `read.csv()` to not convert strings to factors automatically.
2. Create a subset that contains only the rows whose `state` is NRW, RP or BW (use the `%in%` operator for this, which was introduced in the previous session).
3. Save the subset to the Excel file `segindex_subset.xlsx`.
You should write this to an R script (with the file name extension `.R`), but I copied the contents of a solution here. Please note that the paths to the files for reading/writing can be different in your case. For this solution, I assume that the R script resides in the same directory as the CSV input and Excel output files and that the working directory is the same.
library(writexl)
# step 1:
segindex <- read.csv('segindex_sample.csv', stringsAsFactors = FALSE)
# step 2:
segindex_subset <- segindex[segindex$state %in% c('NRW', 'RP', 'BW'),]
# step 3:
write_xlsx(segindex_subset, 'segindex_subset.xlsx')
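To see what step 2 does without having the course files at hand, here is a small self-contained sketch in base R. The data frame `demo` and its `value` column are made up for illustration; only the column name `state` and the three state codes come from the solution above. The round trip goes through a temporary CSV file rather than an Excel file, so no extra package is needed.

```r
# made-up stand-in for the real CSV; only the `state` column name is from the solution
demo <- data.frame(
  state = c('NRW', 'BY', 'RP', 'BE', 'BW', 'NRW'),
  value = c(0.31, 0.27, 0.22, 0.35, 0.25, 0.29),   # hypothetical column
  stringsAsFactors = FALSE
)

# %in% returns one TRUE/FALSE per row: is this state one of the three?
keep <- demo$state %in% c('NRW', 'RP', 'BW')
keep
## [1]  TRUE FALSE  TRUE FALSE  TRUE  TRUE

# the logical vector before the comma selects rows; all columns are kept
demo_subset <- demo[keep, ]

# write to a temporary file and read it back, keeping strings as strings
csv_path <- file.path(tempdir(), 'demo_subset.csv')
write.csv(demo_subset, csv_path, row.names = FALSE)
reread <- read.csv(csv_path, stringsAsFactors = FALSE)
is.character(reread$state)
## [1] TRUE
```

Building the path with `file.path()` avoids depending on the current working directory, which is the part most likely to differ between machines.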