Tasks

1. Indexing

There are two predefined objects in R that contain all letters from A-Z and a-z, respectively:

LETTERS
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
letters
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"

Using numeric indexing (not subsetting with logical expressions), try to generate the following output using either letters or LETTERS:

  1. The single letter "e"
  2. All letters but "e"
  3. The last five letters in the alphabet ("v" "w" "x" "y" "z")
  4. The 23rd, 26th and 2nd capital letters (in that order), forming "W" "Z" "B"
  5. Every second letter, starting with the first ("a" "c" "e" "g" "i" "k" "m" "o" "q" "s" "u" "w" "y"). Hint: Use the seq() function.
  6. All but the first five letters: "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
  7. Create an object myletters as a copy of letters (myletters <- letters). Assign the first five capital letters (from LETTERS) to the first five letters of myletters so that myletters will then contain: "A" "B" "C" "D" "E" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
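
As a warm-up that does not give away the solutions, the indexing techniques needed above can be sketched on a different built-in vector. The sketch below uses month.abb (the English month abbreviations that ship with R); the object names m and m2 are made up for illustration.

```r
# month.abb is a built-in character vector: "Jan" "Feb" ... "Dec"
m <- month.abb

m[3]                          # a single element by position
m[-3]                         # everything except position 3
m[c(12, 1)]                   # several positions, in the given order
m[seq(2, length(m), by = 2)]  # every second element, starting at position 2
m[(length(m) - 2):length(m)]  # the last three elements
m[-(1:2)]                     # everything but the first two elements

# Assigning by position: overwrite the first two elements of a copy
m2 <- m
m2[1:2] <- toupper(m[1:2])
m2
```

Note the parentheses in m[-(1:2)]: without them, -1:2 would be read as the sequence from -1 to 2, which mixes negative and positive indices and fails.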

2. Subsetting with logical expressions

  1. Complete lesson 6 (“Subsetting vectors”) of the SWIRL course “R Programming”. (See the notes in the session 2 tasks about installing SWIRL if you have not done that yet.)
  2. Consider the following data and subset it according to the criteria listed below:
retweets <- c(1, 3, 2, 2, 3, 4, 3, 2, 8, 2)
likes <- c(6, 10, 9, 6, 3, 6, 6, 7, 6, 15)
users <- factor(c('WZB_Berlin', 'JWI_Berlin', 'JWI_Berlin', 'gesis_org', 'WZB_Berlin', 'WZB_Berlin', 'WZB_Berlin', 'gesis_org', 'JWI_Berlin', 'WZB_Berlin'))
located_in_berlin <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)

Assume that the elements in the vectors are aligned, i.e. the first element in retweets corresponds to the first element in likes and users etc. (as if they were combined in a data frame). Solve all tasks by using logical expressions / logical vectors.

    1. Form subsets of the vectors retweets and likes that contain only data from the user WZB_Berlin.
    2. Form a subset of the vector users that contains only elements where located_in_berlin is FALSE or users equals "WZB_Berlin" (this should return a vector containing only "gesis_org" and "WZB_Berlin").
    3. Form subsets of the vectors retweets, likes and users that contain only elements with at least three retweets and at least six likes. (Hint: To save yourself some typing, first create a logical vector for the criteria and then re-use it to subset each vector.)
    4. Calculate the median of retweets. Now form subsets of retweets, users and located_in_berlin where retweets is higher than the median.
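
The general pattern behind these tasks can be sketched on unrelated toy data. The vectors temp and city and the name keep below are invented for illustration and are not part of the exercise data.

```r
# Two aligned toy vectors (hypothetical data)
temp <- c(12, 18, 25, 9, 30)
city <- c("Oslo", "Rome", "Rome", "Oslo", "Cairo")

# Build the criteria once as a logical vector ...
keep <- temp >= 15 & city == "Rome"
keep

# ... and re-use it to subset every aligned vector
temp[keep]
city[keep]

# Criteria can also be combined with | (or) and can use computed values
warm_or_oslo <- temp > mean(temp) | city == "Oslo"
city[warm_or_oslo]
```

Because keep has the same length as temp and city, the same logical vector selects matching positions in both, which is what keeps the subsets aligned.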

3. Reading and writing files / subsetting data frames

Create a script file in RStudio that does the following:

  1. It loads the CSV file segindex_sample.csv (from the accompanying resources file 04rbasics3-resources.zip available on the course website) into a data frame. Set read.csv() to not convert strings to factors automatically.
  2. It filters this data frame by selecting only observations from the states “NRW”, “RP” and “BW” (Hint: You can use the %in% operator for this – it was introduced in the previous session).
  3. It saves the filtered data frame to an Excel file segindex_subset.xlsx.
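
The read, filter and write workflow can be sketched with a small stand-in data set. Everything specific here is an assumption: the inline CSV text, the column names state and value, and the temporary output file are invented, and the real segindex_sample.csv may be structured differently. The sketch also writes a CSV file, since base R has no built-in Excel writer; for the actual task an add-on package is needed (writexl with its write_xlsx() function is one option, if it is installed).

```r
# Stand-in for a CSV file on disk (hypothetical columns and values)
csv_text <- "state,value
NRW,0.31
BY,0.27
RP,0.22
HH,0.35"

# Read it without converting strings to factors
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Keep only the rows whose state is one of the wanted values
wanted <- c("NRW", "RP")
subset_df <- df[df$state %in% wanted, ]

# Write the result to a temporary file (CSV here, not Excel)
out_file <- tempfile(fileext = ".csv")
write.csv(subset_df, out_file, row.names = FALSE)
```

The comma after the row condition in df[df$state %in% wanted, ] matters: it selects rows and keeps all columns.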