Tasks

3. Take a look at the last 5 tweets of WZB_Berlin

Of course, the following numbers won’t match yours, because they were taken at a different time.

3.1. Create two vectors retweets and likes that contain the respective numbers from the last 5 tweets

retweets <- c(1, 3, 2, 2, 3)
likes <- c(6, 10, 9, 6, 3)

3.2. Create a third vector tweet_ids that contains the letters a to e as identifiers for the tweets

tweet_ids <- c('a', 'b', 'c', 'd', 'e')
# note: a shortcut for this would be tweet_ids <- letters[1:5]

3.3. Check the data type of all three vectors using the function class(...)

class(retweets)
## [1] "numeric"
class(likes)
## [1] "numeric"
class(tweet_ids)
## [1] "character"
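The three checks can also be done in one call. This is a minimal sketch that assumes the vectors from 3.1 and 3.2; the names `vecs` and `types` are only illustrative helpers and not part of the task.

```r
# vectors as created in 3.1 and 3.2
retweets <- c(1, 3, 2, 2, 3)
likes <- c(6, 10, 9, 6, 3)
tweet_ids <- letters[1:5]

# collect the vectors in a named list and apply class() to each element
vecs <- list(retweets = retweets, likes = likes, tweet_ids = tweet_ids)
types <- sapply(vecs, class)
types
```

The result is a named character vector with one class per input vector, which is handy once there are more than a few objects to check.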

3.4. Look at 5 more tweets and append the respective data to the vectors

retweets <- c(retweets, 4, 3, 2, 8, 2)
likes <- c(likes, 6, 6, 7, 6, 15)
tweet_ids <- c(tweet_ids, 'f', 'g', 'h', 'i', 'j')
# note: a shortcut for this would be tweet_ids <- c(tweet_ids, letters[6:10])
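As an alternative to c(...), base R also has the function append(...). The sketch below uses the retweet numbers from above and the illustrative name `retweets_more` to show that both approaches give the same vector.

```r
# first five retweet counts, as in 3.1
retweets <- c(1, 3, 2, 2, 3)

# append() adds the new values at the end by default
retweets_more <- append(retweets, c(4, 3, 2, 8, 2))

# same result as combining with c()
identical(retweets_more, c(retweets, 4, 3, 2, 8, 2))
## [1] TRUE

# all three vectors need the same length before they can form a data frame
length(retweets_more)
## [1] 10
```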

3.5. Create a dataframe tweetstats from the three vectors

tweetstats <- data.frame(tweet_ids, retweets, likes)
tweetstats
##    tweet_ids retweets likes
## 1          a        1     6
## 2          b        3    10
## 3          c        2     9
## 4          d        2     6
## 5          e        3     3
## 6          f        4     6
## 7          g        3     6
## 8          h        2     7
## 9          i        8     6
## 10         j        2    15
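A quick way to check the column types of the new data frame is str(...). Note that before R 4.0.0, data.frame(...) converted character vectors to factors by default; the sketch below sets stringsAsFactors = FALSE explicitly so that the result should be the same in older and newer R versions.

```r
# the three vectors after step 3.4
tweet_ids <- letters[1:10]
retweets <- c(1, 3, 2, 2, 3, 4, 3, 2, 8, 2)
likes <- c(6, 10, 9, 6, 3, 6, 6, 7, 6, 15)

# keep tweet_ids as a character column regardless of the R version
tweetstats <- data.frame(tweet_ids, retweets, likes, stringsAsFactors = FALSE)

# compact overview: number of observations, column names and column types
str(tweetstats)
```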

3.6. Add an additional variable/column to tweetstats named interactions which is the sum of retweets and likes for each observation

tweetstats$interactions <- tweetstats$retweets + tweetstats$likes
tweetstats
##    tweet_ids retweets likes interactions
## 1          a        1     6            7
## 2          b        3    10           13
## 3          c        2     9           11
## 4          d        2     6            8
## 5          e        3     3            6
## 6          f        4     6           10
## 7          g        3     6            9
##  [ reached getOption("max.print") -- omitted 3 rows ]
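The printed output above is cut off by the max.print option. One possible way to see all values is to print only the new column. The sketch below also shows transform(...) as an alternative to the `$` assignment; it rebuilds tweetstats from the numbers above so that it runs on its own.

```r
# rebuild the data frame from step 3.5
tweetstats <- data.frame(
  tweet_ids = letters[1:10],
  retweets = c(1, 3, 2, 2, 3, 4, 3, 2, 8, 2),
  likes = c(6, 10, 9, 6, 3, 6, 6, 7, 6, 15)
)

# transform() can refer to the columns by name without the tweetstats$ prefix
tweetstats <- transform(tweetstats, interactions = retweets + likes)

# print only the new column to see all ten values
tweetstats$interactions
## [1]  7 13 11  8  6 10  9  9 14 17
```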

4. As in the previous session’s tasks, we’ll work with the cats dataset from the package MASS

4.1. Load the package and the dataset

library(MASS)
data(cats)
cats
##     Sex Bwt  Hwt
## 1     F 2.0  7.0
## 2     F 2.0  7.4
## 3     F 2.0  9.5
## 4     F 2.1  7.2
## 5     F 2.1  7.3
## 6     F 2.1  7.6
## 7     F 2.1  8.1
## 8     F 2.1  8.2
## 9     F 2.1  8.3
## 10    F 2.1  8.5
##  [ reached getOption("max.print") -- omitted 134 rows ]

4.2. How do you bring up the documentation (help page) for the dataset?

?cats
# or: help(cats)

4.3. Identify the number of rows and columns in the dataset by using the respective R functions

nrow(cats)
## [1] 144
ncol(cats)
## [1] 3
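Both numbers can also be obtained at once with dim(...), which returns the number of rows first and the number of columns second. A short sketch, assuming the MASS package is installed:

```r
library(MASS)
data(cats)

# rows first, columns second
dim(cats)
## [1] 144   3
```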

4.4. Identify the column names using the respective R function

colnames(cats)
## [1] "Sex" "Bwt" "Hwt"
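For data frames, names(...) gives the same result as colnames(...). A small check, again assuming MASS is installed:

```r
library(MASS)
data(cats)

# for a data frame, names() and colnames() return the same column names
identical(names(cats), colnames(cats))
## [1] TRUE
```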

4.5. What are the data types of the columns in the dataset? Again, use class(...) to answer this question.

class(cats$Sex)
## [1] "factor"
class(cats$Bwt)
## [1] "numeric"
class(cats$Hwt)
## [1] "numeric"
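Instead of calling class(...) once per column, the function can be applied to every column in one step. This is a sketch with the illustrative name `col_types`; it works because a data frame is internally a list of columns.

```r
library(MASS)
data(cats)

# apply class() to each column of the data frame
col_types <- sapply(cats, class)
col_types
```

The result is a named character vector, with the column names as names and the classes ("factor", "numeric", "numeric") as values.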

4.6. What if you recorded two more variables: Age and whether the cat has heart problems. Which data types would you choose for each variable?

A variable recording age should be a numeric or integer variable. A variable that records whether a cat has heart problems should be a logical (TRUE/FALSE) variable or a factor variable with two levels (yes/no).
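To make the choice more concrete, here is a small sketch of what such variables could look like. The values are made up for three hypothetical cats and are not part of the cats dataset; the variable names are illustrative as well.

```r
# made-up values for three hypothetical cats (not from the MASS cats data)
age <- c(2L, 5L, 11L)                      # age in whole years, as integer
heart_problems <- c(FALSE, TRUE, FALSE)    # logical version
heart_problems_f <- factor(c("no", "yes", "no"),
                           levels = c("no", "yes"))  # factor version

class(age)
## [1] "integer"
class(heart_problems)
## [1] "logical"
class(heart_problems_f)
## [1] "factor"

# a logical vector can be summed directly to count the TRUE values
sum(heart_problems)
## [1] 1
```

The logical version is convenient for counting and filtering, while the factor version carries readable labels, which can be nicer in tables and plots.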

4.7. Create a new column wt_ratio which is the ratio of heart and body weight. Make sure to bring both variables to a common unit of measurement (i.e. both in grams or both in kilograms).

# body weight (Bwt) is measured in kilograms, heart weight (Hwt) in grams:
# multiply body weight by 1000 so that both are in grams
cats$wt_ratio <- cats$Hwt / (cats$Bwt * 1000)
cats
##     Sex Bwt  Hwt    wt_ratio
## 1     F 2.0  7.0 0.003500000
## 2     F 2.0  7.4 0.003700000
## 3     F 2.0  9.5 0.004750000
## 4     F 2.1  7.2 0.003428571
## 5     F 2.1  7.3 0.003476190
## 6     F 2.1  7.6 0.003619048
## 7     F 2.1  8.1 0.003857143
##  [ reached getOption("max.print") -- omitted 137 rows ]
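Since the ratio has no unit, it should not matter whether both weights are converted to grams or to kilograms. The following sketch checks this; `ratio_g` and `ratio_kg` are illustrative names, and all.equal(...) is used instead of `==` to allow for tiny floating point differences.

```r
library(MASS)
data(cats)

# both weights in grams: body weight (kg) times 1000
ratio_g <- cats$Hwt / (cats$Bwt * 1000)

# both weights in kilograms: heart weight (g) divided by 1000
ratio_kg <- (cats$Hwt / 1000) / cats$Bwt

# the two versions agree up to numerical tolerance
all.equal(ratio_g, ratio_kg)
## [1] TRUE
```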