R Tutorial at the WZB

Tasks

3. Take a look at the last 5 tweets of WZB_Berlin

Of course, the following numbers won’t match yours, because they were taken at a different time.

3.1. Create two vectors `retweets` and `likes` that contain the respective numbers from the last 5 tweets

retweets <- c(1, 3, 2, 2, 3)
likes <- c(6, 10, 9, 6, 3)

3.2. Create a third vector `tweet_ids` that contains the letters a to e as identifiers for the tweets

tweet_ids <- c('a', 'b', 'c', 'd', 'e')
# note: a shortcut for this would be tweet_ids <- letters[1:5]

3.3. Check the data type of all three vectors using the function `class(...)`

class(retweets)

## [1] "numeric"

class(likes)

## [1] "numeric"

class(tweet_ids)

## [1] "character"

3.4. Look at 5 more tweets, append the respective data to the vectors

retweets <- c(retweets, 4, 3, 2, 8, 2)
likes <- c(likes, 6, 6, 7, 6, 15)
tweet_ids <- c(tweet_ids, 'f', 'g', 'h', 'i', 'j')
# note: a shortcut for this would be tweet_ids <- c(tweet_ids, letters[6:10])

3.5. Create a dataframe `tweetstats` from the three vectors

tweetstats <- data.frame(tweet_ids, retweets, likes)
tweetstats

##    tweet_ids retweets likes
## 1          a        1     6
## 2          b        3    10
## 3          c        2     9
## 4          d        2     6
## 5          e        3     3
## 6          f        4     6
## 7          g        3     6
## 8          h        2     7
## 9          i        8     6
## 10         j        2    15
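
One detail worth checking at this step is how `data.frame()` stores the character vector. In R versions before 4.0.0, character columns were converted to factors by default, while newer versions keep them as character. The following is a minimal sketch, not part of the original solution, that rebuilds the three vectors and passes `stringsAsFactors = FALSE` explicitly so the result is the same in either version.

```r
# rebuild the three vectors from tasks 3.1 to 3.4
retweets <- c(1, 3, 2, 2, 3, 4, 3, 2, 8, 2)
likes <- c(6, 10, 9, 6, 3, 6, 6, 7, 6, 15)
tweet_ids <- letters[1:10]

# stringsAsFactors = FALSE keeps tweet_ids as a character column
# regardless of the R version in use
tweetstats <- data.frame(tweet_ids, retweets, likes, stringsAsFactors = FALSE)

# inspect the column types of the whole dataframe at once
str(tweetstats)
class(tweetstats$tweet_ids)
```

If you prefer the identifiers as a factor, `factor(tweetstats$tweet_ids)` converts the column afterwards.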

3.6. Add an additional variable/column to `tweetstats` named `interactions` which is the sum of retweets and likes for each observation

tweetstats$interactions <- tweetstats$retweets + tweetstats$likes
tweetstats

##    tweet_ids retweets likes interactions
## 1          a        1     6            7
## 2          b        3    10           13
## 3          c        2     9           11
## 4          d        2     6            8
## 5          e        3     3            6
## 6          f        4     6           10
## 7          g        3     6            9
##  [ reached getOption("max.print") -- omitted 3 rows ]
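
The printed output above is cut off because of the `max.print` option. As a small sketch (an addition, not part of the original solution), `tail()` can be used to look at just the last rows of the dataframe and confirm that the new column was computed for every tweet.

```r
# rebuild the dataframe from the numbers used in tasks 3.1 to 3.5
tweetstats <- data.frame(
  tweet_ids = letters[1:10],
  retweets = c(1, 3, 2, 2, 3, 4, 3, 2, 8, 2),
  likes = c(6, 10, 9, 6, 3, 6, 6, 7, 6, 15)
)
tweetstats$interactions <- tweetstats$retweets + tweetstats$likes

# show only the last three rows (tweets h, i and j)
last_rows <- tail(tweetstats, 3)
last_rows
```

`head(tweetstats, 3)` works the same way for the first rows.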

4. As in the previous session’s tasks, we’ll work with the `cats` dataset from the package MASS

4.1. Load the package and the dataset

library(MASS)
data(cats)
cats

##     Sex Bwt  Hwt
## 1     F 2.0  7.0
## 2     F 2.0  7.4
## 3     F 2.0  9.5
## 4     F 2.1  7.2
## 5     F 2.1  7.3
## 6     F 2.1  7.6
## 7     F 2.1  8.1
## 8     F 2.1  8.2
## 9     F 2.1  8.3
## 10    F 2.1  8.5
##  [ reached getOption("max.print") -- omitted 134 rows ]

4.2. How do you bring up the dataset documentation / help for the dataset?

?cats
# or: help(cats)

4.3. Identify the number of rows and columns in the dataset by using the respective R functions

nrow(cats)

## [1] 144

ncol(cats)

## [1] 3
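
As a side note, both numbers can also be obtained in one call. This short sketch uses `dim()`, which returns the number of rows followed by the number of columns.

```r
library(MASS)
data(cats)

# dim() returns c(number of rows, number of columns)
cats_dim <- dim(cats)
cats_dim
```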

4.4. Identify the column names using the respective R function

colnames(cats)

## [1] "Sex" "Bwt" "Hwt"

4.5. What are the data types of the columns in the dataset? Again, use `class(...)` to answer this question.

class(cats$Sex)

## [1] "factor"

class(cats$Bwt)

## [1] "numeric"

class(cats$Hwt)

## [1] "numeric"
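
Calling `class()` once per column gets tedious for wider datasets. One possible shortcut, shown here only as a sketch, is to apply `class()` to every column with `sapply()`. `str(cats)` gives a similar overview together with the first few values.

```r
library(MASS)
data(cats)

# apply class() to each column; the result is a named character vector
col_classes <- sapply(cats, class)
col_classes

# compact overview of the structure, including column types
str(cats)
```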

4.6. What if you recorded two more variables: Age and whether the cat has heart problems. Which data types would you choose for each variable?

A variable recording age should be a numeric or integer variable. A variable that records whether a cat has heart problems should be a logical (TRUE/FALSE) variable or a factor variable with two levels (yes/no).
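
To make these choices concrete, here is a small sketch with a hypothetical dataframe `example_cats`. The three ages and heart problem values are invented purely for illustration and do not come from the `cats` dataset.

```r
# hypothetical values for three cats, invented for illustration only
example_cats <- data.frame(
  age = c(2L, 7L, 11L),                   # whole years stored as integer
  heart_problems = c(FALSE, TRUE, FALSE)  # logical: TRUE/FALSE
)

# the same information stored as a factor with two levels (no/yes)
example_cats$heart_problems_f <- factor(
  ifelse(example_cats$heart_problems, "yes", "no"),
  levels = c("no", "yes")
)

sapply(example_cats, class)
```

A logical column is convenient for filtering and counting (for example with `sum()`), while a factor carries readable labels into tables and plots.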

4.7. Create a new column `wt_ratio` which is the ratio of heart weight to body weight. Make sure to bring both variables to a common unit of measurement (i.e. both in grams or both in kilograms)

# Bwt is measured in kilograms and Hwt in grams, so multiply body weight by 1000 to convert it to grams
cats$wt_ratio <- cats$Hwt / (cats$Bwt * 1000)
cats

##     Sex Bwt  Hwt    wt_ratio
## 1     F 2.0  7.0 0.003500000
## 2     F 2.0  7.4 0.003700000
## 3     F 2.0  9.5 0.004750000
## 4     F 2.1  7.2 0.003428571
## 5     F 2.1  7.3 0.003476190
## 6     F 2.1  7.6 0.003619048
## 7     F 2.1  8.1 0.003857143
##  [ reached getOption("max.print") -- omitted 137 rows ]
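
The task allows either grams or kilograms as the common unit. The sketch below, added as a cross-check rather than as part of the original solution, computes the ratio both ways and compares the results. The names `ratio_g` and `ratio_kg` are chosen here for illustration.

```r
library(MASS)
data(cats)

# variant 1: convert body weight from kilograms to grams
ratio_g <- cats$Hwt / (cats$Bwt * 1000)

# variant 2: convert heart weight from grams to kilograms
ratio_kg <- (cats$Hwt / 1000) / cats$Bwt

# a ratio has no unit, so both variants should agree
# (all.equal() tolerates tiny floating point differences)
all.equal(ratio_g, ratio_kg)
```

Because the two results agree, it does not matter which unit you convert to, as long as both weights end up in the same one.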

Tasks for session 2 - R Basics I

Markus Konrad

November 01, 2018
