R Tutorial at the WZB

Tasks

1. Install the package pscl (a package from the Political Science Computational Laboratory of Stanford University). Load the data set iraqVote and have a look at its documentation. Then, using the aggregation and summarization functions you learned in this session, solve the following tasks:

1.a) Replicate the table below, where n_members is the number of members, n_aye is the number of “Aye” (yes) votes and perc_aye is the percentage of “Aye” votes in each group (Republican / non-Republican). Hint: Group on rep and then use summarise().

  rep   n_members n_aye perc_aye
1 FALSE        51    29     56.9
2 TRUE         49    48     98.0
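The grouping hint can be illustrated with a minimal sketch on toy data. The data frame toy_votes and its values are hypothetical; the column name y for the 0/1 vote is an assumption that should be checked against the iraqVote documentation.

```r
library(dplyr)

# Toy data (hypothetical, NOT the real iraqVote data): one row per senator.
toy_votes <- data.frame(
  y   = c(1, 0, 1, 1, 1, 0),                      # 1 = "Aye", 0 = "Nay"
  rep = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)  # TRUE = Republican
)

vote_summary <- toy_votes %>%
  group_by(rep) %>%
  summarise(
    n_members = n(),       # group size
    n_aye     = sum(y),    # summing 0/1 values counts the "Aye" votes
    perc_aye  = round(100 * n_aye / n_members, 1)
  )

print(vote_summary)
```

Because the votes are coded as 0/1, sum() counts the “Aye” votes directly, and columns created earlier within the same summarise() call can be reused for the percentage.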

1.b) In each state, there are two Senators that voted. Their votes might differ (e.g. Senator A voted “Aye”, Senator B voted “Nay”) and/or their parties might differ (Senator A is Republican, Senator B is Democrat). Create a data frame that indicates for each state whether the votes diverged (new variable votes_differ) and whether the party membership differs (new variable party_differs). See the table below for an example output of the first five states. Hint: Consider that there are only two votes per state (where 0 means “Nay” and 1 means “Aye”). So there are four possible combinations of votes (0/0, 1/0, 0/1 or 1/1), out of which only those with a sum of 1 (1/0 or 0/1) are cases of diverging votes. The same principle can be used for the party membership.

   state.name  votes_differ party_differs
 1 Alabama     FALSE        FALSE        
 2 Alaska      FALSE        FALSE        
 3 Arizona     FALSE        FALSE        
 4 Arkansas    FALSE        TRUE         
 5 California  TRUE         FALSE
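The “sum of 1” idea from the hint can be sketched as follows. The toy data and the state labels A, B and C are hypothetical; the column names y and rep are assumed to match the real data set.

```r
library(dplyr)

# Toy data (hypothetical): exactly two senators per state.
toy_votes <- data.frame(
  state.name = c("A", "A", "B", "B", "C", "C"),
  y          = c(1, 1, 1, 0, 0, 1),                       # 1 = "Aye", 0 = "Nay"
  rep        = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),  # TRUE = Republican
  stringsAsFactors = FALSE
)

state_div <- toy_votes %>%
  group_by(state.name) %>%
  summarise(
    votes_differ  = sum(y) == 1,    # only 1/0 or 0/1 sum to exactly 1
    party_differs = sum(rep) == 1   # TRUE counts as 1, FALSE as 0
  )

print(state_div)
```

The comparison relies on there being exactly two rows per group; with more or fewer senators per state, a sum of 1 would no longer mean “the two values differ”.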

1.c) From the output of the previous task, select those states where the votes diverged but the party membership of both senators did not differ. Save the resulting vector of state names in an object named div_votes_states (Hint: You can convert a single column of a data frame to a vector using unlist(): ... %>% select(state.name) %>% unlist()). Using div_votes_states, filter iraqVote to keep only the observations from the states listed in div_votes_states. Are there any Republicans in this list?
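A minimal sketch of the two filtering steps, again on hypothetical toy data (state_div and toy_votes are made-up stand-ins for the output of the previous task and for iraqVote):

```r
library(dplyr)

# Hypothetical per-state result, shaped like the output of task 1.b.
state_div <- data.frame(
  state.name    = c("A", "B", "C"),
  votes_differ  = c(FALSE, TRUE, TRUE),
  party_differs = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical per-senator data.
toy_votes <- data.frame(
  state.name = c("A", "A", "B", "B", "C", "C"),
  y          = c(1, 1, 1, 0, 0, 1),
  rep        = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Step 1: states with diverging votes but the same party, as a plain vector.
div_votes_states <- state_div %>%
  filter(votes_differ, !party_differs) %>%
  select(state.name) %>%
  unlist()

# Step 2: keep only senators from those states.
selected <- toy_votes %>%
  filter(state.name %in% div_votes_states)

# Step 3: check whether any of them is a Republican.
any_republicans <- any(selected$rep)
```

The %in% operator tests each state name for membership in the vector, which is what makes it possible to filter on several states at once.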

2. Load the data set politicalInformation that is also available from the package pscl and view its documentation. Then, solve the following tasks:

2.a) Create a new data set polinf based on politicalInformation but with a new variable age_group. This variable assigns each participant an age category according to the following age ranges:

age 18 to 29: “young adult”
age 29 to 45: “adult”
age 46 to 60: “middle age”
age 60 to 99: “senior”

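Use mutate() together with case_when() and between() to add this variable. Have a look at the documentation of these functions; they can be used as follows. Note that case_when() assigns the first condition that matches, so the order of the conditions decides where boundary ages such as 29 end up.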

... %>% mutate(age_group = case_when(
    between(age, 18, 29) ~ 'young adult',
    between(age, 29, 45) ~ 'adult',
    # ...
  ))
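To see how case_when() and between() behave at the range boundaries, here is a small sketch on hypothetical ages (toy_people is made up; the conditions follow the age ranges listed above):

```r
library(dplyr)

# Hypothetical ages, chosen to hit the range boundaries and a missing value.
toy_people <- data.frame(age = c(18, 29, 30, 45, 46, 60, 61, NA))

toy_people <- toy_people %>%
  mutate(age_group = case_when(
    between(age, 18, 29) ~ "young adult",
    between(age, 29, 45) ~ "adult",       # 29 was already matched above
    between(age, 46, 60) ~ "middle age",
    between(age, 60, 99) ~ "senior"       # 60 was already matched above
  ))

print(toy_people)
```

between() includes both endpoints, and a row that matches no condition (here the missing age) receives NA in age_group.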

2.b) Filter polinf to only include observations with non-NA values in age_group. Then group by age_group and collegeDegree. For each group, compute the mean of the interviewer rating y and the number of non-NA values in y.
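Grouping by two variables and counting non-missing values can be sketched like this. The data frame toy_polinf and its values are hypothetical, and y is numeric here by assumption; if y is stored as a factor in the real data, it would probably need converting first (for example with as.integer()), so check the documentation.

```r
library(dplyr)

# Hypothetical data; y is assumed to be numeric in this sketch.
toy_polinf <- data.frame(
  age_group     = c("adult", "adult", "adult", "senior", "senior", NA),
  collegeDegree = c("Yes", "Yes", "No", "No", "No", "Yes"),
  y             = c(4, NA, 2, 5, 3, 1),
  stringsAsFactors = FALSE
)

rating_summary <- toy_polinf %>%
  filter(!is.na(age_group)) %>%            # drop rows without an age group
  group_by(age_group, collegeDegree) %>%
  summarise(
    mean_y = mean(y, na.rm = TRUE),        # mean over the non-missing ratings
    n_y    = sum(!is.na(y))                # number of non-missing ratings
  ) %>%
  ungroup()

print(rating_summary)
```

Using sum(!is.na(y)) rather than n() matters here: n() would count every row in the group, including those where the rating is missing.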

3. Load the data set flights from the package nycflights13.

3.a) Find out the number of carrier companies in the data set (Hint: You can use unique() or distinct() for that).
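Both hinted approaches can be compared on a small hypothetical example (toy_flights and its carrier codes are made up for illustration):

```r
library(dplyr)

# Hypothetical data with repeated carrier codes.
toy_flights <- data.frame(
  carrier = c("AA", "UA", "AA", "DL", "UA"),
  stringsAsFactors = FALSE
)

# Base R: unique values of the column, then count them.
n_via_unique <- length(unique(toy_flights$carrier))

# dplyr: keep one row per distinct carrier, then count the rows.
n_via_distinct <- toy_flights %>%
  distinct(carrier) %>%
  nrow()
```

unique() works on a vector, whereas distinct() works on a data frame and returns a data frame, which is why the second variant counts rows.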

3.b) Find out the three longest flights (in terms of air_time) for each carrier. Construct a single command combined with %>%-operators. You can use rank() to rank observations according to a variable and then filter() only for the ranks 1 to 3.
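One possible shape of such a pipeline, sketched on hypothetical data (toy_flights and the helper column air_time_rank are made up; this is one way to combine the hinted functions, not necessarily the intended solution):

```r
library(dplyr)

# Hypothetical data, including one missing air_time.
toy_flights <- data.frame(
  carrier  = c("AA", "AA", "AA", "AA", "UA", "UA", "UA", "UA", "UA"),
  air_time = c(100, 250, 180, NA, 300, 90, 120, 310, 60),
  stringsAsFactors = FALSE
)

longest <- toy_flights %>%
  group_by(carrier) %>%
  # desc() reverses the order so that rank 1 is the longest flight;
  # na.last = "keep" gives missing air times an NA rank.
  mutate(air_time_rank = rank(desc(air_time), na.last = "keep")) %>%
  filter(air_time_rank <= 3) %>%           # rows with NA rank are dropped
  arrange(carrier, air_time_rank) %>%
  ungroup()

print(longest)
```

Because mutate() runs on grouped data, the ranks restart at 1 within each carrier. With tied air times, rank() assigns average ranks by default, so more or fewer than three rows per carrier may pass the filter.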

Tasks for session 6 - Recap / Transforming data with R II

Markus Konrad

November 29, 2018