1. Install the package pscl (a package from the Political Science Computational Laboratory of Stanford University). Load the data set iraqVote and have a look at its documentation. Then, using the aggregation and summarization functions you learned in this session, solve the following tasks:
1.a) Replicate the table below, where n_members is the number of members, n_aye the number of “Aye” (yes) votes and perc_aye the share of “Aye” votes (in percent) per Republican/non-Republican group (Hint: Group on rep and then use summarise()).
        rep   n_members n_aye perc_aye
      1 FALSE        51    29     56.9
      2 TRUE         49    48     98.0
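The group-then-summarise pattern behind 1.a can be sketched on a small made-up data frame. The object toy_votes and its values are hypothetical, not taken from iraqVote; the sketch assumes, as in iraqVote, a logical column rep and a vote column y coded 1 for “Aye” and 0 for “Nay”.

```r
library(dplyr)

# Hypothetical toy data: rep = TRUE for Republicans, y = 1 for "Aye", 0 for "Nay"
toy_votes <- data.frame(
  rep = c(TRUE, TRUE, FALSE, FALSE),
  y   = c(1, 1, 1, 0)
)

toy_summary <- toy_votes %>%
  group_by(rep) %>%
  summarise(
    n_members = n(),                                # number of rows per group
    n_aye     = sum(y),                             # with 0/1 coding, the sum counts "Aye" votes
    perc_aye  = round(n_aye / n_members * 100, 1)   # share of "Aye" votes in percent
  )

print(toy_summary)
```

Because the votes are coded 0/1, sum(y) counts the “Aye” votes and mean(y) * 100 would give the same percentage directly.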
1.b) In each state, two Senators voted. Their votes might differ (e.g. Senator A voted “Aye”, Senator B voted “Nay”) and/or their parties might differ (Senator A is Republican, Senator B is Democrat). Create a data frame that indicates for each state whether the votes diverged (new variable votes_differ) and whether the party membership differs (new variable party_differs). See the table below for an example output for the first five states. Hint: There are only two votes per state (where 0 means “Nay” and 1 means “Aye”). So there are four possible combinations of votes (0/0, 1/0, 0/1 or 1/1), of which only those with a sum of 1 (1/0 or 0/1) are cases of diverging votes. The same principle can be used for the party membership.
        state.name votes_differ party_differs
      1 Alabama    FALSE        FALSE
      2 Alaska     FALSE        FALSE
      3 Arizona    FALSE        FALSE
      4 Arkansas   FALSE        TRUE
      5 California TRUE         FALSE
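The sum-equals-one trick from the hint can be tried on a hypothetical toy data frame with three made-up states "A", "B" and "C" (not real iraqVote rows). It assumes the same column names as iraqVote (state.name, y, rep) and relies on R treating TRUE as 1 when summing a logical column.

```r
library(dplyr)

# Hypothetical toy data: two senators per state
# y = vote (1 = "Aye", 0 = "Nay"), rep = TRUE for Republicans
toy_senate <- data.frame(
  state.name = c("A", "A", "B", "B", "C", "C"),
  y          = c(1, 1, 1, 0, 0, 1),
  rep        = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

toy_diff <- toy_senate %>%
  group_by(state.name) %>%
  summarise(
    votes_differ  = sum(y) == 1,    # exactly one "Aye" among the two votes
    party_differs = sum(rep) == 1   # TRUE counts as 1: exactly one Republican
  )

print(toy_diff)
```

State A has identical votes but mixed parties, while B and C have diverging votes within a single party.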
1.c) From the output of the previous task, select those states where the votes diverged but the party membership of both senators did not differ. Save the resulting vector of state names in an object named div_votes_states (Hint: You can convert a single column of a data frame to a vector using unlist(): ... %>% select(state.name) %>% unlist()). Using div_votes_states, filter the observations of iraqVote to get only the observations from the states listed in div_votes_states. Are there any Republicans in this list?
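The select-then-unlist step and the subsequent filtering can be sketched with hypothetical toy objects (toy_diff, toy_senate and their values are made up). Filtering against a vector of allowed values uses the %in% operator, which is an assumption about how the task is meant to be solved rather than something the hint prescribes.

```r
library(dplyr)

# Hypothetical toy output in the shape of the 1.b result
toy_diff <- data.frame(
  state.name    = c("A", "B", "C"),
  votes_differ  = c(FALSE, TRUE, TRUE),
  party_differs = c(TRUE, FALSE, FALSE)
)

# Diverging votes but same party; then turn the single column into a vector
toy_states <- toy_diff %>%
  filter(votes_differ, !party_differs) %>%
  select(state.name) %>%
  unlist()

# Hypothetical senator-level data; keep only rows whose state is in toy_states
toy_senate <- data.frame(
  state.name = c("A", "A", "B", "B", "C", "C"),
  name       = c("s1", "s2", "s3", "s4", "s5", "s6")
)
toy_selected <- toy_senate %>% filter(state.name %in% toy_states)

print(toy_selected)
```

Note that unlist() returns a named vector (names such as state.name1); the names do not affect %in%.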
2. Load the data set politicalInformation that is also available from the package pscl and view its documentation. Then, solve the following tasks:
2.a) Create a new data set polinf based on politicalInformation but with a new variable age_group. This variable assigns each participant an age category according to the following age ranges:
Use mutate() together with the case_when() and between() functions to add this variable. You can have a look at the documentation of these functions and use them as follows:

    ... %>% mutate(age_group = case_when(
      between(age, 18, 29) ~ 'young adult',
      between(age, 29, 45) ~ 'adult',
      # ...
    ))
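Two details of case_when() matter here and can be checked on a few hypothetical ages: conditions are evaluated in order and the first matching one wins, and rows that match no condition get NA. Since between() includes both bounds, an age of 29 satisfies both conditions in the snippet above and is labelled 'young adult'. The remaining age ranges are deliberately left out of this sketch.

```r
library(dplyr)

# Hypothetical ages, including the shared boundary value 29 and an age not covered
toy_people <- data.frame(age = c(18, 29, 30, 45, 60))

toy_groups <- toy_people %>%
  mutate(age_group = case_when(
    between(age, 18, 29) ~ "young adult",  # between() includes both bounds
    between(age, 29, 45) ~ "adult"         # 29 was already matched above
  ))

print(toy_groups)  # age 60 matches no condition and gets NA
```

The NA for uncovered ages is exactly what task 2.b filters out.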
2.b) Filter polinf to only include non-NA values in age_group. Then group by age_group and collegeDegree. Compute the mean of the interviewer rating y and indicate the number of non-NA values in y for each group.
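The filter, two-variable grouping and summary can be sketched on a hypothetical toy data frame (toy_polinf and its values are made up). Here y is numeric; if y in politicalInformation turns out to be stored as a factor (check with str()), it would need converting with as.numeric() first, because mean() returns NA with a warning for factors. Counting non-NA values as sum(!is.na(y)) is one possible approach.

```r
library(dplyr)

# Hypothetical toy data; y is a numeric rating here
toy_polinf <- data.frame(
  age_group     = c("adult", "adult", "adult", NA, "young adult"),
  collegeDegree = c("Yes", "Yes", "No", "No", "No"),
  y             = c(4, NA, 2, 5, 3)
)

toy_stats <- toy_polinf %>%
  filter(!is.na(age_group)) %>%
  group_by(age_group, collegeDegree) %>%
  summarise(
    mean_y  = mean(y, na.rm = TRUE),  # ignore missing ratings when averaging
    n_y     = sum(!is.na(y)),         # number of non-NA ratings per group
    .groups = "drop"
  )

print(toy_stats)
```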
3. Load the data set flights from the package nycflights13.
3.a) Find out the number of carrier companies in the data set (Hint: You can use unique() or distinct() for that).
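Both suggested routes, plus dplyr's n_distinct() as an additional option not mentioned in the hint, can be compared on a hypothetical toy column of carrier codes (not the real flights table).

```r
library(dplyr)

# Hypothetical toy data with repeated carrier codes
toy_flights <- data.frame(carrier = c("AA", "UA", "AA", "DL", "UA"))

n_unique        <- length(unique(toy_flights$carrier))           # base R: distinct values, then count
n_distinct_rows <- toy_flights %>% distinct(carrier) %>% nrow()  # dplyr: one row per carrier
n_carriers      <- n_distinct(toy_flights$carrier)               # dplyr shortcut for the count
```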
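3.b) Find out the three longest flights (in terms of air_time) for each carrier. Construct a single command combined with %>% operators. You can group by carrier, use rank() to rank observations according to a variable and then filter() only for the ranks 1 to 3. Note that rank() ranks in ascending order, so rank desc(air_time) (or -air_time) to give the longest flight rank 1.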
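A single-pipe sketch of the grouped ranking on a hypothetical toy data frame (toy_flights and its values are made up). It drops missing air times first: with rank()'s default na.last = TRUE, NA values would otherwise receive the highest ranks and, in small groups, could slip into the top three.

```r
library(dplyr)

# Hypothetical toy flights: carrier and air time in minutes (one NA)
toy_flights <- data.frame(
  carrier  = c("AA", "AA", "AA", "AA", "UA", "UA"),
  air_time = c(100, 300, 200, 250, NA, 150)
)

toy_longest <- toy_flights %>%
  filter(!is.na(air_time)) %>%           # drop flights without an air time
  group_by(carrier) %>%                  # rank within each carrier
  filter(rank(desc(air_time)) <= 3) %>%  # rank 1 = longest flight
  arrange(carrier, desc(air_time)) %>%
  ungroup()

print(toy_longest)
```

By default, rank() gives tied values their average rank; dplyr's min_rank() is an alternative if ties should share the lowest rank.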