Tasks
- Install and load the package reshape2 and have a look at the data set smiths. It is a very small data set (two observations) used for demonstration purposes.
- Add a logical (TRUE/FALSE) variable smoker to smiths. Use any values you like, e.g. c(FALSE, TRUE).
- Reshape this data set to form a “long table” format. Do this by using gather() on the columns age, weight, height and smoker. Set the “key” column’s name to “var” and the “value” column’s name to “y”. Store the result in an object named smiths_long. What happened to the logical values of the smoker variable?
- Use spread() on smiths_long to convert back to its original format, having age, weight, height and smoker as separate columns. What is the type of the smoker variable now and why is that so?
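One possible way to work through the smiths steps is sketched below. It is not the only solution. It assumes the tidyr package is installed, since gather() and spread() come from tidyr, and it uses the copy of smiths that ships with tidyr so that the sketch depends on a single package. The object name smiths_wide is made up for illustration.

```r
library(tidyr)  # assumption: tidyr is installed; it provides gather() and spread()

# Assumption: use the copy of smiths bundled with tidyr
smiths <- tidyr::smiths

# Add the logical variable (any values will do)
smiths$smoker <- c(FALSE, TRUE)

# Wide -> long: one row per subject and variable
smiths_long <- gather(smiths, key = "var", value = "y",
                      age, weight, height, smoker)
str(smiths_long)  # inspect the type of y

# Long -> wide again (smiths_wide is a hypothetical name)
smiths_wide <- spread(smiths_long, key = var, value = y)
str(smiths_wide)  # inspect the type of smoker
```

Comparing the two str() outputs with str(smiths) should help answer the questions about the type of smoker.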
- Install the package reshape2 and familiarize yourself with the french_fries data that comes with the package.
- There are 5 flavors being rated in each row. Convert the data into “long format” by using gather() on the five flavor columns. The “key” column should be named “flavor” and the “value” column should be named “rating”. Store the result in an object tidy_fries.
- There are 696 observations in the original data set. Compare that to the number of rows in the “long” data set that you just created. Does the number of rows in the “long” data set make sense?
- Make a boxplot that shows a box for each interaction between treatment and flavor (x-axis) regarding their rating (y-axis). Note: Interaction means each possible combination of treatment and flavor. It can be constructed with interaction(treatment, flavor).
- Improve the above plot by creating “small multiples”, i.e. facets for the variable flavor. This means each facet (i.e. a small embedded plot) then shows a boxplot for a specific flavor (with treatments on the x-axis and rating on the y-axis). Note: Add facet_grid(~ flavor) to your plot to create small boxplots per flavor in a row.
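The french_fries steps could be approached roughly as follows. This is a sketch that assumes reshape2, tidyr and ggplot2 are installed and that the five flavor columns are the adjacent columns potato to painty. The plot object names p_interaction and p_facets are made up for illustration.

```r
library(tidyr)    # gather()
library(ggplot2)  # plotting

# Load the data set from reshape2 without attaching the package
data("french_fries", package = "reshape2")

# Wide -> long: one row per rating
# Assumption: the flavor columns run from potato to painty
tidy_fries <- gather(french_fries, key = "flavor", value = "rating",
                     potato:painty)

# Compare the number of rows before and after reshaping
nrow(french_fries)
nrow(tidy_fries)

# One box per combination of treatment and flavor
p_interaction <- ggplot(tidy_fries,
                        aes(x = interaction(treatment, flavor), y = rating)) +
  geom_boxplot()

# Small multiples: one facet per flavor, treatments on the x-axis
p_facets <- ggplot(tidy_fries, aes(x = treatment, y = rating)) +
  geom_boxplot() +
  facet_grid(~ flavor)

print(p_interaction)
print(p_facets)
```

ggplot2 will likely warn about removed rows, because some ratings are missing in the data.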
- Load the data set UN from the package carData and familiarize yourself with it.
- You want to study the relationship between Gross Domestic Product (GDP) and infant mortality using this data. What kind of plot can you use to do that? Which variables go on which axes?
- Construct a scatter plot that plots GDP (ppgdp variable) against infant mortality rate (infantMortality variable).
- How can you improve the plot to avoid overplotting? How can you aid the eye in showing a trend?
- Add another dimension to the plot by making the points’ color dependent on the variable group. Keep the trend lines. What are the problems with the trend lines, especially for the groups “africa” and “other”?
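A minimal sketch for the UN plots is given below, assuming carData and ggplot2 are installed. Semi-transparent points and the default geom_smooth() are just one option among several for handling overplotting and showing a trend. The object names p_scatter, p_trend and p_group are made up for illustration.

```r
library(ggplot2)

# Load the data set from carData without attaching the package
data("UN", package = "carData")

# Plain scatter plot: GDP on the x-axis, infant mortality on the y-axis
p_scatter <- ggplot(UN, aes(x = ppgdp, y = infantMortality)) +
  geom_point()

# One option against overplotting: semi-transparent points,
# plus a smoothed trend line to guide the eye
p_trend <- ggplot(UN, aes(x = ppgdp, y = infantMortality)) +
  geom_point(alpha = 0.5) +
  geom_smooth()

# Color by group; geom_smooth() then fits one trend line per group
p_group <- ggplot(UN, aes(x = ppgdp, y = infantMortality, color = group)) +
  geom_point(alpha = 0.5) +
  geom_smooth()

print(p_scatter)
print(p_trend)
print(p_group)
```

Rows with missing values are dropped with a warning. A logarithmic x-axis, e.g. via scale_x_log10(), may be another variation worth trying.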
- Get acquainted with the data set Arrests from the package carData.
- Summarize the data by creating a data set with the number of arrests per year and sex (Hint: you can use group_by() and count() or n()).
- Visualize the data using an appropriate type of plot. Is there something in the trend that makes you wonder? If so, what could be the reason(s) for that?
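The Arrests summary and plot might look like the sketch below, assuming carData, dplyr and ggplot2 are installed. A line chart is only one reasonable choice of plot. The object names arrests_per_year and p_arrests are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Load the data set from carData without attaching the package
data("Arrests", package = "carData")

# Number of arrests per year and sex; count() stores the result in column n
arrests_per_year <- count(Arrests, year, sex)

# Equivalent formulation with group_by() and n()
arrests_per_year_alt <- Arrests %>%
  group_by(year, sex) %>%
  summarise(n = n())

# One line per sex, years on the x-axis
p_arrests <- ggplot(arrests_per_year, aes(x = year, y = n, color = sex)) +
  geom_line() +
  geom_point()

print(p_arrests)
```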