Tasks

  1. Install and load the package reshape2 and have a look at the data set smiths. It is a very small data set (two observations) used for demonstration purposes.
    1. Add a logical (TRUE/FALSE) variable smoker to smiths. Use any values you like, e.g. c(FALSE, TRUE).
    2. Reshape this data set into “long table” format by using gather() (from the tidyr package) on the columns age, weight, height and smoker. Name the “key” column “var” and the “value” column “y”. Store the result in an object named smiths_long. What happened to the logical values of the smoker variable?
    3. Use spread() on smiths_long to convert back to its original format, having age, weight, height and smoker as separate columns. What is the type of the smoker variable now and why is that so?
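
The round trip between wide and long format can be sketched on a small made-up data frame. This is only an illustration: the object `toy` and its columns `id`, `score` and `passed` are hypothetical and not part of smiths, and the sketch assumes the tidyr package is installed.

```r
library(tidyr)

# Hypothetical toy data (not the smiths data set)
toy <- data.frame(
  id     = c("a", "b"),
  score  = c(1.5, 2.5),
  passed = c(TRUE, FALSE)
)

# Wide -> long: stack the columns score and passed into one key/value pair
toy_long <- gather(toy, key = "var", value = "y", score, passed)
str(toy_long)

# Long -> wide: one column per distinct value in var
toy_wide <- spread(toy_long, key = var, value = y)
str(toy_wide)
```

Comparing the two str() outputs shows which column types survive the round trip. Note that tidyr documents gather() and spread() as superseded by pivot_longer() and pivot_wider(); they still work and are used here to match the tasks.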
  2. Load the package reshape2 (install it first if necessary) and read up on the french_fries data that comes with the package.
    1. Each row contains ratings for five flavors. Convert the data into “long format” by using gather() on the five flavor columns. The “key” column should be named “flavor” and the “value” column should be named “rating”. Store the result in an object named tidy_fries.
    2. There are 696 observations in the original data set. Compare that to the number of rows in the “long” data set that you just created. Does the number of rows in the “long” data set make sense?
    3. Make a boxplot with one box for each interaction of treatment and flavor on the x-axis and the rating on the y-axis. Note: Interaction means each possible combination of treatment and flavor. It can be constructed with interaction(treatment, flavor).
    4. Improve the above plot by creating “small multiples”, i.e. facets for the variable flavor. This means each facet (i.e. a small embedded plot) then shows a boxplot for a specific flavor (with treatments on the x-axis and rating on the y-axis). Note: Add facet_grid(~ flavor) to your plot to create small boxplots per flavor in a row.
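
The two plot layouts described above can be compared side by side with a rough sketch on simulated ratings. The data frame `toy` and its values are invented for illustration and are not the french_fries data; the sketch assumes ggplot2 is installed.

```r
library(ggplot2)

set.seed(1)
# Hypothetical long-format ratings (not the french_fries data)
toy <- expand.grid(
  treatment = factor(1:3),
  flavor    = c("sweet", "salty"),
  rep       = 1:10
)
toy$rating <- runif(nrow(toy), min = 0, max = 10)

# Variant 1: one box per treatment/flavor combination on a single x-axis
p_inter <- ggplot(toy, aes(x = interaction(treatment, flavor), y = rating)) +
  geom_boxplot()

# Variant 2: small multiples, one panel per flavor with treatments on the x-axis
p_facet <- ggplot(toy, aes(x = treatment, y = rating)) +
  geom_boxplot() +
  facet_grid(~ flavor)

print(p_inter)
print(p_facet)
```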
  3. Load the data set UN from the package carData and read its documentation.
    1. You want to study the relationship between Gross Domestic Product (GDP) and infant mortality using this data. What kind of plot can you use to do that? Which variables go on which axes?
    2. Construct a scatter plot that plots GDP (ppgdp variable) against infant mortality rate (infantMortality variable).
    3. How can you improve the plot to avoid overplotting? How can you help the eye to see a trend?
    4. Add another dimension to the plot by making the points’ color dependent on the variable group. Keep the trend lines. What are the problems with the trend lines, especially for the groups “africa” and “other”?
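
One possible approach to the overplotting and trend questions, sketched on simulated data, is shown here. Transparency, log scales and a linear smoother are assumptions chosen for illustration, not the only valid answer; the data frame `toy` and its columns `income`, `rate` and `grp` are hypothetical and not taken from UN.

```r
library(ggplot2)

set.seed(42)
# Hypothetical right-skewed data (not the UN data set)
n <- 150
toy <- data.frame(
  income = exp(rnorm(n, mean = 8, sd = 1.2)),
  grp    = sample(c("a", "b", "c"), n, replace = TRUE)
)
toy$rate <- exp(6 - 0.5 * log(toy$income) + rnorm(n, sd = 0.4))

# Semi-transparent points reduce overplotting; log axes spread out skewed values;
# geom_smooth() adds a single trend line
p_all <- ggplot(toy, aes(x = income, y = rate)) +
  geom_point(alpha = 0.4) +
  scale_x_log10() +
  scale_y_log10() +
  geom_smooth(method = "lm", se = FALSE)

# Mapping colour to the grouping variable gives one trend line per group
p_grp <- ggplot(toy, aes(x = income, y = rate, colour = grp)) +
  geom_point(alpha = 0.4) +
  scale_x_log10() +
  scale_y_log10() +
  geom_smooth(method = "lm", se = FALSE)

print(p_all)
print(p_grp)
```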
  4. Get acquainted with the data set Arrests from the package carData.
    1. Summarize the data by creating a data set with the number of arrests per year and sex. (Hint: you can use group_by() together with count() or n() from the dplyr package.)
    2. Visualize the data using an appropriate type of plot. Is there something in the trend that makes you wonder? If so, what could be the reason(s) for that?
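
The counting step from the hint can be sketched on a tiny invented table. The data frame `toy` with columns `year` and `kind` is hypothetical and not the Arrests data; the line plot is just one plausible choice of chart. The sketch assumes dplyr (version 1.0 or later) and ggplot2 are installed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical event records (not the Arrests data)
toy <- data.frame(
  year = c(2001, 2001, 2001, 2002, 2002, 2003),
  kind = c("x", "x", "y", "x", "y", "y")
)

# count() tallies the rows for each combination of the given variables
per_year_a <- count(toy, year, kind)

# Equivalent long form: group first, then count rows with n()
per_year_b <- toy %>%
  group_by(year, kind) %>%
  summarise(n = n(), .groups = "drop")

# One line per kind shows how the counts develop over time
p <- ggplot(per_year_a, aes(x = year, y = n, colour = kind)) +
  geom_line() +
  geom_point()
print(p)
```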