We had an interesting discussion today about the pros and cons of “wide” and “long” table formats. There are always different paths you can take with R to reach a goal, which is why R is praised for its flexibility. However, this flexibility can also leave you feeling lost, and I’m aware of that. Over time, you will see which paths and workflows work best for you. I’m mostly presenting solutions that I think work well, because I can’t show every possible solution, and I simply don’t know all of them.

I’m also not saying that all solutions are equally good. For the part on “Reshaping data” in today’s session, I would like to show a few more ways and have a look at their pros and cons.

Again, I’ll use the “OBrienKaiser” data included in the package carData:

library(dplyr)    # we also need dplyr later
library(carData)

OBrienKaiser
##    treatment gender pre.1 pre.2 pre.3 pre.4 pre.5 post.1 post.2 post.3
## 1    control      M     1     2     4     2     1      3      2      5
## 2    control      M     4     4     5     3     4      2      2      3
## 3    control      M     5     6     5     7     7      4      5      7
## 4    control      F     5     4     7     5     4      2      2      3
## 5    control      F     3     4     6     4     3      6      7      8
## 6          A      M     7     8     7     9     9      9      9     10
## 7          A      M     5     5     6     4     5      7      7      8
## 8          A      F     2     3     5     3     2      2      4      8
## 9          A      F     3     3     4     6     4      4      5      6
## 10         B      M     4     4     5     3     4      6      7      6
## 11         B      M     3     3     4     2     3      5      4      7
## 12         B      M     6     7     8     6     3      9     10     11
## 13         B      F     5     5     6     8     6      4      6      6
## 14         B      F     2     2     3     1     2      5      6      7
## 15         B      F     2     2     3     4     4      6      6      7
## 16         B      F     4     5     7     5     4      7      7      8
##    post.4 post.5 fup.1 fup.2 fup.3 fup.4 fup.5
## 1       3      2     2     3     2     4     4
## 2       5      3     4     5     6     4     1
## 3       5      4     7     6     9     7     6
## 4       5      3     4     4     5     3     4
## 5       6      3     4     3     6     4     3
## 6       8      9     9    10    11     9     6
## 7      10      8     8     9    11     9     8
## 8       6      5     6     6     7     5     6
## 9       4      1     5     4     7     5     4
## 10      8      8     8     8     9     7     8
## 11      5      4     5     6     8     6     5
## 12      9      6     8     7    10     8     7
## 13      8      6     7     7     8    10     8
## 14      5      2     6     7     8     6     3
## 15      9      7     7     7     8     6     7
## 16      6      7     7     8    10     8     7

The documentation says:

These contrived repeated-measures data are taken from O’Brien and Kaiser (1985). The data are from an imaginary study in which 16 female and male subjects, who are divided into three treatments, are measured at a pretest, postest, and a follow-up session; during each session, they are measured at five occasions at intervals of one hour.

So each row represents a subject, but we have several measurements per row.

Let’s first make a copy of the original data (this way, we can also use a shorter name):

obk <- OBrienKaiser

Our goal is to compute the mean per treatment and measurement type (pre, post, follow-up – “fup”).

1. A dangerous approach

Taking the data as is, you could be tempted to calculate a mean per subject for each measurement type first. I’m saying “tempted” because this approach can lead to some serious problems, as I will explain below. Anyway, I’d like to proceed with the calculation:

# means for all "pretest" measurements per subject (16 values -- one for each subject)
(obk$pre.1 + obk$pre.2 + obk$pre.3 + obk$pre.4 + obk$pre.5) / 5
##  [1] 2 4 6 5 4 8 5 3 4 4 3 6 6 2 3 5

This is a bit cumbersome. We can also select the “pre…” columns by index (columns 3 to 7) and calculate the row-wise sum and divide by 5:

obk$pre_avg <- rowSums(obk[3:7]) / 5
obk$pre_avg
##  [1] 2 4 6 5 4 8 5 3 4 4 3 6 6 2 3 5
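Selecting columns by position breaks silently as soon as the column order changes. As a minimal sketch (not part of today’s slides), the same per-subject means can be computed by selecting the columns by name. The names obk_named and pre_cols are only introduced here for illustration:

```r
library(carData)

# Work on a separate copy so that the objects used in the text stay untouched
obk_named <- OBrienKaiser

# Pick all columns whose name starts with "pre." instead of relying on positions 3 to 7
pre_cols <- grep("^pre\\.", names(obk_named), value = TRUE)

# rowMeans() does the "row-wise sum divided by the number of columns" in one step
obk_named$pre_avg <- rowMeans(obk_named[pre_cols])
obk_named$pre_avg
```

This gives the same 16 values as above, but it still computes one mean per subject, so the concerns discussed for this approach remain.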

We repeat this for “post” and “fup”:

obk$post_avg <- rowSums(obk[8:12]) / 5
obk$fup_avg <- rowSums(obk[13:17]) / 5

We now have:

select(obk, treatment, gender, pre_avg, post_avg, fup_avg)
##    treatment gender pre_avg post_avg fup_avg
## 1    control      M       2        3       3
## 2    control      M       4        3       4
## 3    control      M       6        5       7
## 4    control      F       5        3       4
## 5    control      F       4        6       4
## 6          A      M       8        9       9
## 7          A      M       5        8       9
## 8          A      F       3        5       6
## 9          A      F       4        4       5
## 10         B      M       4        7       8
## 11         B      M       3        5       6
## 12         B      M       6        9       8
## 13         B      F       6        6       8
## 14         B      F       2        5       6
## 15         B      F       3        7       7
## 16         B      F       5        7       8

You would now take the mean of those per-subject means for each treatment. That is, we take the mean of means, and you might already hear the alarm bells ringing… Anyway, let’s calculate the result:

group_by(obk, treatment) %>% summarize(pretest_mean = mean(pre_avg),
                                       posttest_mean = mean(post_avg),
                                       fuptest_mean = mean(fup_avg))
## # A tibble: 3 x 4
##   treatment pretest_mean posttest_mean fuptest_mean
##   <fct>            <dbl>         <dbl>        <dbl>
## 1 control           4.2           4            4.4 
## 2 A                 5             6.5          7.25
## 3 B                 4.14          6.57         7.29

This works and the output is correct. Furthermore, the solution is not overly complicated. However, it has some drawbacks, especially in the calculation of the per-subject means (pre_avg, post_avg, fup_avg). First of all, you really have to watch out that you select the right columns when you do things like rowSums(obk[3:7]) / 5. The main concern, however, is that we calculate a “mean of means”, which can be problematic. In fact, it is problematic whenever the groups being averaged have different sample sizes. See also this explanation. In our case it works, because we always have exactly five measurements for each subject. In most real data sets, however, you will have some missing values. With just a single NA somewhere in our data, rowSums would return NA for that subject, and working around this with na.rm = TRUE while still dividing by 5 would give biased results.
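To make the “mean of means” problem concrete, here is a small sketch with made-up numbers (the toy data frame is hypothetical and has nothing to do with the OBrienKaiser data). Two subjects have three measurements each, but the second subject has two missing values:

```r
# Hypothetical toy data: one row per subject, one column per measurement
toy <- data.frame(m1 = c(1, 10),
                  m2 = c(2, NA),
                  m3 = c(3, NA))

# Per-subject means, ignoring missing values: 2 for subject 1, 10 for subject 2
subj_means <- rowMeans(toy, na.rm = TRUE)

# Mean of means: both subjects get the same weight,
# although subject 2 contributes only one value
mean_of_means <- mean(subj_means)                    # (2 + 10) / 2 = 6

# Pooled mean: every observed value gets the same weight
pooled_mean <- mean(as.matrix(toy), na.rm = TRUE)    # (1 + 2 + 3 + 10) / 4 = 4

c(mean_of_means = mean_of_means, pooled_mean = pooled_mean)
```

The two numbers differ only because the subjects no longer have the same number of observed values. With complete data, as in our case, both calculations agree.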

2. A not so dangerous approach

Alternatively, you could calculate the means not per subject and measurement type, but per treatment and measurement type. To do so, you first subset (filter) the data per treatment, which gives you three data sets to which you apply the same calculations:

obk <- OBrienKaiser   # original data set again

ctrl <- obk %>% filter(treatment == 'control') %>% select(pre.1:fup.5)
ctrl
##   pre.1 pre.2 pre.3 pre.4 pre.5 post.1 post.2 post.3 post.4 post.5 fup.1
## 1     1     2     4     2     1      3      2      5      3      2     2
## 2     4     4     5     3     4      2      2      3      5      3     4
## 3     5     6     5     7     7      4      5      7      5      4     7
## 4     5     4     7     5     4      2      2      3      5      3     4
## 5     3     4     6     4     3      6      7      8      6      3     4
##   fup.2 fup.3 fup.4 fup.5
## 1     3     2     4     4
## 2     5     6     4     1
## 3     6     9     7     6
## 4     4     5     3     4
## 5     3     6     4     3

This is only the data for the control group. We can calculate the mean for the whole control group’s “pre” measurements by selecting the columns for “pre” values (1 to 5) and converting to a matrix type (mean can only handle vectors and matrices):

(ctrl_pre_mean <- mean(as.matrix(ctrl[1:5])))
## [1] 4.2

Repeat with different column indices for “post”:

(ctrl_post_mean <- mean(as.matrix(ctrl[6:10])))
## [1] 4

… and for “fup”:

(ctrl_fup_mean <- mean(as.matrix(ctrl[11:15])))
## [1] 4.4

Now you would repeat this for the treatments “A” and “B”.

This way, you can get around the “mean of means” problem, but I still find it not very flexible (what if you also wanted to split groups by gender, for example?), and it also leads to a lot of code repetition (always a bad sign!).
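The repetition itself could be reduced with a loop over treatments and measurement types. The following is only a sketch in base R, under the assumption that all measurement columns follow the “type.occasion” naming pattern; meas_types and pooled are names made up for this example:

```r
library(carData)

meas_types <- c("pre", "post", "fup")

# For each measurement type, pool all values of a treatment group and take one mean
pooled <- sapply(meas_types, function(type) {
    # all columns of this measurement type, e.g. pre.1 to pre.5
    cols <- grep(paste0("^", type, "\\."), names(OBrienKaiser), value = TRUE)
    # split the selected columns by treatment and average each block as a whole
    sapply(split(OBrienKaiser[cols], OBrienKaiser$treatment),
           function(d) mean(as.matrix(d), na.rm = TRUE))
})

pooled   # rows: treatments, columns: measurement types
```

This avoids copy and paste, but every additional grouping variable (such as gender) would need another layer of splitting, which is where this approach gets unwieldy.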

3. Creating a long table format

I find it easier to work with a long table format in such circumstances. Also, only this allows us to use ggplot2 later for plotting, as it expects the data to be in this format.

Let’s re-assign the original data set:

obk <- OBrienKaiser

I will add an “ID” for each subject, which will make it easier to understand where each subject’s data resides in the long table format later:

obk <- obk %>% mutate(id = 1:nrow(OBrienKaiser))
obk
##    treatment gender pre.1 pre.2 pre.3 pre.4 pre.5 post.1 post.2 post.3
## 1    control      M     1     2     4     2     1      3      2      5
## 2    control      M     4     4     5     3     4      2      2      3
## 3    control      M     5     6     5     7     7      4      5      7
## 4    control      F     5     4     7     5     4      2      2      3
## 5    control      F     3     4     6     4     3      6      7      8
## 6          A      M     7     8     7     9     9      9      9     10
## 7          A      M     5     5     6     4     5      7      7      8
## 8          A      F     2     3     5     3     2      2      4      8
## 9          A      F     3     3     4     6     4      4      5      6
## 10         B      M     4     4     5     3     4      6      7      6
## 11         B      M     3     3     4     2     3      5      4      7
## 12         B      M     6     7     8     6     3      9     10     11
## 13         B      F     5     5     6     8     6      4      6      6
## 14         B      F     2     2     3     1     2      5      6      7
## 15         B      F     2     2     3     4     4      6      6      7
## 16         B      F     4     5     7     5     4      7      7      8
##    post.4 post.5 fup.1 fup.2 fup.3 fup.4 fup.5 id
## 1       3      2     2     3     2     4     4  1
## 2       5      3     4     5     6     4     1  2
## 3       5      4     7     6     9     7     6  3
## 4       5      3     4     4     5     3     4  4
## 5       6      3     4     3     6     4     3  5
## 6       8      9     9    10    11     9     6  6
## 7      10      8     8     9    11     9     8  7
## 8       6      5     6     6     7     5     6  8
## 9       4      1     5     4     7     5     4  9
## 10      8      8     8     8     9     7     8 10
## 11      5      4     5     6     8     6     5 11
## 12      9      6     8     7    10     8     7 12
## 13      8      6     7     7     8    10     8 13
## 14      5      2     6     7     8     6     3 14
## 15      9      7     7     7     8     6     7 15
## 16      6      7     7     8    10     8     7 16

I will now apply all the data transformations presented in today’s slides at once using the %>% pipe (I explained a lot about each step today so I won’t repeat that here):

library(tidyr)

obk_tidy <- gather(obk, pre.1:fup.5, key = 'meas_type_occasion', value = 'value') %>%
    separate(meas_type_occasion, into = c('meas_type', 'meas_occasion')) %>%
    mutate(meas_type = factor(meas_type, levels = c('pre', 'post', 'fup')),
           meas_occasion = as.integer(meas_occasion)) %>%
    arrange(id, meas_type)  # reorder to better see each subject's measurements

obk_tidy
##     treatment gender id meas_type meas_occasion value
## 1     control      M  1       pre             1     1
## 2     control      M  1       pre             2     2
## 3     control      M  1       pre             3     4
## 4     control      M  1       pre             4     2
## 5     control      M  1       pre             5     1
## 6     control      M  1      post             1     3
## 7     control      M  1      post             2     2
## 8     control      M  1      post             3     5
## 9     control      M  1      post             4     3
## 10    control      M  1      post             5     2
## 11    control      M  1       fup             1     2
## 12    control      M  1       fup             2     3
## 13    control      M  1       fup             3     2
## 14    control      M  1       fup             4     4
## 15    control      M  1       fup             5     4
## 16    control      M  2       pre             1     4
## 17    control      M  2       pre             2     4
## 18    control      M  2       pre             3     5
## 19    control      M  2       pre             4     3
## 20    control      M  2       pre             5     4
## 21    control      M  2      post             1     2
## 22    control      M  2      post             2     2
## 23    control      M  2      post             3     3
## 24    control      M  2      post             4     5
## 25    control      M  2      post             5     3
## 26    control      M  2       fup             1     4
## 27    control      M  2       fup             2     5
## 28    control      M  2       fup             3     6
## 29    control      M  2       fup             4     4
## 30    control      M  2       fup             5     1
## 31    control      M  3       pre             1     5
## 32    control      M  3       pre             2     6
## 33    control      M  3       pre             3     5
## 34    control      M  3       pre             4     7
## 35    control      M  3       pre             5     7
## 36    control      M  3      post             1     4
## 37    control      M  3      post             2     5
## 38    control      M  3      post             3     7
## 39    control      M  3      post             4     5
## 40    control      M  3      post             5     4
## 41    control      M  3       fup             1     7
## 42    control      M  3       fup             2     6
## 43    control      M  3       fup             3     9
## 44    control      M  3       fup             4     7
## 45    control      M  3       fup             5     6
## 46    control      F  4       pre             1     5
## 47    control      F  4       pre             2     4
## 48    control      F  4       pre             3     7
## 49    control      F  4       pre             4     5
## 50    control      F  4       pre             5     4
## 51    control      F  4      post             1     2
## 52    control      F  4      post             2     2
## 53    control      F  4      post             3     3
## 54    control      F  4      post             4     5
## 55    control      F  4      post             5     3
## 56    control      F  4       fup             1     4
## 57    control      F  4       fup             2     4
## 58    control      F  4       fup             3     5
## 59    control      F  4       fup             4     3
## 60    control      F  4       fup             5     4
## 61    control      F  5       pre             1     3
## 62    control      F  5       pre             2     4
## 63    control      F  5       pre             3     6
## 64    control      F  5       pre             4     4
## 65    control      F  5       pre             5     3
## 66    control      F  5      post             1     6
## 67    control      F  5      post             2     7
## 68    control      F  5      post             3     8
## 69    control      F  5      post             4     6
## 70    control      F  5      post             5     3
## 71    control      F  5       fup             1     4
## 72    control      F  5       fup             2     3
## 73    control      F  5       fup             3     6
## 74    control      F  5       fup             4     4
## 75    control      F  5       fup             5     3
## 76          A      M  6       pre             1     7
## 77          A      M  6       pre             2     8
## 78          A      M  6       pre             3     7
## 79          A      M  6       pre             4     9
## 80          A      M  6       pre             5     9
## 81          A      M  6      post             1     9
## 82          A      M  6      post             2     9
## 83          A      M  6      post             3    10
## 84          A      M  6      post             4     8
## 85          A      M  6      post             5     9
## 86          A      M  6       fup             1     9
## 87          A      M  6       fup             2    10
## 88          A      M  6       fup             3    11
## 89          A      M  6       fup             4     9
## 90          A      M  6       fup             5     6
## 91          A      M  7       pre             1     5
## 92          A      M  7       pre             2     5
## 93          A      M  7       pre             3     6
## 94          A      M  7       pre             4     4
## 95          A      M  7       pre             5     5
## 96          A      M  7      post             1     7
## 97          A      M  7      post             2     7
## 98          A      M  7      post             3     8
## 99          A      M  7      post             4    10
## 100         A      M  7      post             5     8
## 101         A      M  7       fup             1     8
## 102         A      M  7       fup             2     9
## 103         A      M  7       fup             3    11
## 104         A      M  7       fup             4     9
## 105         A      M  7       fup             5     8
## 106         A      F  8       pre             1     2
## 107         A      F  8       pre             2     3
## 108         A      F  8       pre             3     5
## 109         A      F  8       pre             4     3
## 110         A      F  8       pre             5     2
## 111         A      F  8      post             1     2
## 112         A      F  8      post             2     4
## 113         A      F  8      post             3     8
## 114         A      F  8      post             4     6
## 115         A      F  8      post             5     5
## 116         A      F  8       fup             1     6
## 117         A      F  8       fup             2     6
## 118         A      F  8       fup             3     7
## 119         A      F  8       fup             4     5
## 120         A      F  8       fup             5     6
## 121         A      F  9       pre             1     3
## 122         A      F  9       pre             2     3
## 123         A      F  9       pre             3     4
## 124         A      F  9       pre             4     6
## 125         A      F  9       pre             5     4
## 126         A      F  9      post             1     4
## 127         A      F  9      post             2     5
## 128         A      F  9      post             3     6
## 129         A      F  9      post             4     4
## 130         A      F  9      post             5     1
## 131         A      F  9       fup             1     5
## 132         A      F  9       fup             2     4
## 133         A      F  9       fup             3     7
## 134         A      F  9       fup             4     5
## 135         A      F  9       fup             5     4
## 136         B      M 10       pre             1     4
## 137         B      M 10       pre             2     4
## 138         B      M 10       pre             3     5
## 139         B      M 10       pre             4     3
## 140         B      M 10       pre             5     4
## 141         B      M 10      post             1     6
## 142         B      M 10      post             2     7
## 143         B      M 10      post             3     6
## 144         B      M 10      post             4     8
## 145         B      M 10      post             5     8
## 146         B      M 10       fup             1     8
## 147         B      M 10       fup             2     8
## 148         B      M 10       fup             3     9
## 149         B      M 10       fup             4     7
## 150         B      M 10       fup             5     8
## 151         B      M 11       pre             1     3
## 152         B      M 11       pre             2     3
## 153         B      M 11       pre             3     4
## 154         B      M 11       pre             4     2
## 155         B      M 11       pre             5     3
## 156         B      M 11      post             1     5
## 157         B      M 11      post             2     4
## 158         B      M 11      post             3     7
## 159         B      M 11      post             4     5
## 160         B      M 11      post             5     4
## 161         B      M 11       fup             1     5
## 162         B      M 11       fup             2     6
## 163         B      M 11       fup             3     8
## 164         B      M 11       fup             4     6
## 165         B      M 11       fup             5     5
## 166         B      M 12       pre             1     6
## 167         B      M 12       pre             2     7
## 168         B      M 12       pre             3     8
## 169         B      M 12       pre             4     6
## 170         B      M 12       pre             5     3
## 171         B      M 12      post             1     9
## 172         B      M 12      post             2    10
## 173         B      M 12      post             3    11
## 174         B      M 12      post             4     9
## 175         B      M 12      post             5     6
## 176         B      M 12       fup             1     8
## 177         B      M 12       fup             2     7
## 178         B      M 12       fup             3    10
## 179         B      M 12       fup             4     8
## 180         B      M 12       fup             5     7
## 181         B      F 13       pre             1     5
## 182         B      F 13       pre             2     5
## 183         B      F 13       pre             3     6
## 184         B      F 13       pre             4     8
## 185         B      F 13       pre             5     6
## 186         B      F 13      post             1     4
## 187         B      F 13      post             2     6
## 188         B      F 13      post             3     6
## 189         B      F 13      post             4     8
## 190         B      F 13      post             5     6
## 191         B      F 13       fup             1     7
## 192         B      F 13       fup             2     7
## 193         B      F 13       fup             3     8
## 194         B      F 13       fup             4    10
## 195         B      F 13       fup             5     8
## 196         B      F 14       pre             1     2
## 197         B      F 14       pre             2     2
## 198         B      F 14       pre             3     3
## 199         B      F 14       pre             4     1
## 200         B      F 14       pre             5     2
## 201         B      F 14      post             1     5
## 202         B      F 14      post             2     6
## 203         B      F 14      post             3     7
## 204         B      F 14      post             4     5
## 205         B      F 14      post             5     2
## 206         B      F 14       fup             1     6
## 207         B      F 14       fup             2     7
## 208         B      F 14       fup             3     8
## 209         B      F 14       fup             4     6
## 210         B      F 14       fup             5     3
## 211         B      F 15       pre             1     2
## 212         B      F 15       pre             2     2
## 213         B      F 15       pre             3     3
## 214         B      F 15       pre             4     4
## 215         B      F 15       pre             5     4
## 216         B      F 15      post             1     6
## 217         B      F 15      post             2     6
## 218         B      F 15      post             3     7
## 219         B      F 15      post             4     9
## 220         B      F 15      post             5     7
## 221         B      F 15       fup             1     7
## 222         B      F 15       fup             2     7
## 223         B      F 15       fup             3     8
## 224         B      F 15       fup             4     6
## 225         B      F 15       fup             5     7
## 226         B      F 16       pre             1     4
## 227         B      F 16       pre             2     5
## 228         B      F 16       pre             3     7
## 229         B      F 16       pre             4     5
## 230         B      F 16       pre             5     4
## 231         B      F 16      post             1     7
## 232         B      F 16      post             2     7
## 233         B      F 16      post             3     8
## 234         B      F 16      post             4     6
## 235         B      F 16      post             5     7
## 236         B      F 16       fup             1     7
## 237         B      F 16       fup             2     8
## 238         B      F 16       fup             3    10
## 239         B      F 16       fup             4     8
## 240         B      F 16       fup             5     7

By looking at the id column, you can see that each subject’s values are spread across several rows. All this repeated data may seem superfluous, but (1) it doesn’t hurt (as long as the data set fits in your computer’s RAM) and (2) this layout gives us full flexibility for grouping, and we don’t need to care about “mean of means” problems, as the mean is always calculated only once per group and takes the group size into account.
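As a side note, newer versions of tidyr (1.0.0 and later) offer pivot_longer(), which can do the gather() and separate() steps in a single call. This is a sketch of an equivalent reshaping, not the code from today’s slides; obk_long is a name chosen only for this example:

```r
library(dplyr)
library(tidyr)
library(carData)

obk_long <- OBrienKaiser %>%
    mutate(id = row_number()) %>%
    # split column names like "pre.1" at the dot into two new columns
    pivot_longer(pre.1:fup.5,
                 names_to = c("meas_type", "meas_occasion"),
                 names_sep = "\\.",
                 values_to = "value") %>%
    mutate(meas_type = factor(meas_type, levels = c("pre", "post", "fup")),
           meas_occasion = as.integer(meas_occasion))

head(obk_long)
```

The result is a tibble rather than a plain data frame, but it holds the same 240 rows and the same columns as obk_tidy.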

To calculate the same result as before (mean per treatment-measurement type group):

obk_tidy %>% group_by(treatment, meas_type) %>% summarize(mean_per_type = mean(value, na.rm = TRUE))
## # A tibble: 9 x 3
## # Groups:   treatment [?]
##   treatment meas_type mean_per_type
##   <fct>     <fct>             <dbl>
## 1 control   pre                4.2 
## 2 control   post               4   
## 3 control   fup                4.4 
## 4 A         pre                5   
## 5 A         post               6.5 
## 6 A         fup                7.25
## 7 B         pre                4.14
## 8 B         post               6.57
## 9 B         fup                7.29

Further split by gender and also show group size:

obk_tidy %>% group_by(treatment, meas_type, gender) %>%
    summarize(mean = mean(value, na.rm = TRUE), group_size = n())
## # A tibble: 18 x 5
## # Groups:   treatment, meas_type [?]
##    treatment meas_type gender  mean group_size
##    <fct>     <fct>     <fct>  <dbl>      <int>
##  1 control   pre       F       4.5          10
##  2 control   pre       M       4            15
##  3 control   post      F       4.5          10
##  4 control   post      M       3.67         15
##  5 control   fup       F       4            10
##  6 control   fup       M       4.67         15
##  7 A         pre       F       3.5          10
##  8 A         pre       M       6.5          10
##  9 A         post      F       4.5          10
## 10 A         post      M       8.5          10
## 11 A         fup       F       5.5          10
## 12 A         fup       M       9            10
## 13 B         pre       F       4            20
## 14 B         pre       M       4.33         15
## 15 B         post      F       6.25         20
## 16 B         post      M       7            15
## 17 B         fup       F       7.25         20
## 18 B         fup       M       7.33         15

Using spread, we can also convert the data to a different format again. In the following format, we have a column for each measurement type (pre, post, fup), each containing the values for every measurement occasion (five per subject). I believe that this is the format that Maja meant today when she talked about how she would transform the data in Stata.

obk_tidy2 <- obk_tidy %>% spread(meas_type, value) %>% arrange(id, meas_occasion)
obk_tidy2
##    treatment gender id meas_occasion pre post fup
## 1    control      M  1             1   1    3   2
## 2    control      M  1             2   2    2   3
## 3    control      M  1             3   4    5   2
## 4    control      M  1             4   2    3   4
## 5    control      M  1             5   1    2   4
## 6    control      M  2             1   4    2   4
## 7    control      M  2             2   4    2   5
## 8    control      M  2             3   5    3   6
## 9    control      M  2             4   3    5   4
## 10   control      M  2             5   4    3   1
## 11   control      M  3             1   5    4   7
## 12   control      M  3             2   6    5   6
## 13   control      M  3             3   5    7   9
## 14   control      M  3             4   7    5   7
## 15   control      M  3             5   7    4   6
## 16   control      F  4             1   5    2   4
## 17   control      F  4             2   4    2   4
## 18   control      F  4             3   7    3   5
## 19   control      F  4             4   5    5   3
## 20   control      F  4             5   4    3   4
## 21   control      F  5             1   3    6   4
## 22   control      F  5             2   4    7   3
## 23   control      F  5             3   6    8   6
## 24   control      F  5             4   4    6   4
## 25   control      F  5             5   3    3   3
## 26         A      M  6             1   7    9   9
## 27         A      M  6             2   8    9  10
## 28         A      M  6             3   7   10  11
## 29         A      M  6             4   9    8   9
## 30         A      M  6             5   9    9   6
## 31         A      M  7             1   5    7   8
## 32         A      M  7             2   5    7   9
## 33         A      M  7             3   6    8  11
## 34         A      M  7             4   4   10   9
## 35         A      M  7             5   5    8   8
## 36         A      F  8             1   2    2   6
## 37         A      F  8             2   3    4   6
## 38         A      F  8             3   5    8   7
## 39         A      F  8             4   3    6   5
## 40         A      F  8             5   2    5   6
## 41         A      F  9             1   3    4   5
## 42         A      F  9             2   3    5   4
## 43         A      F  9             3   4    6   7
## 44         A      F  9             4   6    4   5
## 45         A      F  9             5   4    1   4
## 46         B      M 10             1   4    6   8
## 47         B      M 10             2   4    7   8
## 48         B      M 10             3   5    6   9
## 49         B      M 10             4   3    8   7
## 50         B      M 10             5   4    8   8
## 51         B      M 11             1   3    5   5
## 52         B      M 11             2   3    4   6
## 53         B      M 11             3   4    7   8
## 54         B      M 11             4   2    5   6
## 55         B      M 11             5   3    4   5
## 56         B      M 12             1   6    9   8
## 57         B      M 12             2   7   10   7
## 58         B      M 12             3   8   11  10
## 59         B      M 12             4   6    9   8
## 60         B      M 12             5   3    6   7
## 61         B      F 13             1   5    4   7
## 62         B      F 13             2   5    6   7
## 63         B      F 13             3   6    6   8
## 64         B      F 13             4   8    8  10
## 65         B      F 13             5   6    6   8
## 66         B      F 14             1   2    5   6
## 67         B      F 14             2   2    6   7
## 68         B      F 14             3   3    7   8
## 69         B      F 14             4   1    5   6
## 70         B      F 14             5   2    2   3
## 71         B      F 15             1   2    6   7
## 72         B      F 15             2   2    6   7
## 73         B      F 15             3   3    7   8
## 74         B      F 15             4   4    9   6
## 75         B      F 15             5   4    7   7
## 76         B      F 16             1   4    7   7
## 77         B      F 16             2   5    7   8
## 78         B      F 16             3   7    8  10
## 79         B      F 16             4   5    6   8
## 80         B      F 16             5   4    7   7

This format is also fine to work with! I think it’s mainly a matter of taste or habit. Again, let’s calculate the means per treatment and for each measurement type:

obk_tidy2 %>% group_by(treatment) %>%
    summarize(mean_pre = mean(pre, na.rm = TRUE),
              mean_post = mean(post, na.rm = TRUE),
              mean_fup = mean(fup, na.rm = TRUE))
## # A tibble: 3 x 4
##   treatment mean_pre mean_post mean_fup
##   <fct>        <dbl>     <dbl>    <dbl>
## 1 control       4.2       4        4.4 
## 2 A             5         6.5      7.25
## 3 B             4.14      6.57     7.29
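For completeness, spread() also has a newer counterpart in tidyr 1.0.0 and later, pivot_wider(). The following sketch rebuilds the long table first so that it runs on its own; obk_long and obk_wide are names used only here:

```r
library(dplyr)
library(tidyr)
library(carData)

# Long table as before: one row per subject, measurement type and occasion
obk_long <- OBrienKaiser %>%
    mutate(id = row_number()) %>%
    gather(pre.1:fup.5, key = "meas_type_occasion", value = "value") %>%
    separate(meas_type_occasion, into = c("meas_type", "meas_occasion")) %>%
    mutate(meas_type = factor(meas_type, levels = c("pre", "post", "fup")),
           meas_occasion = as.integer(meas_occasion))

# One column per measurement type, one row per subject and occasion
obk_wide <- obk_long %>%
    pivot_wider(names_from = meas_type, values_from = value) %>%
    arrange(id, meas_occasion)

head(obk_wide)
```

The result should have the same 80 rows as obk_tidy2, one per subject and measurement occasion.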

4. Plotting

When it comes to plotting with ggplot2, you are restricted in how your data set should be formatted. A general rule is that every variable that controls some sort of visual property in your plot must be in its own column. For example, if we want to use a different color in our plot per measurement type, then the measurement type must be recorded in a column. Otherwise, we couldn’t specify it as aes(color = meas_type) in our aesthetics definition. So a data set in which “pre”, “post” and “fup” are in separate columns wouldn’t work here.

All columns in our reshaped data set obk_tidy can now be used to set up the plot. Here, we plot the measurement type on the x-axis, the outcome (value) on the y-axis and make small multiples (facets) per treatment:

library(ggplot2)

ggplot(obk_tidy, aes(x = meas_type, y = value)) + geom_boxplot() + facet_wrap(~ treatment)

ggplot(obk_tidy, aes(value, fill = meas_type)) +
    geom_density(alpha = 0.33) +
    facet_wrap(~ treatment, ncol = 1) +
    theme_minimal()
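The aes(color = meas_type) mapping mentioned above can be sketched as follows. This is only one possible plot, and the names obk_long, occasion_means and p are made up for this example. It first summarizes the long table and then draws one colored line per measurement type:

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(carData)

# Rebuild the long table so that this sketch runs on its own
obk_long <- OBrienKaiser %>%
    mutate(id = row_number()) %>%
    gather(pre.1:fup.5, key = "meas_type_occasion", value = "value") %>%
    separate(meas_type_occasion, into = c("meas_type", "meas_occasion")) %>%
    mutate(meas_type = factor(meas_type, levels = c("pre", "post", "fup")),
           meas_occasion = as.integer(meas_occasion))

# Mean per treatment, measurement type and occasion
occasion_means <- obk_long %>%
    group_by(treatment, meas_type, meas_occasion) %>%
    summarize(mean_value = mean(value, na.rm = TRUE)) %>%
    ungroup()

# meas_type is a column, so it can be mapped to the color aesthetic
p <- ggplot(occasion_means, aes(x = meas_occasion, y = mean_value, color = meas_type)) +
    geom_line() +
    geom_point() +
    facet_wrap(~ treatment)
p
```

Because the measurement type lives in a single column, ggplot2 can build the color legend by itself. With separate “pre”, “post” and “fup” columns, each line would have to be added as its own layer.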