Transposing data

There are so many different ways to enter data. Most people concentrate on getting their data into a spreadsheet format before considering what the proper format for their analysis may be. It may become necessary for some analyses to switch the orientation of your dataset from a columnar format to a wider format or the reverse, from a wide format to that of a long column. In this blog post I will show how to use PROC Transpose and the DATA step to go back and forth between these 2 formats.

A wide format is often used when we enter data for a repeated measures type of design. In this case each line of data will represent one experimental unit. As an example, when I collect data for a repeated measures trial, I will have one line of data for each unit (whether it be a person, plot, animal, etc…) and I will add a new column for each time I take a measurement. If I collect data once a month for 6 months I will have a column for each month I collect data.

Wide format:

ID	TREATMENT	HT1	HT2	HT3	HT4	HT5	HT6
01	A	12	18	19	26	34	55
02	A	10	15	19	24	30	45
03	B	11	16	20	25	32	50
04	B	9	11	14	22	38	42

A column type format would consist of a line of data for each measurement taken on an individual. If we have 6 measurements taken on one individual, we would have 6 lines of data for each individual.

Columnar format:

ID	TREATMENT	WEEK	HEIGHT
01	A	1	12
01	A	2	18
01	A	3	19
01	A	4	26
01	A	5	34
01	A	6	55

Which format you will use will depend on your data entry preferences and the analysis you will be performing.

Reading in the datasets

Data wide;
input id trmt$ ht1 ht2 ht3 ht4 ht5 ht6;
datalines;
1 a 12 18 19 26 34 55
2 a 10 15 19 24 30 45
3 b 11 16 20 25 32 50
4 b 9 11 14 22 38 42
;
Run;

Data columnar;
input id trmt$ week ht;
datalines;
1 a 1 12
1 a 2 18
1 a 3 19
1 a 4 26
1 a 5 34
1 a 6 55
2 a 1 10
2 a 2 15
2 a 3 19
2 a 4 24
2 a 5 30
2 a 6 45
3 b 1 11
3 b 2 16
3 b 3 20
3 b 4 25
3 b 5 32
3 b 6 50
4 b 1 9
4 b 2 11
4 b 3 14
4 b 4 22
4 b 5 38
4 b 6 42
;
Run;

Creating a wide file from a columnar file

Proc transpose data=columnar out=wide_out; <- out=new dataset name for transposed data
Run;

Proc print data=wide_out;
Run;

Not quite what we want, we need to tell SAS that we would like the file transposed by ID and trmt. So let’s add a BY statement:

Proc transpose data=columnar out=wide_out;
by id trmt;
Run;

Proc print data=wide_out;
Run;

Closer, but now we have a row for something called week and ht. We do not need the rows that are labelled “week”. To remove these we will tell SAS that we are only interested in the “variable” called ht by adding a VAR statement to our PROC:

Proc transpose data=columnar out=wide_out;
by id trmt;
var ht;
Run;

Proc print data=wide_out;
Run;

Much closer… now we would like to change the column headings from COL1 to Week1 by adding a PREFIX= option in the PROC statement:

Proc transpose data=columnar out=wide_out prefix=week;
by id trmt;
var ht;
Run;

Proc print data=wide_out;
Run;

For all intents and purposes we now have a dataset we can use for a repeated measures ANOVA using PROC GLM. The heading _NAME_ may be distracting, so let’s change its name to something more meaningful by adding a NAME= option in the PROC statement:

Proc transpose data=columnar out=wide_out prefix=week name=measure;
by id trmt;
var ht;
Run;

Proc print data=wide_out;
Run;

Creating a columnar file from a wide file

Moving in this direction, you have a couple of options, using the DATA Step or using PROC Transpose.

Using the DATA Step

Data columnar (keep=id trmt week ht); <- in the new dataset called columnar only keep the variables called id, trmt, week and ht

that means for this dataset the variables, ht1, ht2, ht3, ht4, ht5, ht6 will be deleted from the new dataset called columnar
set wide;
week = 1; ht = ht1; output columnar; <-create a new variable called week with a value of “1” for the first week’s measurement

. create a new variable called ht with the value of ht1

. output = save the data into the new dataset called columnar
week = 2; ht = ht2; output columnar;
week = 3; ht = ht3; output columnar;
week = 4; ht = ht4; output columnar;
week = 5; ht = ht5; output columnar;
week = 6; ht = ht6; output columnar;
Run;

Nice and straightforward and takes advantage of the power of the DATA Step.