Manipulating Matrices

The great thing about matrices is that, since they are just generalizations of vectors from one dimension to two, subsetting matrices works almost the same way it works with vectors. Instead of passing a single index or logical vector into a set of square brackets (e.g. [1]), we put a comma in those square brackets and specify a location with two indices or logical vectors, one for rows and one for columns (e.g. [1, 1]).
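
As a quick side-by-side sketch of that idea (the names `v` and `m` are made up for illustration and are not used elsewhere in this section):

```r
# A vector has one dimension, so one index is enough
v <- c(10, 20, 30)
v[2]      # 20

# A matrix has two dimensions, so we give a row and then a column
m <- matrix(c(10, 20, 30, 40), nrow = 2)
m[2, 1]   # 20 (matrix() fills column by column)
```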

Subsetting by Index

Suppose we have the following matrix:

[1]:

our_matrix <- matrix(1:12, nrow = 3, ncol = 4)
our_matrix

A matrix: 3 × 4 of type int
1	4	7	10
2	5	8	11
3	6	9	12

To subset, we just pass a row position followed by a column position. For example, if we wanted the entry from the second row and third column, we’d type:

[2]:

our_matrix[2, 3]

8

The one new thing is that if you want ALL entries along a specific dimension, you still put in a comma, but you leave the entry blank for the dimension on which you want all observations. So if I wanted the second row, I’d just type:

[3]:

our_matrix[2, ]

2
5
8
11

Or if I wanted the third column, I’d type:

[4]:

our_matrix[, 3]

7
8
9

Note that if you pull out a subset of your matrix that is one-dimensional (a single row or a single column), it just becomes a vector!

[5]:

class(our_matrix)

'matrix'
'array'

[6]:

class(our_matrix[1, ])

'integer'
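
If you would rather keep the matrix structure, base R's `[` accepts a `drop = FALSE` argument. A minimal sketch, rebuilding `our_matrix` so it runs on its own (the names `row_as_vector` and `row_as_matrix` are just for illustration):

```r
our_matrix <- matrix(1:12, nrow = 3, ncol = 4)

# Default behaviour: a single row is simplified to a vector
row_as_vector <- our_matrix[1, ]
class(row_as_vector)   # "integer"

# drop = FALSE keeps the result as a 1 x 4 matrix
row_as_matrix <- our_matrix[1, , drop = FALSE]
dim(row_as_matrix)     # 1 4
```

This can matter when later code expects a matrix (for example, something that calls `nrow()` on the result).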

Finally, just like with vectors, we can subset with vectors of indices to pull out several rows and columns at once:

[7]:

our_matrix[1:2, 3:4]

A matrix: 2 × 2 of type int
7	10
8	11
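
The index vectors do not have to be consecutive ranges. As a small sketch (the names `corners` and `without_first_col` are hypothetical), the same rules that apply to vectors, including negative indices to exclude positions, carry over:

```r
our_matrix <- matrix(1:12, nrow = 3, ncol = 4)

# Rows 1 and 3, columns 2 and 4 (indices need not be consecutive)
corners <- our_matrix[c(1, 3), c(2, 4)]
corners

# A negative index drops that position: everything except column 1
without_first_col <- our_matrix[, -1]
dim(without_first_col)   # 3 3
```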

Subsetting with Logicals

Subsetting with logical vectors also generalizes from vectors to matrices in the same way. To illustrate, let’s go back to our toy matrix of survey responses:

[8]:

income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)

survey <- cbind(income, age, education)
survey

A matrix: 3 × 3 of type dbl
income	age	education
22000	20	12
75000	35	16
19000	55	11

If we wanted to select all the rows where income was less than the US median income (about 65,000), we would first extract the income column, then create a logical vector that’s TRUE wherever income is below 65,000, then put that vector in the first (row) position of our square brackets:

[9]:

income <- survey[, 1]
income

22000
75000
19000

[10]:

below_median <- income < 65000
below_median

TRUE
FALSE
TRUE

[11]:

survey[below_median, ]

A matrix: 2 × 3 of type dbl
income	age	education
22000	20	12
19000	55	11

Or, of course, we could do that all in one line instead of breaking out the steps:

[12]:

survey[survey[, 1] < 65000, ]

A matrix: 2 × 3 of type dbl
income	age	education
22000	20	12
19000	55	11

Or, since R used the names of the vectors we passed to cbind() as column names, we could also refer to our columns by name, which makes our code a lot easier to understand:

[13]:

survey[survey[, "income"] < 65000, ]

A matrix: 2 × 3 of type dbl
income	age	education
22000	20	12
19000	55	11
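
Logical tests can also be combined with `&` (and) or `|` (or) before being used as a row selector. A sketch under the same toy data, with a made-up second condition on age (the name `low_income_over_30` is hypothetical):

```r
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)

# TRUE only for rows that meet both conditions
low_income_over_30 <- survey[, "income"] < 65000 & survey[, "age"] > 30
low_income_over_30   # FALSE FALSE TRUE

# Only one row matches; drop = FALSE keeps the result a matrix
# instead of letting R simplify it to a vector
survey[low_income_over_30, , drop = FALSE]
```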

Subsetting by Names

As we just saw, while not all matrices have names, if they do you can subset using them. For example, our survey matrix has column names, but no row names, so we can only subset columns by name:

[14]:

survey[, "education"]

12
16
11

Names are accessible through the colnames() and rownames() functions:

[15]:

colnames(survey)

'income'
'age'
'education'

[16]:

rownames(survey)

NULL

Oddly, R also allows you to assign to these functions to change the names on a matrix. For example, to add row names we could do:

[17]:

rownames(survey) <- c("row1", "row2", "row3")
survey

A matrix: 3 × 3 of type dbl
	income	age	education
row1	22000	20	12
row2	75000	35	16
row3	19000	55	11

And we can delete them too!

[18]:

rownames(survey) <- NULL
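
When a matrix does have row names, rows can be selected by name in the same way as columns. A small sketch that builds a separate copy, `survey_named`, with made-up respondent labels (`"alice"`, `"bob"` and `"carol"` are assumptions for illustration only):

```r
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey_named <- cbind(income, age, education)
rownames(survey_named) <- c("alice", "bob", "carol")

# One row by name (returns a vector named by column)
survey_named["bob", ]

# A row name and a column name together
survey_named["carol", "age"]   # 55

# dimnames() returns the row and column names together as a list
dimnames(survey_named)
```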

Subsetting by Row and Column Simultaneously

Often, we don’t just want to subset rows or columns, but both at once. For example, suppose I wanted the education levels of everyone with incomes below the US median. I could do this in two steps by subsetting rows and then subsetting columns:

[19]:

below_median <- survey[survey[, "income"] < 65000, ]
below_median[, "education"]

12
11

Or I can do it all in one command!

[20]:

survey[survey[, "income"] < 65000, "education"]

12
11

So what is the average education of people earning less than the median income in the US in our toy data?

[21]:

mean(survey[survey[, "income"] < 65000, "education"])

11.5
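
The same pattern answers other questions just by swapping the row condition and the column name. For instance, a sketch of the average age of respondents with at least 12 years of education (the question and the name `ages_hs_or_more` are made up for illustration):

```r
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)

# Rows: education of 12 years or more; column: age
ages_hs_or_more <- survey[survey[, "education"] >= 12, "age"]
ages_hs_or_more         # 20 35
mean(ages_hs_or_more)   # 27.5
```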

OK – I know we’ve just covered a lot, but hopefully that example makes clear how quickly we can start doing really, really powerful analyses and answering substantive questions just by subsetting our data carefully.

Using Subsets to Modify Data

Sometimes we want to modify a part of a matrix. For example, suppose we were working with our survey data, and we want to multiply all the income values by 1.02 to adjust for inflation that has occurred since the survey. Obviously, if we just multiplied the matrix by 1.02, we’d also modify things like education and age:

[22]:

survey * 1.02

A matrix: 3 × 3 of type dbl
income	age	education
22440	20.4	12.24
76500	35.7	16.32
19380	56.1	11.22

What we can do instead is extract the column with income, modify it, then replace the old income column with our updated column:

[23]:

income_column <- survey[, "income"] # Extract income
adjusted_income <- income_column * 1.02 # Adjust income
survey[, "income"] <- adjusted_income # Replace income with new values!
survey

A matrix: 3 × 3 of type dbl
income	age	education
22440	20	12
76500	35	16
19380	55	11

Or, if we wanted, we could actually do all this in one step:

[24]:

# Re-make survey so it hasn't been adjusted for inflation
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)
survey

A matrix: 3 × 3 of type dbl
income	age	education
22000	20	12
75000	35	16
19000	55	11

[25]:

# Now adjust income in one step!
survey[, "income"] <- survey[, "income"] * 1.02
survey

A matrix: 3 × 3 of type dbl
income	age	education
22440	20	12
76500	35	16
19380	55	11

And this is especially powerful if we subset on BOTH rows and columns. Suppose, for example, we wanted to see what people’s incomes would look like if everyone who didn’t finish high school (education < 12) got a tax credit of 10,000 dollars:

[26]:

survey[survey[, "education"] < 12, "income"] <- survey[survey[, "education"] < 12, "income"] + 10000

[27]:

survey

A matrix: 3 × 3 of type dbl
income	age	education
22440	20	12
76500	35	16
29380	55	11
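
Because the row condition appears on both sides of the assignment, one option is to store it once and reuse it. A sketch of the same tax credit, starting from the original (not inflation-adjusted) incomes so it runs on its own; the name `no_hs_diploma` is hypothetical:

```r
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)

# Compute the row selector once...
no_hs_diploma <- survey[, "education"] < 12

# ...then use it to both read and overwrite the matching incomes
survey[no_hs_diploma, "income"] <- survey[no_hs_diploma, "income"] + 10000
survey[, "income"]   # 22000 75000 29000
```

Storing the selector avoids typing the condition twice, which reduces the chance that the two sides of the assignment accidentally refer to different rows.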

And that’s it! Now you’re a matrix pro.

Recap

Subsetting matrices is just like subsetting vectors, except with two entries between the square brackets: [ , ].
The first entry in the square brackets relates to rows, the second to columns.
Like vectors, you can subset by index, by logical vector, or by name.
You can mix how you subset, for example using a logical vector for rows and a name for columns.
If you subset a single row or a single column, you get back a vector.
Subsetting on both rows and columns allows you to edit matrices in very powerful ways.

Exercises

Now that we’ve familiarized ourselves with matrices and matrix manipulation, it’s time to do some exercises!