Manipulating Matrices

The great thing about matrices is that, because they are just generalizations of vectors from one dimension to two, subsetting matrices works almost the same way as subsetting vectors. Instead of passing a single index or logical vector into a set of square brackets (e.g. [1]), we put a comma in those square brackets and specify a location with two indices or logical vectors (e.g. [1,1]).

Subsetting by Index

Suppose we have the following matrix:

[1]:
our_matrix <- matrix(1:12, nrow = 3, ncol = 4)
our_matrix
A matrix: 3 × 4 of type int
1  4  7  10
2  5  8  11
3  6  9  12

To subset, we just pass a row position followed by a column position. For example, if we wanted the entry from the second row and third column, we’d type:

[2]:
our_matrix[2, 3]
8
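
The order of the two indices matters, so it can help to see both orders side by side. The sketch below rebuilds the same matrix (matrix() fills column by column by default) and pulls out two different entries; the variable names row2_col3 and row3_col2 are made up for this illustration.

```r
# Rebuild the example matrix; values 1..12 are filled column by column
our_matrix <- matrix(1:12, nrow = 3, ncol = 4)

# Row index comes first, column index second
row2_col3 <- our_matrix[2, 3]  # second row, third column
row3_col2 <- our_matrix[3, 2]  # third row, second column: a different entry

print(row2_col3)
print(row3_col2)
```

Swapping the two indices picks a different cell, which is an easy mistake to make on matrices that are nearly square.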

The one new thing is that if you want ALL entries along a specific dimension, you still put in a comma, but you leave the entry blank for the dimension on which you want all observations. So if I wanted the second row, I’d just type:

[3]:
our_matrix[2, ]
  1. 2
  2. 5
  3. 8
  4. 11

Or if I wanted the third column, I’d type:

[4]:
our_matrix[, 3]
  1. 7
  2. 8
  3. 9

Note that if you pull out a subset of your matrix that is one dimensional, it just becomes a vector!

[5]:
class(our_matrix)
  1. 'matrix'
  2. 'array'
[6]:
class(our_matrix[1, ])
'integer'
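
If you would rather keep a one-row or one-column result as a matrix, base R's `[` accepts a drop = FALSE argument. A minimal sketch, with as_vector and as_matrix as illustrative names:

```r
our_matrix <- matrix(1:12, nrow = 3, ncol = 4)

# Default behaviour: a single row is simplified to a plain integer vector
as_vector <- our_matrix[1, ]

# drop = FALSE keeps the result as a matrix with one row and four columns
as_matrix <- our_matrix[1, , drop = FALSE]

print(dim(as_vector))  # a plain vector has no dim attribute
print(dim(as_matrix))
```

This is mostly useful inside code that expects a matrix regardless of how many rows happen to be selected.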

Finally, just like with vectors, we can subset with vectors if we want:

[7]:
our_matrix[1:2, 3:4]
A matrix: 2 × 2 of type int
7  10
8  11
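
The index vectors do not have to be consecutive ranges, and, as with vectors, negative indices drop entries instead of selecting them. A small sketch (corners and without_first_row are illustrative names):

```r
our_matrix <- matrix(1:12, nrow = 3, ncol = 4)

# Rows 1 and 3, columns 2 and 4: the indices need not be consecutive
corners <- our_matrix[c(1, 3), c(2, 4)]

# A negative index drops that row instead of selecting it
without_first_row <- our_matrix[-1, ]

print(corners)
print(without_first_row)
```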

Subsetting with Logicals

Subsetting with logical vectors also generalizes from vectors to matrices in the same way. To illustrate, let’s go back to our toy matrix of survey responses:

[8]:
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)

survey <- cbind(income, age, education)
survey
A matrix: 3 × 3 of type dbl
income  age  education
22000   20   12
75000   35   16
19000   55   11

If we wanted to select all the rows where income was less than the US median income (about 65,000), we would first extract the income column, then create a logical vector that’s TRUE wherever income is below 65,000, then put that vector in the first position of our square brackets:

[9]:
income <- survey[, 1]
income
  1. 22000
  2. 75000
  3. 19000
[10]:
below_median <- income < 65000
below_median
  1. TRUE
  2. FALSE
  3. TRUE
[11]:
survey[below_median, ]
A matrix: 2 × 3 of type dbl
income  age  education
22000   20   12
19000   55   11

Or, of course, we could do that all in one line instead of breaking out the steps:

[12]:
survey[survey[, 1] < 65000, ]
A matrix: 2 × 3 of type dbl
income  age  education
22000   20   12
19000   55   11

Or, since R used the names of vectors we passed to cbind() as column names, we could also subset our columns by name, which makes our code a lot easier to understand:

[13]:
survey[survey[, "income"] < 65000, ]
A matrix: 2 × 3 of type dbl
income  age  education
22000   20   12
19000   55   11
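
Logical tests can also be combined before they are used to subset. The sketch below is one way to do that with the element-wise & operator; the age cutoff of 30 and the names low_income_over_30 and selected are assumptions made up for this illustration.

```r
# Rebuild the toy survey matrix
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)

# & combines two logical vectors element by element
low_income_over_30 <- survey[, "income"] < 65000 & survey[, "age"] > 30

# drop = FALSE keeps the result a matrix even when only one row matches
selected <- survey[low_income_over_30, , drop = FALSE]
print(selected)
```

Only the third respondent meets both conditions here, so without drop = FALSE the result would be simplified to a vector.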

Subsetting by Names

As we just saw, while not all matrices have names, if they do, you can subset using them. For example, our survey matrix has column names but no row names, so we can only subset columns by name:

[14]:
survey[, "education"]
  1. 12
  2. 16
  3. 11

Names are accessible through the colnames() and rownames() functions:

[15]:
colnames(survey)
  1. 'income'
  2. 'age'
  3. 'education'
[16]:
rownames(survey)
NULL

Oddly, R also allows you to assign to these functions to change the names on a matrix. For example, to add row names we could do:

[17]:
rownames(survey) <- c("row1", "row2", "row3")
survey
A matrix: 3 × 3 of type dbl
      income  age  education
row1  22000   20   12
row2  75000   35   16
row3  19000   55   11

And we can delete them too!

[18]:
rownames(survey) <- NULL
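
While row names are set, rows can be selected by name in the same way as columns. A minimal sketch on a separate copy of the data, so the survey matrix used elsewhere is untouched; survey_named and the resp_a, resp_b, resp_c labels are hypothetical names chosen for this illustration.

```r
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey_named <- cbind(income, age, education)

# Hypothetical respondent labels, chosen only for this illustration
rownames(survey_named) <- c("resp_a", "resp_b", "resp_c")

# Select one cell by row name and column name
resp_b_age <- survey_named["resp_b", "age"]

# Select several rows and columns by name
subset_by_name <- survey_named[c("resp_a", "resp_c"), c("income", "education")]

print(resp_b_age)
print(subset_by_name)
```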

Subsetting by Row and Column Simultaneously

Often, we don’t just want to subset rows or columns, but both at once. For example, suppose I wanted the education levels of everyone with incomes below the US median. I could do this in two steps by subsetting rows and then subsetting columns:

[19]:
below_median <- survey[survey[, "income"] < 65000, ]
below_median[, "education"]
  1. 12
  2. 11

Or I can do it all in one command!

[20]:
survey[survey[,"income"] < 65000, "education"]
  1. 12
  2. 11

So what is the average education of people earning less than the median income in the US in our toy data?

[21]:
mean(survey[survey[, "income"] < 65000, "education"])
11.5

OK – I know we’ve just covered a lot, but hopefully that example makes clear how quickly we can start doing really, really powerful analyses and answering substantive questions just by subsetting our data carefully.
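
The same pattern extends to comparing groups. As a sketch, storing the logical vector once lets us reuse it, negate it with !, and count matches with sum(); the names below, mean_below, mean_above and n_below, are illustrative.

```r
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)

# TRUE for respondents earning less than 65,000
below <- survey[, "income"] < 65000

mean_below <- mean(survey[below, "education"])   # group below the cutoff
mean_above <- mean(survey[!below, "education"])  # everyone else
n_below <- sum(below)                            # TRUE counts as 1

print(c(mean_below, mean_above, n_below))
```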

Using Subsets to Modify Data

Sometimes we want to modify a part of a matrix. For example, suppose we were working with our survey data, and we want to multiply all the income values by 1.02 to adjust for inflation that has occurred since the survey. Obviously, if we just multiplied the matrix by 1.02, we’d also modify things like education and age:

[22]:
survey * 1.02
A matrix: 3 × 3 of type dbl
income  age   education
22440   20.4  12.24
76500   35.7  16.32
19380   56.1  11.22

What we can do instead is extract the column with income, modify it, then replace the old income column with our updated column:

[23]:
income_column <- survey[, "income"] # Extract income
adjusted_income <- income_column * 1.02 # Adjust income
survey[, "income"] <- adjusted_income # Replace income with new values!
survey
A matrix: 3 × 3 of type dbl
income  age  education
22440   20   12
76500   35   16
19380   55   11

Or, if we wanted, we could actually do all this in one step:

[24]:
# Re-make survey so it hasn't been adjusted for inflation
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)
survey
A matrix: 3 × 3 of type dbl
income  age  education
22000   20   12
75000   35   16
19000   55   11
[25]:
# Now adjust income in one step!
survey[, "income"] <- survey[, "income"] * 1.02
survey
A matrix: 3 × 3 of type dbl
income  age  education
22440   20   12
76500   35   16
19380   55   11

And this is especially powerful if we subset on BOTH rows and columns. Suppose, for example, we wanted to see what people’s incomes would look like if anyone who didn’t finish high school (education < 12) got a tax credit of 10,000 dollars:

[26]:
survey[survey[, "education"] < 12, "income"] <- survey[survey[, "education"] < 12, "income"] + 10000
[27]:
survey
A matrix: 3 × 3 of type dbl
income  age  education
22440   20   12
76500   35   16
29380   55   11
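
Because the same row condition appears on both sides of the assignment, one option is to store it in a variable first. The sketch below repeats the inflation adjustment and tax credit from scratch; no_hs is a name introduced here for illustration.

```r
# Rebuild the survey matrix and apply the inflation adjustment
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)
survey[, "income"] <- survey[, "income"] * 1.02

# Compute the row condition once, then reuse it on both sides
no_hs <- survey[, "education"] < 12
survey[no_hs, "income"] <- survey[no_hs, "income"] + 10000

print(survey)
```

This gives the same result as the one-line version, but the condition is written only once, so there is less risk of the two copies drifting apart.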

And that’s it! Now you’re a matrix pro.

Recap

  • Subsetting matrices is just like subsetting vectors, except with two entries between the square brackets: [ , ].

  • The first entry in the square brackets relates to rows, the second to columns.

  • Like vectors, you can subset by index, by logical vector, or by name.

  • You can mix how you subset, and use a logical for rows and a name for columns.

  • If you subset a single row or a single column, you get back a vector.

  • Subsetting on both rows and columns allows you to edit matrices in very powerful ways.

Exercises

Now that we’ve familiarized ourselves with matrices and matrix manipulation, it’s time to do some exercises!