Manipulating Matrices
The great thing about matrices is that since they are just generalizations of vectors from one dimension to two, subsetting matrices works almost the same way it works with vectors. Basically, instead of passing a single index or logical vector inside the square brackets (e.g. [1]), we put a comma inside the brackets and give two of them, one for rows and one for columns (e.g. [1, 1]).
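To see the two forms side by side, here is a minimal sketch; the names our_vector and toy_matrix are made up for illustration:

```r
# A vector takes one index inside the brackets
our_vector <- c(10, 20, 30)
our_vector[1]        # first element: 10

# A matrix takes two, separated by a comma: [row, column]
toy_matrix <- matrix(1:4, nrow = 2, ncol = 2)
toy_matrix[1, 1]     # row 1, column 1: 1
toy_matrix[2, 1]     # row 2, column 1: 2
```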
Subsetting by Index
Suppose we have the following matrix:
[1]:
our_matrix <- matrix(1:12, nrow = 3, ncol = 4)
our_matrix
1 | 4 | 7 | 10 |
2 | 5 | 8 | 11 |
3 | 6 | 9 | 12 |
To subset, we just pass a row position and then a column position, separated by a comma. For example, if we wanted the entry from the second row and third column, we’d type:
[2]:
our_matrix[2, 3]
8
The one new thing is that if you want ALL entries along a specific dimension, you still put in the comma, but you leave the entry blank for the dimension on which you want all observations. So if I wanted the second row, I’d just type:
[3]:
our_matrix[2, ]
- 2
- 5
- 8
- 11
Or if I wanted the third column, I’d type:
[4]:
our_matrix[, 3]
- 7
- 8
- 9
Note that if you pull out a subset of your matrix that is one dimensional, it just becomes a vector!
[5]:
class(our_matrix)
- 'matrix'
- 'array'
[6]:
class(our_matrix[1, ])
'integer'
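This happens because R's square-bracket operator drops any dimension of length one by default. If you want to keep a one-row (or one-column) result as a matrix, you can pass drop = FALSE as an extra argument, as in this short sketch:

```r
our_matrix <- matrix(1:12, nrow = 3, ncol = 4)

# By default, R drops the unused dimension, so a single row comes back as a vector
row_as_vector <- our_matrix[1, ]
is.matrix(row_as_vector)   # FALSE

# drop = FALSE keeps the result as a 1 x 4 matrix instead
row_as_matrix <- our_matrix[1, , drop = FALSE]
is.matrix(row_as_matrix)   # TRUE
dim(row_as_matrix)         # 1 4
```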
Finally, just like with vectors, we can pass vectors of indices to pick out several rows and columns at once. For example, rows 1 and 2 of columns 3 and 4:
[7]:
our_matrix[1:2, 3:4]
7 | 10 |
8 | 11 |
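The index vectors don't have to be consecutive ranges like 1:2. A small sketch using c() to pick rows 1 and 3 and columns 2 and 4 (the name picked is just illustrative):

```r
our_matrix <- matrix(1:12, nrow = 3, ncol = 4)

# Rows 1 and 3, columns 2 and 4; the result is a 2 x 2 matrix
picked <- our_matrix[c(1, 3), c(2, 4)]
picked
```

Here picked holds 4 and 10 in its first row and 6 and 12 in its second.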
Subsetting with Logicals
Subsetting with logical vectors also generalizes from vectors to matrices in the same way. To illustrate, let’s go back to our toy matrix of survey respondents:
[8]:
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)
survey
income | age | education |
---|---|---|
22000 | 20 | 12 |
75000 | 35 | 16 |
19000 | 55 | 11 |
If we wanted to select all the rows where income was less than the US median income (about 65,000), we would first extract the income column, then create a logical vector that’s TRUE wherever income is below 65,000, and then put that vector in the first (row) position of our square brackets:
[9]:
income <- survey[, 1]
income
- 22000
- 75000
- 19000
[10]:
below_median <- income < 65000
below_median
- TRUE
- FALSE
- TRUE
[11]:
survey[below_median, ]
income | age | education |
---|---|---|
22000 | 20 | 12 |
19000 | 55 | 11 |
Or, of course, we could do that all in one line instead of breaking out the steps:
[12]:
survey[survey[, 1] < 65000, ]
income | age | education |
---|---|---|
22000 | 20 | 12 |
19000 | 55 | 11 |
Or, since R used the names of the vectors we passed to cbind() as column names, we could also pick out the income column by name, which makes our code a lot easier to understand:
[13]:
survey[survey[, "income"] < 65000, ]
income | age | education |
---|---|---|
22000 | 20 | 12 |
19000 | 55 | 11 |
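Logical vectors can also be combined before you subset with them. A short sketch using R's element-wise & ("and") and | ("or") operators; the particular cutoffs and variable names are arbitrary and only for illustration:

```r
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)

# & is TRUE only where both conditions hold: income below 65,000 AND age over 30
low_income_and_over_30 <- survey[, "income"] < 65000 & survey[, "age"] > 30
low_income_and_over_30   # FALSE FALSE TRUE

# | is TRUE where either condition holds: income above 65,000 OR age over 50
high_income_or_over_50 <- survey[, "income"] > 65000 | survey[, "age"] > 50
survey[high_income_or_over_50, ]   # the second and third respondents
```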
Subsetting by Names
As we just saw, when a matrix has names, you can subset using them. Not every matrix has them, though: our survey matrix has column names but no row names, so we can only subset its columns by name:
[14]:
survey[, "education"]
- 12
- 16
- 11
Names are accessible through the colnames() and rownames() functions:
[15]:
colnames(survey)
- 'income'
- 'age'
- 'education'
[16]:
rownames(survey)
NULL
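Both kinds of names are stored together in the matrix's dimnames, which you can also inspect directly with the dimnames() function. A minimal sketch (survey_names is just an illustrative name):

```r
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)

# dimnames() returns a list: row names first, then column names
survey_names <- dimnames(survey)
survey_names[[1]]   # NULL, since there are no row names
survey_names[[2]]   # "income" "age" "education"
```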
Oddly, R also allows you to assign to these functions to change the names on a matrix. For example, to add row names we could do:
[17]:
rownames(survey) <- c("row1", "row2", "row3")
survey
 | income | age | education |
---|---|---|---|
row1 | 22000 | 20 | 12 |
row2 | 75000 | 35 | 16 |
row3 | 19000 | 55 | 11 |
And we can delete them too!
[18]:
rownames(survey) <- NULL
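While the row names are in place, rows can be selected by name just like columns. A small sketch reusing the row1/row2/row3 names from above (row2_age and picked_rows are illustrative names):

```r
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)

# Once rows have names, they can be used inside the square brackets too
rownames(survey) <- c("row1", "row2", "row3")
row2_age <- survey["row2", "age"]             # 35
picked_rows <- survey[c("row1", "row3"), ]    # first and third respondents

# Assigning NULL removes the row names again
rownames(survey) <- NULL
is.null(rownames(survey))                     # TRUE
```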
Subsetting by Row and Column Simultaneously
Often, we don’t just want to subset rows or columns, but both at once. For example, suppose I wanted the education levels of everyone with incomes below the US median. I could do this in two steps by subsetting rows and then subsetting columns:
[19]:
below_median <- survey[survey[, "income"] < 65000, ]
below_median[, "education"]
- 12
- 11
Or I can do it all in one command!
[20]:
survey[survey[, "income"] < 65000, "education"]
- 12
- 11
So what is the average education of people earning less than the median income in the US in our toy data?
[21]:
mean(survey[survey[, "income"] < 65000, "education"])
11.5
OK – I know we’ve just covered a lot, but hopefully that example makes clear how quickly we can start doing really, really powerful analyses and answering substantive questions just by subsetting our data carefully.
Using Subsets to Modify Data
Sometimes we want to modify part of a matrix. For example, suppose we were working with our survey data and wanted to multiply all the income values by 1.02 to adjust for inflation that has occurred since the survey. If we just multiplied the whole matrix by 1.02, we’d also modify things like education and age:
[22]:
survey * 1.02
income | age | education |
---|---|---|
22440 | 20.4 | 12.24 |
76500 | 35.7 | 16.32 |
19380 | 56.1 | 11.22 |
What we can do instead is extract the column with income, modify it, then replace the old income column with our updated column:
[23]:
income_column <- survey[, "income"] # Extract income
adjusted_income <- income_column * 1.02 # Adjust income
survey[, "income"] <- adjusted_income # Replace income with new values!
survey
income | age | education |
---|---|---|
22440 | 20 | 12 |
76500 | 35 | 16 |
19380 | 55 | 11 |
Or, if we wanted, we could actually do all this in one step:
[24]:
# Re-make survey so it hasn't been adjusted for inflation
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)
survey
income | age | education |
---|---|---|
22000 | 20 | 12 |
75000 | 35 | 16 |
19000 | 55 | 11 |
[25]:
# Now adjust income in one step!
survey[, "income"] <- survey[, "income"] * 1.02
survey
income | age | education |
---|---|---|
22440 | 20 | 12 |
76500 | 35 | 16 |
19380 | 55 | 11 |
And this is especially powerful if we subset on BOTH rows and columns. Suppose, for example, we wanted to see what people’s incomes would look like if anyone who didn’t finish high school (education < 12) got a tax credit of 10,000 dollars:
[26]:
survey[survey[, "education"] < 12, "income"] <- survey[survey[, "education"] < 12, "income"] + 10000
[27]:
survey
income | age | education |
---|---|---|
22440 | 20 | 12 |
76500 | 35 | 16 |
29380 | 55 | 11 |
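Note that the update above overwrote the income column of survey itself. If you only want to look at the hypothetical, one option is to make the change on a copy. A minimal sketch, rebuilding the unadjusted survey for simplicity; with_credit and no_high_school are illustrative names. Because R copies a matrix when you modify it under a new name, survey is left unchanged:

```r
income <- c(22000, 75000, 19000)
age <- c(20, 35, 55)
education <- c(12, 16, 11)
survey <- cbind(income, age, education)

# Work on a copy so the original survey matrix is not modified
with_credit <- survey
no_high_school <- with_credit[, "education"] < 12   # TRUE for fewer than 12 years
with_credit[no_high_school, "income"] <- with_credit[no_high_school, "income"] + 10000

with_credit[, "income"]   # 22000 75000 29000
survey[, "income"]        # still 22000 75000 19000
```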
And that’s it! Now you’re a matrix pro.
Recap
- Subsetting matrices is just like subsetting vectors, except with two entries between the square brackets: [ , ]. The first entry relates to rows, the second to columns.
- Like vectors, you can subset by index, by logical vector, or by name.
- You can mix how you subset, e.g. use a logical for rows and a name for columns.
- If you subset a single row or a single column, you get back a vector.
- Subsetting on both rows and columns lets you edit matrices in very powerful ways.
Exercises
Now that we’ve familiarized ourselves with matrices and matrix manipulation, it’s time to do some exercises!