Lists

Up to now, we’ve mostly focused on vectors and matrices in different forms. Arrays like vectors and matrices are extremely important in data science, in part because constraining a collection of values to all be of the same type allows R to be much faster than it would be otherwise.

But sometimes we need a more flexible data structure, and for that R has lists!

In terms of how you work with them, lists are very similar to vectors – the items in a list are ordered along one dimension, so you can subset them with indices just like vectors, and you can also subset using logicals or by name. But lists have two advantages over vectors:

  • The entries in a list don’t all have to be the same type, and

  • You can put anything in a list, including big data structures like dataframes or even other lists!
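
To make the first advantage concrete, here is a small sketch comparing c() and list() on the same mixed values. The names mixed_vector and mixed_list are just illustrative and are not used elsewhere in this reading.

```r
# A vector coerces everything to one common type (here, character)
mixed_vector <- c("one", 2, TRUE)
class(mixed_vector)         # "character"
mixed_vector                # "one" "2" "TRUE"

# A list keeps each entry's original type
mixed_list <- list("one", 2, TRUE)
sapply(mixed_list, class)   # "character" "numeric" "logical"
```

Note that the vector silently turned the number and the logical into strings, while the list left them alone.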

Creating and Adding to Lists

To create a list, just use list() instead of c():

[1]:
# Create a list with a character, a numeric,
# a logical, a longer numeric, and even another list!
my_list <- list("one", 2, TRUE, c(1, 2, 3), list(1, 2, 3))
my_list
  1. 'one'
  2. 2
  3. TRUE
  4.
    1. 1
    2. 2
    3. 3
  5.
    1. 1
    2. 2
    3. 3

And you can append things to the end of lists:

[2]:
my_list <- append(my_list, "five")
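
A couple of details about append() are worth seeing in a small sketch. It uses a separate, hypothetical example_list so it does not disturb my_list. The after argument is part of base R's append(); everything else here is illustrative.

```r
example_list <- list("a", "b", "d")

# By default, append() adds to the end
example_list <- append(example_list, "e")
length(example_list)     # 4

# The optional `after` argument inserts after a given position
example_list <- append(example_list, "c", after = 2)
unlist(example_list)     # "a" "b" "c" "d" "e"

# Appending a vector adds each of its values as a separate entry...
spread_out <- append(list("a"), c(1, 2, 3))
length(spread_out)       # 4

# ...so wrap it in list() to add it as a single entry
kept_together <- append(list("a"), list(c(1, 2, 3)))
length(kept_together)    # 2
```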

Subsetting Lists

Lists have a concept of order, so you can modify things by index, just like a vector:

[3]:
# Change the third entry
my_list[3] <- "three"
my_list
  1. 'one'
  2. 2
  3. 'three'
  4.
    1. 1
    2. 2
    3. 3
  5.
    1. 1
    2. 2
    3. 3
  6. 'five'

And subsetting with logicals also works the same way.
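
As a quick sketch of logical subsetting, the example below uses a fresh, hypothetical example_list. The second half builds the logical vector with sapply() and is.numeric(), which is one common way to do it, not the only one.

```r
example_list <- list("one", 2, TRUE, c(1, 2, 3))

# Keep the first and third entries with a logical vector
first_and_third <- example_list[c(TRUE, FALSE, TRUE, FALSE)]
length(first_and_third)    # 2

# Build the logical vector from the list itself: keep only numeric entries
is_num <- sapply(example_list, is.numeric)
is_num                     # FALSE TRUE FALSE TRUE
numeric_entries <- example_list[is_num]
length(numeric_entries)    # 2
```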

However, there is one odd feature of subsetting lists: if you access them with single brackets ([ ]), you get back a list containing the things you wanted, not the things themselves. To get the entry itself, you have to use double brackets ([[ ]]):

[4]:
x <- my_list[1]
x
  1. 'one'
[5]:
class(x)
'list'
[6]:
y <- my_list[[1]]
y
'one'
[7]:
class(y)
'character'
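
Why this distinction matters shows up as soon as you try to compute with an entry. The sketch below uses a hypothetical example_list and wraps the failing call in tryCatch() so the code runs to the end.

```r
example_list <- list("one", 2, c(1, 2, 3), list(10, 20))

# [[ ]] returns the entry itself, so arithmetic works
example_list[[2]] + 1      # 3

# [ ] returns a list of length one, and arithmetic on a list is an error
result <- tryCatch(example_list[2] + 1, error = function(e) "error")
result                     # "error"

# Double brackets can be chained to reach inside nested entries
example_list[[4]][[2]]     # 20
example_list[[3]][2]       # 2
```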

Names and Lists

List entries can also be named, and you can get named entries out of a list the same way we would with a vector.

[8]:
my_list <- list(first = "one", second = "two")
my_list
$first
'one'
$second
'two'
[9]:
my_list["first"]
$first = 'one'

However, named entries in a list can also be accessed with the $ operator, which returns the entry itself rather than a one-item list:

[10]:
my_list$first
'one'
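
The sketch below collects a few related ways of working with names. It uses a hypothetical example_list, and the variable name key is only for illustration.

```r
example_list <- list(first = "one", second = "two")

# names() returns the entry names as a character vector
names(example_list)        # "first" "second"

# $ and [["name"]] both return the entry itself
identical(example_list$first, example_list[["first"]])   # TRUE

# When the name is stored in a variable, use [[ ]]; $ would look
# for an entry literally called "key"
key <- "second"
example_list[[key]]        # "two"
is.null(example_list$key)  # TRUE

# Assigning to a new name adds an entry
example_list$third <- "three"
length(example_list)       # 3
```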

Why Don’t We Always Use Lists?

Because they are slow! The same flexibility that makes lists convenient also makes it hard for R to work with them quickly, so you only want to use them for carrying around small collections of things, not for data sets.
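
One rough way to see the speed difference described above is to add up the same numbers stored as a vector and as a list. This is only a sketch: the size n is an arbitrary assumption, and the timings printed by system.time() will vary from machine to machine.

```r
# Assumption: one million random numbers is enough to see a difference
n <- 1e6
num_vec <- runif(n)
num_list <- as.list(num_vec)

# Summing an atomic vector is a single vectorized call
system.time(vec_total <- sum(num_vec))

# sum() does not accept a list, so we visit the entries one at a time
system.time({
  list_total <- 0
  for (item in num_list) {
    list_total <- list_total + item
  }
})

# Both approaches give the same answer (up to floating-point tolerance)
all.equal(vec_total, list_total)
```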

  • Lists are great for small collections of heterogeneous data – they provide great flexibility.

  • Lists are really valuable when you want to put complicated data structures into a collection – for example if you want to carry around a couple of matrices or a couple of dataframes.

  • The fact that lists can hold other lists means that they are an example of a recursive data structure: a structure that can contain other instances of itself.
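
To illustrate the last point, here is a small sketch of a list that holds other lists. The list nested and the helper count_leaves() are hypothetical, written only for this example; count_leaves() treats any non-list entry (including a whole vector) as a single "leaf".

```r
nested <- list(
  name = "experiment",
  settings = list(seed = 42, groups = list("control", "treated")),
  values = c(1.5, 2.5)
)

# Accessors can be chained to reach into inner lists
nested$settings$groups[[2]]    # "treated"

# str() prints a compact overview of the whole nested structure
str(nested)

# A recursive helper: count the non-list entries at any depth
count_leaves <- function(x) {
  if (!is.list(x)) {
    return(1)
  }
  total <- 0
  for (item in x) {
    total <- total + count_leaves(item)
  }
  total
}

count_leaves(nested)           # 5
```

Because the structure is recursive, the function that walks it is naturally recursive too: it calls itself on every entry that is itself a list.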