Intro to Data Types

One of the features of computers that can be frustrating to new programmers is their inflexibility, and one place this inflexibility is most evident is in the fact that languages like R store data as discrete types. In this section, we’ll go over the four data types you’ll encounter most (numeric, integer, character, and logical), as well as what it means for R to have these distinct data types.

Types and Their Uses

The four types of data you’ll encounter most in R are numeric, integer, character, and logical, and each has an important role to play for the social science researcher.

numeric data, as the name implies, is data that stores numbers. This may be people’s ages or incomes, countries’ GDPs and infant mortality rates, global temperatures, or survey responses on a scale from 1 to 7. numeric data supports mathematical operations such as addition, subtraction, multiplication, and division.
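A quick sketch of arithmetic on numeric values. The variable names and values here are hypothetical, chosen only for illustration:

```r
# Hypothetical values, for illustration only
age <- 34
income <- 52000

age + 1          # addition: 35
income - 2000    # subtraction: 50000
income / 12      # division (monthly income): 4333.333
class(income)    # "numeric"
```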

integer data is just a special kind of numeric data that only contains, well… integers! (e.g. 1, 2, 3). We won’t worry much about integer as a special type in this bootcamp (if something can be represented as an integer, it can also be represented as a numeric), but sometimes you’ll see it, so it’s good to be aware of it.
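To see the difference in practice, here’s a minimal sketch. One detail not mentioned above: typing a plain number like 5 gives you a numeric, and you have to add an L suffix (5L) to get an integer. The names x and y are just placeholders.

```r
x <- 5          # a plain number is stored as numeric
y <- 5L         # the L suffix asks R for an integer
class(x)        # "numeric"
class(y)        # "integer"
is.numeric(y)   # TRUE: integers still count as numbers
x == y          # TRUE: same value, different type
```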

character data is text data. It could be something short, like a survey respondent’s name, or something longer, like the content of a tweet or a politician’s speech.

logical data takes on only two values: TRUE and FALSE (note those values have to be written in all capitals to be recognized by R!). logical data can store information about the world (you could have a variable called female that only has values of TRUE and FALSE), but more often we use it for evaluating our data. For example, suppose we wanted to test whether a survey respondent’s age is over 18 to evaluate whether they are eligible to vote – we’d probably do something like ask whether age >= 18, which would evaluate to TRUE or FALSE.
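Here’s a small sketch of both uses of logical data described above. female and age are hypothetical variables with made-up values:

```r
female <- TRUE     # storing information about the world
class(female)      # "logical"
# female <- true   # would fail with Error: object 'true' not found,
                   # because lowercase true is not a logical value in R

age <- 20          # hypothetical respondent age
age >= 18          # evaluating the data: TRUE
```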

Working with Data Types

In R, when you assign a value to a variable, you don’t have to tell R in advance what type of data you’re assigning (this is not true of all languages, and it’s one of the things that makes R easier to use than many others!). Instead, R infers the type from the value you’ve provided. Namely:

  • If you type out a number, R will assume it’s numeric.

    • (If you’ve worked in other programming languages, you’ve probably heard these referred to as floats or doubles.)

  • If you type out something and put it in double-quotation marks ("like this"), R will treat it as a character.

    • (If you’ve worked in other programming languages, you’ve probably heard these referred to as strings.)

  • If R sees TRUE or FALSE, or an expression that evaluates to TRUE or FALSE (like 7 > 3, which is obviously TRUE), it will treat it as logical.

    • (If you’ve worked in other programming languages, you’ve probably heard these referred to as bools or booleans.)

To illustrate, let’s play around a little. Note that we can always check the type of data by passing our data, or the variable to which the data has been assigned, to the class() function, which then returns the type of the input. For example:

[1]:
pi <- 3.1416
class(pi)
'numeric'
[2]:
mystery_novel <- "'Twas a dark and stormy night"
class(mystery_novel)
'character'
[3]:
my_logical <- 7 < 3
class(my_logical)
'logical'
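It’s the quotes that make something a character, not what’s inside them. This sketch (using only class()) shows that a number or TRUE wrapped in quotes is treated as text, and that single quotes work just as well as double quotes:

```r
class(5)                         # "numeric"
class("5")                       # "character": the quotes make it text
class('single quotes work too')  # "character"
class(TRUE)                      # "logical"
class("TRUE")                    # "character": quoted, so it's text, not logical
```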

It’s worth emphasizing here that putting a variable into a function is exactly the same as putting the value assigned to that variable into the function. This is one of the core ideas behind how most programming languages work: a variable is just a stand-in for the value that has been assigned to it, and R treats the two interchangeably. For example:

[4]:
# Evaluating the variable pi
pi <- 3.1416
class(pi)
'numeric'
[5]:
# Has the same effect as just putting
# in the value directly!
class(3.1416)
'numeric'
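Because R infers the type from the value, a variable’s type can change when you assign it a new value. A minimal sketch, with x as a placeholder name:

```r
x <- 3.1416
class(x)    # "numeric"
x <- "pi"   # assign a character value to the same name
class(x)    # "character": the type follows the value, not the name
```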

Operations and Data Types

Data types aren’t just about helping R remember what kind of data has been assigned to a variable; they also affect how some operators (like +) are interpreted.

For example, if I put + between two numeric variables, R will do the obvious thing and add them up:

[6]:
a <- 10
b <- 2
a + b
12

But if I try to put a + between two character variables, R will stop and say “WAIT A MINUTE! I don’t know how to add two characters!”

a <- "Lyra"
b <- "Belacqua"
a + b

> Error in a + b : non-numeric argument to binary operator
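If what you actually want is to join two pieces of text, base R’s paste() function does that (paste0() does the same without inserting a space). A short sketch reusing the same two values:

```r
a <- "Lyra"
b <- "Belacqua"
paste(a, b)    # "Lyra Belacqua" (separated by a space by default)
paste0(a, b)   # "LyraBelacqua" (no separator)
```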

Note that this behavior can be especially confusing when R has stored numbers as characters, which happens a lot when you import a file. That’s because, as I said before, computers are really inflexible. For example, suppose you had the following code:

a <- "5"
b <- "4"
a + b

Now, if I asked you what that should print out, I’m sure you’d say “9”, because you’re smart and can recognize what I want. But R can’t do that – it sees you trying to add two character variables and says “nope, sorry! Can’t do that.”

> Error in a + b : non-numeric argument to binary operator

Which brings us to…

Converting Data Types

From time to time, you’ll want to move between data types, and for that we have a few special functions, all with the same naming structure: as.numeric(), as.character(), and as.logical(). Each of these takes a value and tries to convert it to the named type. If the conversion isn’t possible, you won’t get an error; instead, you’ll get a missing value (NA), and as.numeric() will also warn you.
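Before returning to the numbers-as-text problem, here’s a quick sketch of the other two converters. Note that as.logical() only recognizes a handful of spellings such as "TRUE", "true", "FALSE", and "false"; any other text becomes NA:

```r
as.character(42)      # "42"
as.character(TRUE)    # "TRUE"
as.logical("TRUE")    # TRUE
as.logical("false")   # FALSE
as.logical("yes")     # NA: "yes" is not a spelling R recognizes as logical
```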

So let’s do our example above again using as.numeric():

[7]:
a <- "5"
b <- "4"
a <- as.numeric(a)
b <- as.numeric(b)
a + b
9

Ta-da!

(See how I assigned the return value of as.numeric(a) to the variable a, overwriting the old value? Remember you have to assign those return values if you want them remembered!)
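Here’s a minimal sketch of what happens if you forget that assignment: as.numeric() returns a converted copy, but the original variable is left unchanged.

```r
a <- "5"
as.numeric(a)        # prints 5, but the result isn't saved anywhere
class(a)             # still "character"
a <- as.numeric(a)   # overwrite a with the converted value
class(a)             # now "numeric"
```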

But if I were to try to convert a character value like "Ford Prefect" to a numeric, it would give me a weird warning telling me it couldn’t do it:

as.numeric("Ford Prefect")

> Warning message in eval(expr, envir, enclos):
> "NAs introduced by coercion"
<NA>

(You will also notice that it returns something called an NA. We’ll talk more about NAs in a later lesson, but for the moment it’s sufficient to know that NA is what R returns when it gets stuck. :))
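If you need to check whether a conversion failed, the is.na() function returns TRUE for an NA. A small sketch, with x and y as placeholder names:

```r
x <- as.numeric("Ford Prefect")   # gives the same coercion warning as above
x                                 # NA
is.na(x)                          # TRUE: the conversion failed
y <- as.numeric("42")
is.na(y)                          # FALSE: this conversion worked
```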

More on Logical

Let’s talk about logical data, as their usefulness is perhaps least evident. Logical data are important because of how often we want to evaluate logical statements in data analysis, and the results of those need to take the form of TRUE and FALSE.

For example:

[8]:
# Simple math tests, like inequalities
7 > 5
TRUE
[9]:
-1 >= 10
FALSE
[10]:
# Or tests of our data:
age <- 17   # Some fake data
age > 18
FALSE
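One detail worth double-checking in tests like this is the boundary: > means strictly greater than, while >= also includes the value itself. A quick sketch with a hypothetical 18-year-old:

```r
age <- 18   # hypothetical respondent
age > 18    # FALSE: 18 is not strictly greater than 18
age >= 18   # TRUE: "greater than or equal to" includes 18 itself
age <= 18   # TRUE
age < 18    # FALSE
```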

The other place logical data types come up a lot is when we want to test whether two things are the same (equal). Because we can use = to assign values to variables (see note on assignment operators here), we can’t test whether two things are equal by typing the obvious a = 5. If that were allowed, R wouldn’t know whether we were assigning 5 to a or asking whether the value already assigned to a is equal to 5.

So to evaluate whether two things are equal, we use a double-equal sign (==). For example:

[11]:
a <- 5
b <- 5
a == b
TRUE
[12]:
c <- 7
a == c
FALSE
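To make the distinction concrete, here’s a minimal sketch (a is a placeholder name): a single = assigns, just like <-, while == asks a question and returns a logical.

```r
a = 5     # a single = assigns, same as a <- 5
a == 5    # TRUE: a double == compares
a == 6    # FALSE
```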

And we can also use != to test whether two things are not equal:

[13]:
5 != 5
FALSE
[14]:
a != c
TRUE
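== and != work on character data too. A small sketch, with name as a hypothetical variable; note that the comparison is case-sensitive:

```r
name <- "Lyra"
name == "Lyra"   # TRUE
name == "lyra"   # FALSE: comparisons are case-sensitive
name != "Will"   # TRUE
```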

Finally, there’s one oddity of logical data worth mentioning: if you convert a logical into a numeric, FALSE turns into 0 and TRUE turns into 1.

[15]:
as.numeric(TRUE)
1
[16]:
as.numeric(FALSE)
0

Why? Well… it’s a long story. :) But this isn’t just an R thing; many other programming languages do the same.
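One practical consequence of this conversion is that R will do arithmetic on logicals directly, so summing them counts the TRUE values and averaging them gives the share that are TRUE. This sketch uses c() to combine several values into one object (a vector); voted holds made-up survey responses:

```r
TRUE + TRUE                           # 2
voted <- c(TRUE, FALSE, TRUE, TRUE)   # hypothetical survey responses
sum(voted)                            # 3: the number of TRUE values
mean(voted)                           # 0.75: the share of TRUE values
```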

Recap

  • Data in R always has a type.

  • numeric stores any number.

  • integer only represents integers (-2, -1, 0, 1, 2…), and we won’t use them much.

  • character stores text data.

  • logical can only be TRUE or FALSE.

  • When it makes sense, data can be converted with functions like as.numeric() or as.character().

  • If no meaningful conversion is possible, those functions return NA (and as.numeric() also gives you a warning).

  • For weird reasons, as.numeric(TRUE) is 1, and as.numeric(FALSE) is 0.

Exercises for Now

Want to practice these skills? Head on over to this site to find some exercises you can do right now!

Exercises for Class

Here are some exercises we’ll be doing in our synchronous class. If you are enrolled in our synchronous sessions, please do not do these before class! If you’re reading these materials on your own or are enrolled in our asynchronous class, feel free to take a look now.