Intro to Data Types
One of the features of computers that can be frustrating to new programmers is their inflexibility, and one place this inflexibility is most evident is in the fact that programs like R store data as discrete types. In this section, we’ll go over the four data types you’ll encounter most – numeric, integer, character, and logical types – as well as what it means for R to have these distinct data types.
Types and Their Uses
The four types of data you’ll encounter most in R are numeric, integer, character, and logical, and each has an important role to play for the social science researcher.
numeric data, as the name implies, is data that stores numbers. This may be people’s ages or incomes, countries’ GDPs and infant mortality rates, global temperatures, or survey responses on a scale from 1 to 7. numeric data supports mathematical operations like multiplying, dividing, adding, and subtracting.
integer data is just a special kind of numeric data that only contains, well… integers! (e.g. 1, 2, 3). We won’t worry much about integer as a special type in this bootcamp – if something can be represented as an integer it can be represented as a numeric – but sometimes you’ll see it, so it’s good to be aware of.
character data is text data. It could be something short, like a survey respondent’s name, or something longer, like the content of a tweet or a politician’s speech.
logical data takes on only two values: TRUE and FALSE (note those values have to be written in all capitals to be recognized by R!). logical data can store information about the world (you could have a variable called female that only has values of TRUE and FALSE), but more often we use it for evaluating our data. For example, suppose we wanted to test whether a survey respondent is 18 or older to evaluate whether they are eligible to vote – we’d probably do something like ask whether age >= 18, which would evaluate to TRUE or FALSE.
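As a minimal sketch of the voting-age check described above (the value of age and the variable name eligible_to_vote are made up for illustration):

```r
# Hypothetical respondent data (values invented for illustration)
age <- 21
female <- TRUE                 # a logical stored directly

# A logical produced by evaluating a comparison
eligible_to_vote <- age >= 18
eligible_to_vote
```

Here eligible_to_vote holds the result of the comparison, so it can be reused later without re-typing the test.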
Working with Data Types
Unlike in some programming languages, when you assign a value to a variable in R, you don’t have to declare the type of data in advance (something that makes R easier to use than many other languages!). Instead, R will infer the type from what you’ve passed. Namely:
If you type out a number, R will assume it’s numeric. (If you’ve worked in other programming languages, you’ve probably heard these referred to as floats.)
If you type out something and put it in double-quotation marks ("like this"), R will treat it as a character. (If you’ve worked in other programming languages, you’ve probably heard these referred to as strings.)
If R sees TRUE or FALSE, or an expression that evaluates to TRUE or FALSE (like 7 > 3, which is obviously TRUE), it will treat it as logical. (If you’ve worked in other programming languages, you’ve probably heard these referred to as bools or booleans.)
To illustrate, let’s play around a little. Note that we can always check the type of data by passing our data, or the variable to which the data has been assigned, to the class() function, which then returns the type of the input. For example:
[1]:
pi <- 3.1416
class(pi)
[2]:
mystery_novel <- "T'was a dark and stormy night"
class(mystery_novel)
[3]:
my_logical <- 7 < 3
class(my_logical)
It’s worth emphasizing here that putting a variable into a function is exactly the same as putting the value assigned to that variable into the function. This is, indeed, one of those core ideas about how most programming languages work: a variable is just a stand-in for the value that has been assigned to it, and R will treat them interchangeably. e.g.:
[4]:
# Evaluating the variable pi
pi <- 3.1416
class(pi)
[5]:
# Has the same effect as just putting
# in the value directly!
class(3.1416)
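The inference rules above always give numeric for a typed-out number. If you do want an integer, one way (an aside that the text above doesn’t cover) is to add an L suffix to the number. A short sketch, with made-up variable names:

```r
whole_number <- 5      # typed-out numbers are numeric by default
class(whole_number)    # "numeric"

true_integer <- 5L     # the L suffix asks R for an integer
class(true_integer)    # "integer"

# An integer still counts as a number, so it can do anything numeric can
is.numeric(true_integer)   # TRUE
true_integer + 0.5         # 5.5
```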
Operations and Data Types
Data types aren’t just about helping R remember what kind of data has been assigned to a variable – they also affect how some operators (like +) are interpreted.
For example, if I put + between two numeric variables, R will do the obvious thing and add them up:
[6]:
a <- 10
b <- 2
a + b
But if I try to put a + between two character variables, R will stop and say “WAIT A MINUTE! I don’t know how to add two characters!”
a <- "Lyra"
b <- "Belacqua"
a + b
> Error in a + b : non-numeric argument to binary operator
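If the goal was to join the two pieces of text rather than add them, base R’s paste() function is one way to do it. This is a small aside, not something the example above depends on, and the variable names are made up:

```r
first_name <- "Lyra"
last_name <- "Belacqua"

# paste() joins character values, separated by a space by default
full_name <- paste(first_name, last_name)
full_name    # "Lyra Belacqua"
```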
Note that one place that this behavior can be confusing is when R has stored numbers as characters – something that happens a lot when you are importing a file. That’s because, as I said before, computers are really inflexible. For example, suppose you had the following code:
a <- "5"
b <- "4"
a + b
Now, if I asked you what that should print out, I’m sure you’d say “9”, because you’re smart and can recognize what I want. But R can’t do that – it sees you trying to add two character variables and says “nope, sorry! Can’t do that.”
a <- "5"
b <- "4"
a + b
> Error in a + b : non-numeric argument to binary operator
Which brings us to…
Converting Data Types
From time to time, you’ll want to move between data types, and for that we have a few special functions, all with the same naming structure: as.numeric(), as.character(), and as.logical(). Each of these will take a variable and try to convert its data to the type named, and if it can’t do it, it will generally give you a warning.
So let’s do our example above again using as.numeric():
[7]:
a <- "5"
b <- "4"
a <- as.numeric(a)
b <- as.numeric(b)
a + b
Ta-da!
(See how I assigned the return value of as.numeric(a) to the variable a, overwriting the old value? Remember you have to assign those return values if you want them remembered!)
But if I were to try and convert a character like "Ford Prefect" to a numeric, it would give me a weird warning telling me it couldn’t do it:
as.numeric("Ford Prefect")
> Warning message in eval(expr, envir, enclos):
> "NAs introduced by coercion"
<NA>
(You will also notice that it returns something called an NA. We’ll talk more about NAs in a later lesson, but for the moment it’s sufficient to know that NA is what R returns when it gets stuck. :))
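The other conversion functions follow the same pattern. The sketch below also uses two base R helpers that aren’t introduced above, suppressWarnings() to keep the output quiet and is.na() to check whether a conversion got stuck. The values are invented for illustration:

```r
as.character(42)        # "42"
as.logical("TRUE")      # TRUE

# Converting text that isn't a number gives NA (plus a warning,
# which suppressWarnings() hides here)
converted <- suppressWarnings(as.numeric("Ford Prefect"))
is.na(converted)        # TRUE

# Converting text that is a number works as expected
as.numeric("42") + 1    # 43
```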
More on Logical
Let’s talk about logical data, as their usefulness is perhaps least evident. Logical data are important because of how often we want to evaluate logical statements in data analysis, and the results of those need to take the form of TRUE and FALSE.
For example:
[8]:
# Simple math tests, like inequalities
7 > 5
[9]:
-1 >= 10
[10]:
# Or tests of our data:
age <- 17 # Some fake data
age > 18
The other place logical data types come up a lot is when we want to test whether two things are the same (equal). Because we can use = to assign values to variables (see note on assignment operators here), we can’t test whether two things are equal by typing the obvious a = 5 – if that were allowed, R wouldn’t know if we were assigning 5 to a, or asking whether the value already assigned to a is equal to 5.
So to evaluate whether two things are equal, we use a double-equal sign (==). For example:
[11]:
a <- 5
b <- 5
a == b
[12]:
c <- 7
a == c
And we can also use != to test if two things are not equal:
[13]:
5 != 5
[14]:
a != c
Finally, there’s one oddity of logical data worth mentioning, which is that if you try and convert a logical into a numeric, then FALSE will turn into 0 and TRUE will turn into 1.
[15]:
as.numeric(TRUE)
[16]:
as.numeric(FALSE)
Why? Well… it’s a long story. :) But this isn’t just an R thing; many programming languages treat true and false as 1 and 0.
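One practical payoff of this oddity is counting: because TRUE acts like 1 in arithmetic, adding up logical values tells you how many are TRUE. A minimal sketch with made-up ages, using c() to hold several values at once (vectors are an assumption here, since they aren’t covered above):

```r
ages <- c(17, 25, 42, 16)    # made-up ages for four respondents

can_vote <- ages >= 18       # FALSE TRUE TRUE FALSE
sum(can_vote)                # 2 respondents are 18 or older

TRUE + TRUE                  # 2
```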
Recap
Data in R always has a type.
numeric stores any number.
integer only represents integers (-2, -1, 0, 1, 2…), and we won’t use them much.
character stores text data.
logical can only be TRUE or FALSE.
When it makes sense, data can be converted with functions like as.numeric() or as.character().
If no meaningful conversion exists, those functions will give you a warning.
For weird reasons, as.numeric(TRUE) is 1, and as.numeric(FALSE) is 0.
Exercises for Now
Want to practice these skills? Head on over to this site to find some exercises you can do right now!
Exercises for Class
Here are some exercises we’ll be doing in our synchronous class. If you are enrolled in our synchronous sessions, please do not do these before class! If you’re reading these materials on your own or are enrolled in our asynchronous class, feel free to take a look now.