2.4 Try out the console
2.4.1 Create your first object
Remember when we typed in 7 + 52? Perhaps next we’d like to first see the total of those two numbers and then calculate the average. We could first type 7 + 52 and then (7 + 52) / 2.
But if we want to do more than one thing with the same data, it’s best to store that data in a variable.
A variable is basically a container that stores some sort of value or values. In Excel, if you used a formula such as =A1 + B1, you used the variable A1 to mean “the value that’s in cell A1” and B1 as “the value that’s in cell B1.” If the value in cell A1 changes, so will the value of =A1 + B1. In R (or any programming language), you can set a variable for a lot of different types of values. I can store the value 7 in a variable called num1 and store the value 52 in a variable called num2 like this:
num1 <- 7
num2 <- 52
That <- is R’s “assignment operator.” It just means num1 is assigned the numerical value 7, and num2 is assigned the numerical value 52. Most programming languages would use the equals sign, like num1 = 7. In fact, that will also work in R, but, well, it’s not what the R cool kids do. Most R code you’ll come across will use <-. Google’s R style guide says use <-. So does Hadley Wickham, arguably the most well-known creator of packages in the R community. There are a couple of technical reasons for preferring <- to =, but for now, I hope you’ll trust me that it’s worth using.
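If you’d like to see the two operators side by side, here is a minimal sketch. The second half shows one commonly cited difference between them; that example is my own illustration, with made-up variable names, and is not necessarily one of the reasons the style guides have in mind.

```r
# At the top level of a script, both operators create a variable
num1 <- 7
num2 = 52          # works, but <- is the conventional choice
num1 + num2        # 59

# Inside a function call, = only names an argument, while <- also assigns
mean(x = c(1, 2, 3))    # returns 2; x here is just the argument name
mean(y <- c(1, 2, 3))   # returns 2; y now exists and holds 1 2 3
y
```

Sticking with <- for assignment keeps = free to mean “this is a named argument,” which makes code a little easier to scan.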
Meanwhile, back to our variables.
Open up a new R script file in your RStudio test project by going to File > New File > R Script. Type
num1 <- 7
num2 <- 52
in the RStudio top pane with your new script file, and then save it. You can call it something like testscript.R – the .R file extension tells R and RStudio that it’s an R script. You can save a file within RStudio by clicking on the disk icon, going to File > Save, or using the keyboard shortcut ctrl-S on Windows or cmd-S on Mac. If you forget to add .R to your file name, RStudio will add it for you, since you indicated when creating the file that it was an R script.
Now, you need to run your code. There are a couple of ways to run a script file within RStudio. To run the entire file, click on the Source button, not the Run button. The Run button only runs one line of code, whatever line your cursor is on. Ctrl-Enter on Windows and cmd-Enter on Mac will also run the single line of code your cursor is on (if a single command spans multiple lines, the full multi-line command will run).
To run several lines of code from within the RStudio script pane, highlight them (with the usual click-and-drag) and hit control-enter (command-enter on Mac).
If you’d like to re-run your entire script file every time you save it, click the “Source on Save” check box.
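Behind the scenes, sourcing a file amounts to calling R’s source() function on it. The sketch below is a self-contained illustration of that idea: it writes a small script to a temporary folder (the path is an assumption made so the example can run anywhere) and then sources it.

```r
# Write a two-line script to a temporary location (hypothetical path)
script_path <- file.path(tempdir(), "testscript.R")
writeLines(c("num1 <- 7", "num2 <- 52"), script_path)

# Run every line in the file, much as the Source button does
source(script_path)

# The variables created by the script are now available
num1 + num2   # 59
```

In your own project you would point source() at the file you saved, for example source("testscript.R") if that file sits in your working directory.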
When you run the code, you should see num1 and num2 under “Values” in the Environment tab of the top right pane. That shows the variables exist and are available for use. If you type num1 into the interactive console (bottom left pane) and hit enter, you’ll get back the value of that variable, which in this case is 7.
If you type num slowly into the console and stop, you should notice that RStudio gives you several options: num1, num2, numeric (a base R function, meaning it’s part of the core language) and a few more. This autocomplete can be quite helpful if you have a long variable name and want to make sure not to mis-type it. You can use your up and down arrows to select among RStudio’s visible auto-complete suggestions and then hit tab to accept one. Enter works as well; hit Enter again if you’re in the console and want to run it.
Now, instead of typing 7 + 52 to get the total, you can type
num1 + num2
## [1] 59
in the script file and run that line with ctrl/cmd-Enter.
The first line is what I typed into the console; the second line is my result. Any line beginning with ## means that’s what R returned, so it’s not something you should type. The [1] on that results line means that is the first item of my results – not really needed when I’ve got one item, but helpful if there are 30 items over 5 or 10 lines.
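To see those bracketed index markers earn their keep, try printing a longer result. This is a small sketch with made-up values; narrowing the console width is only there to force the output to wrap, and the exact number of items per row will depend on your own console.

```r
# Thirty values, too many for one narrow line
many_values <- 101:130

# Temporarily narrow the print width so the output wraps
old_options <- options(width = 40)
print(many_values)
options(old_options)   # restore the previous width
```

Each printed row starts with the position of its first item in brackets, so you can tell at a glance where in the results you are.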
I know that adding two numbers together isn’t very exciting. But variables get much more useful once you have more data.
You’ll often want to use variables to store results of your commands. Here, you can save the result of num1 + num2 in a variable called total1.
total1 <- num1 + num2
To see the value of the total1 variable, either look at the environment tab in RStudio’s top right pane, or type the variable name total1 into the console.
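Once a result is stored, you can reuse it instead of retyping the calculation. Here is a minimal sketch that finishes the job we started with, the average of the two numbers. The name average1 is my own choice for illustration.

```r
num1 <- 7
num2 <- 52

# Store the total, then reuse it to get the average
total1 <- num1 + num2
average1 <- total1 / 2   # hypothetical variable name

total1     # 59
average1   # 29.5
```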
A few important points about R variables:
- R is case sensitive, so total1 is a different variable from Total1 and TOTAL1.
- Variable names can start with a letter or a period (if a period, the next character can’t be a number) and can then contain letters, numbers, periods, and underscores. my_total is a valid variable name but _mytotal is not.
- Variables can hold different types of data, not just a single number. A variable can hold many numbers at once, a character string, and other things (this book will cover several of the most useful data types, but not all of them). Some programming languages are pretty strict about data types, but R isn’t; it’s quite easy to change the type of data a variable holds, and it’s not necessary to specify a data type in advance.
- There are some reserved words that can’t be used as variable names, and you probably don’t want to use existing function names as variables even when allowed, since that can screw up your access to a function you may need later. I often use names like mytotal or myaverage as variable names in my own work to make sure I’m not stepping on an existing function name.
- An R variable is technically an object. That means it has certain characteristics depending on what kind of data it holds.
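A few of the points above can be checked directly in the console. This is a rough sketch rather than an exhaustive test of R’s naming rules. It leans on base R’s make.names() function, which returns a syntactically valid version of each name it is given, so a name that comes back unchanged was valid as written. The candidate names are made up for illustration.

```r
# R is case sensitive: these are two separate variables
total1 <- 59
Total1 <- 100
total1 == Total1   # FALSE

# A name is valid as written only if make.names() leaves it alone
candidates <- c("my_total", "_mytotal", ".total", ".2total")
is_valid <- make.names(candidates) == candidates
is_valid           # TRUE FALSE TRUE FALSE

# A variable's type can change just by assigning a new value
flexible <- 7
class(flexible)    # "numeric"
flexible <- "seven"
class(flexible)    # "character"
```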
2.4.2 Data types you’re likely to use often
Storing a single number in a variable can occasionally be useful – think of doing currency conversions and storing how many dollars you can buy with a euro – but a lot of times, you’ll want to work with data that’s a bit more robust. Here are some of R’s options.
Character strings. You assign a character string to a variable by putting the data in quotation marks. Typically double quotes are used, such as mystring1 <- "Hello, world!", although single quotes work as well.
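A quick sketch of both quoting styles, with variable names beyond mystring1 invented for the example. One practical use for single quotes is wrapping text that itself contains double quotes.

```r
mystring1 <- "Hello, world!"
mystring2 <- 'Hello, world!'   # hypothetical second variable

# Both styles produce exactly the same string
identical(mystring1, mystring2)   # TRUE
class(mystring1)                  # "character"

# Single quotes make it easy to include double quotes in the text
quoted <- 'She said "hello" to everyone'
cat(quoted, "\n")
```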
Vectors. It’s probably easier to show a vector than to explain it. 2,4,6,8 is a vector of numbers; “a”, “b”, “c”, “def” is a vector of character strings. A vector can only have one type of data – all numbers, all strings, all logical TRUE or FALSE, and so on. If you try to mix types in one vector, R will turn all your data into a single type – 1, 2 and “three” will end up as character strings “1”, “2” and “three”. (If you want to mix data types, you need to use another type of R object called a list. Lists are useful but can get complicated. We’ll cover them later.)
You create a vector with R’s c() function, which you may want to remember as being short for either “concatenate” or “combine”. mynumbers <- c(2,4,6,8) creates a vector of numbers; mystrings <- c("a", "b", "c", "def") creates a vector of character strings.
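Here is a short sketch of creating vectors and of what happens when types are mixed. The names mixed and mylist are my own, added for illustration.

```r
mynumbers <- c(2, 4, 6, 8)
mystrings <- c("a", "b", "c", "def")
class(mynumbers)   # "numeric"
class(mystrings)   # "character"

# Mixing types in a vector: everything is converted to character
mixed <- c(1, 2, "three")
mixed              # "1" "2" "three"
class(mixed)       # "character"

# A list keeps each item's own type
mylist <- list(1, 2, "three")
class(mylist[[1]]) # "numeric"
class(mylist[[3]]) # "character"
```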
Remember before, when I said R objects have certain characteristics? Vectors have characteristics, or attributes, such as length and whether each item has a name; you can also test whether or not it’s a vector:
length(mynumbers)
## [1] 4
is.vector(mynumbers)
## [1] TRUE
names(mynumbers)
## NULL
Reminder: The two pound signs in front of the results of these commands indicate that they’re responses from the console and not something you should type in yourself. You won’t see the ## when working in your own console. You will see the [1] before results of length() and is.vector().
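names() returns NULL for a vector whose items have no names. As a small sketch of the other case, here is one way to give the items names and then look one up. The labels are invented for the example.

```r
mynumbers <- c(2, 4, 6, 8)
names(mynumbers)   # NULL: no names yet

# Assign a name to each item (hypothetical labels)
names(mynumbers) <- c("first", "second", "third", "fourth")
names(mynumbers)

# Items can now be looked up by name as well as by position
mynumbers["third"]       # the item named "third", which is 6
length(mynumbers)        # still 4
is.vector(mynumbers)     # still TRUE
```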
length(), by the way, counts the number of items in a vector or R list but not the number of characters in a string. It might seem logical that length("cat") would return 3 – the number of characters in the string “cat” – but it won’t. length("cat") is 1, because “cat” is 1 item. length(c("cat", "dog")) would be 2, because c("cat", "dog") is a vector with 2 items.
If you want to count the number of characters in a string, use nchar().
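The difference between the two functions is easy to confirm in the console. A minimal sketch, with the animals vector made up for the example:

```r
length("cat")              # 1: one item
nchar("cat")               # 3: three characters

animals <- c("cat", "horse")   # hypothetical example vector
length(animals)            # 2: two items
nchar(animals)             # 3 5: characters in each item
```

Note that nchar() works on every item of a vector at once, returning one count per item.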
You can do arithmetic on a vector of numbers. mynumbers / 2 will divide each item in the vector by 2:
mynumbers / 2
## [1] 1 2 3 4
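Other operations work the same way, item by item, and base R also has functions that summarize a whole vector. A brief sketch:

```r
mynumbers <- c(2, 4, 6, 8)

mynumbers / 2      # 1 2 3 4
mynumbers * 10     # 20 40 60 80
mynumbers + 1      # 3 5 7 9

# Functions that boil the vector down to a single value
sum(mynumbers)     # 20
mean(mynumbers)    # 5
```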
Data frames. This built-in data type is one of the most compelling things about working in R. A data frame is somewhat like a spreadsheet: It is 2-dimensional and has rows and columns. If you read a spreadsheet or CSV file into R, it will usually come in as a data frame.
We’ll start analyzing and visualizing data in data frames very soon.
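In the meantime, here is a rough sketch of what a small data frame looks like when built by hand. The data and column names are entirely made up for illustration; in practice you’ll usually get a data frame by importing a file.

```r
# Each argument becomes a column; every column is a vector of one type
sales <- data.frame(
  item = c("apple", "banana", "cherry"),
  price = c(0.5, 0.25, 3),
  in_stock = c(TRUE, FALSE, TRUE)
)

class(sales)        # "data.frame"
nrow(sales)         # 3 rows
ncol(sales)         # 3 columns

# A single column is a vector, so vector functions work on it
sales$price
mean(sales$price)   # 1.25
```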