R and RStudio
R as a calculator
R packages and functions
What is R?
R is a free and open-source programming language and environment.
Why use R?
R is free and open source: built for you and for everyone;
R is popular: a large, engaged user base fosters the continued development and maintenance of statistical tools;
R is powerful;
R supports extensions;
R runs on most operating systems;
R connects with other languages: C++, Java, Python, Julia, Stan and more!
An example of a workflow to analyze data without R.
R allows you to do a lot without needing to use other programs.
All of these graphs were made in R!
RStudio is the most used Integrated Development Environment (IDE) for R.
It includes a console and a syntax-highlighting editor that supports direct code execution, along with tools for plotting, history, debugging and workspace management.
It integrates with R (and other programming languages) to provide a lot of useful features:
RStudio supports authoring HTML, PDF, Word and presentation documents
RStudio supports version control with Git (directly to GitHub) and Subversion
RStudio makes it easy to start new or find existing projects
RStudio supports interactive graphics with Shiny and ggvis
There are other IDEs for R: Atom, Visual Studio, Jupyter Notebook and JupyterLab!
Open RStudio.
Let us do this one together!
Here, the workshop instructor may minimize the presentation, show their RStudio window, and point out all the panes being displayed, briefly explaining each one of them.
Be sure to adjust the font size and scale of the window according to the presentation being given (remote presentations allow for a smaller font size).
If the restriction "unable to write on disk" appears when you attempt to open RStudio or to install a package, do not worry!
We have the solution!
When you open RStudio for the first time, the screen will be divided into three main pane groups.
Once you open a script or create a new script (File > New File > R Script or Ctrl/Cmd + Shift + N), the fourth panel will appear!
Most of the action happens here!
Usually, the first text you see within the Console pane is the R version RStudio is using.
The Console is the place where R is waiting for you to tell it what to do, and where it will communicate with you, showing the outcome of your command.
Whenever R is ready to accept commands, it will show a > prompt.
Text in the console typically looks like this:
output
# [1] "This is the output"
Remember that you must write the command after the > prompt and then press "Return" to run it.
What do the square brackets [ ] within the output mean?
The numbers within the brackets help you locate the position of elements within the output.
seq(1, 100, by = 2)
#  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
# [26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99
Show the participants how the number between the square brackets indicates the position of the elements.
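The meaning of the bracketed numbers can be checked directly. This is a small sketch, and the object name `odd_to_99` is only an assumption made for this example. In the output shown above, the label at the start of each printed line is the position of the first value on that line.

```r
# Store the sequence so that single positions can be inspected
odd_to_99 <- seq(1, 100, by = 2)

length(odd_to_99)   # how many elements the vector holds
odd_to_99[1]        # the value at position 1, matching the [1] label
odd_to_99[26]       # the value at position 26, matching the [26] label
```

Where the console breaks the line depends on the width of the window, so the second label may differ from [26] on your screen.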
Often, the Console will output Errors and Warning messages.
Warning message
x <- c("2", -3, "end", 0, 4, 0.2)
as.numeric(x)
# Warning: NAs introduced by coercion
# [1]  2.0 -3.0   NA  0.0  4.0  0.2
Error message
x * 10
# Error in x * 10: non-numeric argument to binary operator
Google is your best friend in solving Errors or Warnings!
R as a calculator
1 + 1
# [1] 2
10 - 1
# [1] 9
2 * 2
# [1] 4
8 / 2
# [1] 4
2^3
# [1] 8
Use R to calculate the following equation:
$2 + 16 \times 24 - 56$
Hint: The * symbol is used to multiply.
It would look like this in R:
2 + 16 * 24 - 56
# [1] 330
Use R to calculate the following equation:
$2 + 16 \times 24 - 56 / (2 + 1) - 457$
Hint: Think about the order of operations.
It would look like this in R:
2 + 16 * 24 - 56 / (2 + 1) - 457
# [1] -89.66667
Note that R respects the order of operations.
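To see the order of operations at work, here is a short sketch that reuses the numbers from the first equation and only moves the parentheses. The object names are assumptions made for this example.

```r
# Multiplication is evaluated before addition and subtraction
no_parentheses <- 2 + 16 * 24 - 56
no_parentheses        # 330

# Parentheses force the addition to happen first
add_first <- (2 + 16) * 24 - 56
add_first             # 376

# Parentheses force the subtraction to happen first
subtract_first <- 2 + 16 * (24 - 56)
subtract_first        # -510
```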
R for arithmetic operations
What is the area of a circle with a radius of 5 cm?
$\text{Area}_{\text{circle}} = \pi \times r^2$
3.1416 * 5^2
# [1] 78.54
But... R has built-in constants!
You can find them by typing ? and Constants (as in ?Constants) and executing it! What is the one for π?
We can then write and execute this:
pi * 5^2
# [1] 78.53982
You have just run a command preceded by ?. What happened?
We can use the ↑ and ↓ arrow keys to retrieve commands previously run.
Make sure your cursor is blinking after the > prompt and give it a try!
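Besides pi, the ?Constants help page lists a few other built-in constants. A brief sketch of how they can be used; the object names on the left are assumptions made for this example.

```r
# pi is stored with more digits than the 3.1416 typed by hand
area_circle <- pi * 5^2
area_circle

# Other built-in constants are vectors of letters and month names
first_letters <- LETTERS[1:3]   # "A" "B" "C"
ninth_month <- month.name[9]    # "September"
ninth_abbrev <- month.abb[9]    # "Sep"
first_letters
ninth_month
ninth_abbrev
```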
R: an object-oriented environment
You can assign information to named objects using the assignment operator <-.
The information is assigned to the name that the assignment operator <- points to.
See the examples below:
money_talks <- "ACDC"
money_talks
# [1] "ACDC"
9 -> my_birthday_month
my_birthday_month
# [1] 9
Careful! There is no space between the less than (<) and minus (-) signs.
One can assign values using = instead of the <- operator. We caution against using = to assign values to objects because = is only allowed at the top level (for example, in a complete expression typed at the prompt) or as one of the subexpressions in a braced list of expressions, whereas <- can be used anywhere.
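One place where the two operators behave differently is inside a function call. The sketch below is only an illustration, and the object name `doubled` is an assumption made for this example.

```r
# Inside a call, '=' names an argument: here 'x' is the name of the
# first argument of mean(), and this call does not create an object
mean(x = c(2, 4, 6))

# Inside a call, '<-' still assigns: 'doubled' is created in the
# workspace and its value is then passed on to mean()
mean(doubled <- c(2, 4, 6) * 2)
doubled
```

Assigning inside a call is shown here only to highlight the difference; assigning on its own line is usually easier to read.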
Object names can only include:
Type | Symbol |
---|---|
Letters | a-z A-Z |
Numbers | 0-9 |
Period | . |
Underscore | _ |
R is case-sensitive: Data_1 is different from data_1.
Object names cannot include special characters (@, /, #, etc.).
data_1 <- 1 will overwrite any previous object named data_1.
Short and explicit names are preferred. var is not very informative.
You can separate words within a name using underscores (_) or dots (.). avg_richness or avg.richness are easier to read than avgrichness.
Avoid using names of existing functions or constants (e.g., c, table, T, matrix).
Add spaces around operators (=, +, -, <-, etc.) to make the code more readable.
Always put a space after a comma, and never before (like in regular English).
Preferred
mean_x <- (2 + 6) / 2
mean_x
# [1] 4
Not preferred
meanx<-(2+6)/2
meanx
# [1] 4
Create an object with a name (of your choice) that starts with a number. What happens?
Creating an object whose name starts with a number returns the following error:
Error: unexpected symbol in "your object name"
Create an object with a value of 1 + 1.718282 (e or Euler's number) and name it euler_value.
euler_value <- 1 + 1.718282
euler_value
# [1] 2.718282
What has happened in your RStudio window when you created this object?
The presenter should note here that the object will appear in the RStudio Environment.
The Environment panel shows you all the objects you have defined in your current workspace.
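The same information can be obtained from the console. A minimal sketch, assuming nothing beyond base R: ls() lists the names of the objects in the workspace, and exp(1) gives the built-in value of e for comparison.

```r
euler_value <- 1 + 1.718282

# List the names of the objects currently defined in the workspace
ls()

# Compare the hand-typed value with the one R computes
exp(1)
close_enough <- all.equal(euler_value, exp(1), tolerance = 1e-6)
close_enough
```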
You can use the Tab key to auto-complete commands.
This helps prevent spelling errors.
Let us try it!
Try writing eul or abb after the > prompt and press Tab.
If more than one element appears, you can use the arrow keys (↑ ↓) and press "Return" or use your mouse to select the correct one.
Data types define how the values are stored in R.
We can obtain the type of an object using the function typeof(). The core data types are:
Numeric-type with integer and double values
(x <- 1.1)
# [1] 1.1
typeof(x)
# [1] "double"
(y <- 2L)
# [1] 2
typeof(y)
# [1] "integer"
Character-type (always between " ")
z <- "You are becoming very good in this!"
typeof(z)
# [1] "character"
Logical-type
t <- TRUE
typeof(t)
# [1] "logical"
f <- FALSE
typeof(f)
# [1] "logical"
The presenter might be asked about wrapping the entire line within (). This is the same as x <- 2; x. We have done that so it fits.
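A few more calls to typeof() can make the difference between the numeric types concrete. This is a sketch, and the object names are assumptions made for this example.

```r
# A number typed without the L suffix is stored as a double
type_plain <- typeof(2)          # "double"

# The L suffix asks for an integer
type_suffix <- typeof(2L)        # "integer"

# Mixing an integer with a double gives a double
type_mixed <- typeof(2L + 0.5)   # "double"

# Logical values are counted as 1 (TRUE) and 0 (FALSE) in arithmetic
type_logical_sum <- typeof(TRUE + TRUE)   # "integer"

type_plain
type_suffix
type_mixed
type_logical_sum
```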
R: scalars
Until this moment, we have created objects that had just one element inside them:
x <- 1.1
x
# [1] 1.1
euler_value <- 1 + 1.718282
euler_value
# [1] 2.718282
An object that has just a single value or unit, like a number or a text string, is called a scalar.
a <- 100
b <- 3 / 100
c <- (a + b) / b
d <- "species"
e <- "genus"
f <- "When is the next pause again?"
By creating combinations of scalars, we can create data with different structures in R. We are getting there!
R: vectors
A vector object is just a combination of several scalars stored as a single object.
Like scalars, vectors can be of numeric, logical, or character type, but never a mix of them!
There are many ways to create vectors in R. Here are some we are going to see:
Function | Example | Result |
---|---|---|
c(a, b, ...) | c(1, 3, 5, 7, 9) | 1, 3, 5, 7, 9 |
a:b | 1:5 | 1, 2, 3, 4, 5 |
seq(from, to, by, length.out) | seq(from = 0, to = 6, by = 2) | 0, 2, 4, 6 |
rep(x, times, each, length.out) | rep(c(7, 8), times = 2, each = 2) | 7, 7, 8, 8, 7, 7, 8, 8 |
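What "never a mix" means in practice can be sketched as follows: when values of different types are combined, R converts them all to a single type. The object names are assumptions made for this example.

```r
# Combining a number, a text string and a logical value:
# everything is converted to character
mixed <- c(1, "a", TRUE)
mixed            # "1" "a" "TRUE"
typeof(mixed)    # "character"

# Combining numbers and logical values:
# TRUE becomes 1 and FALSE becomes 0
num_and_logical <- c(1.5, TRUE, FALSE)
num_and_logical          # 1.5 1.0 0.0
typeof(num_and_logical)  # "double"
```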
c()
The c() function (c stands for concatenate, meaning to bring together) takes several scalars as arguments, separated by commas, and returns a vector containing them:
vector <- c(value1, value2, ...)
Let us use the c() function to create vectors of different types:
Numeric vector
num_vector <- c(1, 4, 32, -76, -4)
num_vector
# [1]   1   4  32 -76  -4
Character vector
char_vector <- c("blue", "red", "green")
char_vector
# [1] "blue"  "red"   "green"
Logical vector
bool_vector <- c(TRUE, TRUE, FALSE) # or c(T, T, F)
bool_vector
# [1]  TRUE  TRUE FALSE
a:b, seq(), rep()
The a:b operator takes two numeric scalars a and b as arguments, and returns a vector of numbers from the starting point a to the ending point b, in steps of 1 unit:
1:8
# [1] 1 2 3 4 5 6 7 8
7.5:1.5
# [1] 7.5 6.5 5.5 4.5 3.5 2.5 1.5
seq() allows us to create a sequence, like a:b, but also allows us to specify either the size of the steps (the by argument), or the total length of the sequence (the length.out argument):
seq(from = 1, to = 10, by = 2)
# [1] 1 3 5 7 9
seq(from = 20, to = 2, by = -2)
#  [1] 20 18 16 14 12 10  8  6  4  2
rep() allows you to repeat a scalar (or vector) a specified number of times, or to a desired length:
rep(x = 1:3, each = 2, times = 2)
#  [1] 1 1 2 2 3 3 1 1 2 2 3 3
rep(x = c(1, 2), each = 3)
# [1] 1 1 1 2 2 2
The presenter should point out the differences between the rep example 1 and example 2, where a vector can be used inside the argument.
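The length.out argument is mentioned above but not demonstrated. Here is a minimal sketch of it for both seq() and rep(); the object names are assumptions made for this example.

```r
# Ask for five evenly spaced values between 0 and 1;
# seq() works out the step size
five_steps <- seq(from = 0, to = 1, length.out = 5)
five_steps      # 0.00 0.25 0.50 0.75 1.00

# Repeat the vector until the result holds five elements
five_labels <- rep(c("a", "b"), length.out = 5)
five_labels     # "a" "b" "a" "b" "a"
```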
Let's practice:
Create a vector named odd_n that contains the odd numbers from 1 to 9.
Solution:
odd_n <- c(1, 3, 5, 7, 9)
or
odd_n <- seq(from = 1, to = 9, by = 2)
odd_n
# [1] 1 3 5 7 9
Let us begin with the following objects:
x <- c(1:5)
y <- 6
Remember that the colon symbol : combines all values between the first and the second number in steps of 1. c(1:5) or 1:5 is equivalent to c(1, 2, 3, 4, 5).
What happens when we add and multiply the two objects together?
x + y
# [1]  7  8  9 10 11
x * y
# [1]  6 12 18 24 30
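The same operators also work between two vectors of equal length, element by element. A short sketch, where the vector `z` is an assumption made for this example:

```r
x <- c(1:5)
z <- c(10, 20, 30, 40, 50)

# Each element of x is paired with the element of z at the same position
x_plus_z <- x + z
x_plus_z     # 11 22 33 44 55

x_times_z <- x * z
x_times_z    # 10 40 90 160 250
```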
Excellent! We have learned a lot! How about a short break?
Can you guess what our next topic is?
R: matrices
We have learned that scalars contain one element, and that vectors contain more than one scalar of the same type!
Matrices are nothing but a bunch of vectors stacked together!
While vectors have one dimension, matrices have two dimensions, determined by rows and columns.
Finally, like vectors and scalars, matrices can contain only one type of data: numeric, character, or logical.
matrix(), cbind(), rbind()
There are many ways to create your own matrix. Let us start with a simple one:
matrix(data = 1:10, nrow = 5, ncol = 2)
#      [,1] [,2]
# [1,]    1    6
# [2,]    2    7
# [3,]    3    8
# [4,]    4    9
# [5,]    5   10
matrix(data = 1:10, nrow = 2, ncol = 5)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    3    5    7    9
# [2,]    2    4    6    8   10
We can also combine multiple vectors using cbind() and rbind():
nickname <- c("kat", "gab", "lo")
animal <- c("dog", "mouse", "cat")
rbind(nickname, animal)
#          [,1]  [,2]    [,3]
# nickname "kat" "gab"   "lo"
# animal   "dog" "mouse" "cat"
cbind(nickname, animal)
#      nickname animal
# [1,] "kat"    "dog"
# [2,] "gab"    "mouse"
# [3,] "lo"     "cat"
As with vectors, operations on matrices work element by element:
(mat_1 <- matrix(data = 1:9, nrow = 3, ncol = 3))
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9
(mat_2 <- matrix(data = 9:1, nrow = 3, ncol = 3))
#      [,1] [,2] [,3]
# [1,]    9    6    3
# [2,]    8    5    2
# [3,]    7    4    1
The element-wise product of the matrices is:
mat_1 * mat_2
#      [,1] [,2] [,3]
# [1,]    9   24   21
# [2,]   16   25   16
# [3,]   21   24    9
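Note that * multiplies matrices element by element. The matrix product from linear algebra uses a different operator, %*%. The sketch below contrasts the two; the object names `elementwise` and `matrix_product` are assumptions made for this example.

```r
mat_1 <- matrix(data = 1:9, nrow = 3, ncol = 3)
mat_2 <- matrix(data = 9:1, nrow = 3, ncol = 3)

# Element-wise product: each cell is multiplied by the matching cell
elementwise <- mat_1 * mat_2
elementwise[1, 1]      # 1 * 9 = 9

# Matrix product: rows of mat_1 are combined with columns of mat_2
matrix_product <- mat_1 %*% mat_2
matrix_product[1, 1]   # 1 * 9 + 4 * 8 + 7 * 7 = 90
matrix_product
```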
It is your time to get your hands dirty!
Remember that text strings must always be surrounded by quote marks (" ").
Remember that values or arguments must be separated by commas if they are inside a function, e.g. c("one", "two", "three").
(step_1 <- matrix(data = 1:6, nrow = 2, ncol = 3))
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
(step_2 <- matrix(data = c("cheetah", "tiger", "ladybug", "deer", "monkey", "crocodile"), nrow = 2, ncol = 3))
#      [,1]      [,2]      [,3]
# [1,] "cheetah" "ladybug" "monkey"
# [2,] "tiger"   "deer"    "crocodile"
step_3 <- cbind(c(2:5), c("linley", "jessica", "joe", "emma"))
step_3
#      [,1] [,2]
# [1,] "2"  "linley"
# [2,] "3"  "jessica"
# [3,] "4"  "joe"
# [4,] "5"  "emma"
The presenter should mention here that matrices built from vectors or scalars of different data types are converted to character. This is an opening to introduce data frames to the participants.
R: data frames
Unlike a matrix, a data frame can contain numeric, character, and logical columns (or vectors).
Data frames closely resemble the usual Excel tables that we use in our research!
site_id | soil_pH | num_sp | fertilised |
---|---|---|---|
A1.01 | 5.6 | 17 | yes |
A1.02 | 7.3 | 23 | yes |
B1.01 | 4.1 | 15 | no |
B1.02 | 6.0 | 7 | no |
site_id identifies the sampling site, soil_pH is the soil pH, num_sp is the number of species, and fertilised identifies the treatment applied.
One way of representing this table in R is to create vectors:
site_id <- c("A1.01", "A1.02", "B1.01", "B1.02")
soil_pH <- c(5.6, 7.3, 4.1, 6.0)
num_sp <- c(17, 23, 15, 7)
fertilised <- c("yes", "yes", "no", "no")
We then combine them using data.frame():
soil_fertilisation_data <- data.frame(site_id, soil_pH, num_sp, fertilised)
soil_fertilisation_data looks like this!
soil_fertilisation_data
#   site_id soil_pH num_sp fertilised
# 1   A1.01     5.6     17        yes
# 2   A1.02     7.3     23        yes
# 3   B1.01     4.1     15         no
# 4   B1.02     6.0      7         no
Note how the data frame used the names of the objects as column names!
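Once a data frame exists, a few base R functions give a quick overview of it. This is a sketch rather than part of the original exercise. The argument stringsAsFactors = FALSE is added here as an assumption, so that text columns stay as character in older versions of R as well.

```r
site_id <- c("A1.01", "A1.02", "B1.01", "B1.02")
soil_pH <- c(5.6, 7.3, 4.1, 6.0)
num_sp <- c(17, 23, 15, 7)
fertilised <- c("yes", "yes", "no", "no")

soil_fertilisation_data <- data.frame(site_id, soil_pH, num_sp, fertilised,
                                      stringsAsFactors = FALSE)

# Number of rows and columns
dim(soil_fertilisation_data)

# Column names, taken from the names of the vectors
names(soil_fertilisation_data)

# Class of each column: a data frame can mix types across columns
column_classes <- sapply(soil_fertilisation_data, class)
column_classes

# Compact summary of the whole structure
str(soil_fertilisation_data)
```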
We have our information stored as a vector, a matrix, or a data.frame in R.
We will probably be interested in accessing and even subsetting the data based on some criteria.
Let's start with a pretty basic one: the square brackets [ ] and [ , ].
We can indicate the position of the values we want to see between the brackets. This is often called indexing or slicing.
Let us see how to index a vector object in R.
To index an element within a vector using [], we need to write the position number of the element within the brackets [].
For instance, here is our odd_n object:
(odd_n <- seq(1, 9, by = 2))
# [1] 1 3 5 7 9
To obtain the value in the second position, we do as follows:
odd_n[2]
# [1] 3
We can also obtain values for multiple positions within a vector with c():
odd_n[c(2, 4)]
# [1] 3 7
And we can remove values at particular positions from a vector using the minus (-) sign before the position value:
odd_n[-c(1, 2)]
# [1] 5 7 9
odd_n[-4]
# [1] 1 3 5 9
Using the vector num_vector and our indexing abilities:
num_vector <- c(1, 4, 3, 98, 32, -76, -4)
num_vector[4]
# [1] 98
num_vector[c(1, 3)]
# [1] 1 3
num_vector[c(-2, -4)]
# [1]   1   3  32 -76  -4
num_vector[6:10]
# [1] -76  -4  NA  NA  NA
What happened there? What is that NA?
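NA stands for "not available": it is how R marks a missing value. The vector holds only seven elements, so positions 8 to 10 have nothing to return. A small sketch of how this can be checked; the object name `beyond` is an assumption made for this example.

```r
num_vector <- c(1, 4, 3, 98, 32, -76, -4)

# The vector has seven elements, so positions 8, 9 and 10 do not exist
length(num_vector)

# Asking for positions past the end returns NA for each of them
beyond <- num_vector[6:10]
beyond

# is.na() tests each element for a missing value
is.na(beyond)        # FALSE FALSE TRUE TRUE TRUE
sum(is.na(beyond))   # number of missing values: 3
```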
To index a data frame, you must specify the position of values within two dimensions, rows and columns, as in:
data_frame_name[row_number, column_number]
In this way, to extract the first row:
my_df[1, ]
To extract the third column:
my_df[, 3]
And, to extract the element within the second row and the fourth column:
my_df[2, 4]
Battleship (game) feelings!
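The generic patterns above can be tried on the soil_fertilisation_data data frame built earlier. A sketch, with the data frame recreated here so that the example runs on its own; stringsAsFactors = FALSE and the names on the left of each assignment are assumptions made for this example.

```r
soil_fertilisation_data <- data.frame(
  site_id = c("A1.01", "A1.02", "B1.01", "B1.02"),
  soil_pH = c(5.6, 7.3, 4.1, 6.0),
  num_sp = c(17, 23, 15, 7),
  fertilised = c("yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# First row, all columns
first_row <- soil_fertilisation_data[1, ]
first_row

# Third column, all rows
third_column <- soil_fertilisation_data[, 3]
third_column      # 17 23 15 7

# Second row, fourth column
one_cell <- soil_fertilisation_data[2, 4]
one_cell          # "yes"
```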
Remember that our soil_fertilisation_data data frame had column names?
soil_fertilisation_data
#   site_id soil_pH num_sp fertilised
# 1   A1.01     5.6     17        yes
# 2   A1.02     7.3     23        yes
# 3   B1.01     4.1     15         no
# 4   B1.02     6.0      7         no
We can subset columns from it using the column names:
soil_fertilisation_data[ , c("site_id", "soil_pH")]
#   site_id soil_pH
# 1   A1.01     5.6
# 2   A1.02     7.3
# 3   B1.01     4.1
# 4   B1.02     6.0
We can also subset columns from it using $:
soil_fertilisation_data$site_id
# [1] "A1.01" "A1.02" "B1.01" "B1.02"
Note: $ works for data frames (and lists), but not for vectors or matrices!
What if I want to subset columns based on a condition?
We can do that! But first, you need to learn about logical operators and logical testing.
You might remember the logical data type, which contains only TRUE and FALSE values.
We can use operators to obtain TRUE or FALSE values from a test. See examples below:
Operator | Description | Example | Result |
---|---|---|---|
< and > | less than or greater than | odd_n > 3 | FALSE, FALSE, TRUE, TRUE, TRUE |
<= and >= | less/greater than or equal to | odd_n >= 3 | FALSE, TRUE, TRUE, TRUE, TRUE |
== | exactly equal to | odd_n == 3 | FALSE, TRUE, FALSE, FALSE, FALSE |
!= | not equal to | odd_n != 3 | TRUE, FALSE, TRUE, TRUE, TRUE |
x \| y | x OR y | odd_n[odd_n >= 5 \| odd_n < 3] | 1, 5, 7, 9 |
x & y | x AND y | odd_n[odd_n >= 3 & odd_n < 7] | 3, 5 |
x %in% y | x matches y | odd_n[odd_n %in% c(3, 7)] | 3, 7 |
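The rows of the table can be run as they are. The sketch below recreates odd_n and evaluates a few of them; the object names on the left of each assignment are assumptions made for this example.

```r
odd_n <- seq(1, 9, by = 2)

# A comparison returns one TRUE or FALSE per element
greater_than_3 <- odd_n > 3
greater_than_3        # FALSE FALSE TRUE TRUE TRUE

# Two tests combined with OR, then used to select values
either <- odd_n[odd_n >= 5 | odd_n < 3]
either                # 1 5 7 9

# Two tests combined with AND
both <- odd_n[odd_n >= 3 & odd_n < 7]
both                  # 3 5

# %in% checks each element of odd_n against a set of values
in_set <- odd_n %in% c(3, 7)
in_set                # FALSE TRUE FALSE TRUE FALSE
```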
We can use conditions to select values:
odd_n[odd_n > 4]
# [1] 5 7 9
It is also possible to match a character string.
char_vector <- c("blue", "red", "green")
char_vector[char_vector == "blue"]
# [1] "blue"
There are also functions in R that allow us to test conditions!
We can, for instance, test if values within a vector or a matrix are numeric:
char_vector
# [1] "blue"  "red"   "green"
is.numeric(char_vector)
# [1] FALSE
odd_n
# [1] 1 3 5 7 9
is.numeric(odd_n)
# [1] TRUE
Or whether they are of the character type:
char_vector
# [1] "blue"  "red"   "green"
is.character(char_vector)
# [1] TRUE
odd_n
# [1] 1 3 5 7 9
is.character(odd_n)
# [1] FALSE
And also whether they are vectors:
char_vector
# [1] "blue"  "red"   "green"
is.vector(char_vector)
# [1] TRUE
Explore the difference between these two lines of code:
char_vector == "blue"
char_vector[char_vector == "blue"]
char_vector == "blue"
# [1]  TRUE FALSE FALSE
In this line of code, you test a logical statement. For each entry in char_vector, R checks whether the entry is equal to blue or not.
char_vector[char_vector == "blue"]
# [1] "blue"
In the line above, we asked R to extract all values within the char_vector vector that are exactly equal to blue.
Extract the num_sp column from soil_fertilisation_data and multiply its values by the first four values of num_vector.
After that, write a statement that checks if the values you obtained are greater than 25.
soil_fertilisation_data$num_sp * num_vector[c(1:4)]
# [1]  17  92  45 686
or
soil_fertilisation_data[, 3] * num_vector[c(1:4)]
# [1]  17  92  45 686
(soil_fertilisation_data$num_sp * num_vector[c(1:4)]) > 25
# [1] FALSE  TRUE  TRUE  TRUE
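A logical test can also be placed in the row position of the brackets to keep only the rows of a data frame that meet a condition. A sketch, with the data frame recreated so that the example runs on its own; stringsAsFactors = FALSE and the object names `less_acidic` and `fertilised_richness` are assumptions made for this example.

```r
soil_fertilisation_data <- data.frame(
  site_id = c("A1.01", "A1.02", "B1.01", "B1.02"),
  soil_pH = c(5.6, 7.3, 4.1, 6.0),
  num_sp = c(17, 23, 15, 7),
  fertilised = c("yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Keep the rows where soil pH is greater than 5 (all columns)
less_acidic <- soil_fertilisation_data[soil_fertilisation_data$soil_pH > 5, ]
less_acidic

# Keep the number of species for the fertilised sites only
fertilised_richness <- soil_fertilisation_data[
  soil_fertilisation_data$fertilised == "yes", "num_sp"
]
fertilised_richness   # 17 23
```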
We focused here mainly on vectors and data frames.
While we will discuss arrays and lists in other workshops, you can already form an idea of what these types of object structure are.
Any wild guesses?
A function is a tool that simplifies our lives!
It allows you to quickly execute operations on objects without having to write every mathematical step.
A function needs entry values called arguments (or parameters).
It then performs (hidden) actions using these arguments and returns an output.
Today, we will look only into R's built-in functions, but you will learn how to make your own functions during Workshop #5!
To use (or to call) a function, the command must be structured properly, following the "grammar rules" of the R language: the syntax.
function_name(argument1 = value, argument2 = value, ..., argumentN = value)
Arguments are values and instructions the function needs to run.
Objects storing these values and instructions can be used in functions:
a <- 3
b <- 5
sum(a, b)
# [1] 8
mean(soil_fertilisation_data$num_sp)
# [1] 15.5
Create a vector a that contains all the numbers from 1 to 5.
Create an object b that has a value of 2.
Add a and b together using the basic + operator and save the result in an object called result_add.
Add a and b together using the sum function and save the result in an object called result_sum.
Are result_add and result_sum different?
Add 5 to result_sum using the sum() function.
a <- c(1:5)
b <- 2
result_add <- a + b
result_sum <- sum(a, b)
result_add
# [1] 3 4 5 6 7
result_sum
# [1] 17
sum(result_sum, 5)
# [1] 22
The operation + on the vector a adds 2 to each element. The result is a vector.
The function sum() concatenates all the values provided and then sums them. It is the same as doing 1 + 2 + 3 + 4 + 5 + 2.
Each argument has a name, which may be used during a function call.
For instance, the first arguments of the matrix() function are:
matrix(data, nrow, ncol)
We can create a matrix using:
matrix(data = 1:12, nrow = 3, ncol = 4)
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12
We can also call a function while omitting argument names; however, the order of the values will then matter:
matrix(data = 1:12, 3, 4)
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12
matrix(1:12, 4, 3)
#      [,1] [,2] [,3]
# [1,]    1    5    9
# [2,]    2    6   10
# [3,]    3    7   11
# [4,]    4    8   12
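The reverse also holds: when every argument is named, the order in which they are written no longer matters. A sketch using the same matrix() call; the object names are assumptions made for this example.

```r
# Values matched by position: data, nrow, ncol
by_position <- matrix(1:12, 3, 4)

# Same values matched by name, written in a different order
by_name <- matrix(ncol = 4, nrow = 3, data = 1:12)

# Both calls build exactly the same matrix
identical(by_position, by_name)

# Swapping unnamed values changes the result
swapped <- matrix(1:12, 4, 3)
dim(by_position)   # 3 4
dim(swapped)       # 4 3
```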
plot is a function that draws a graph of y as a function of x. It requires two arguments named x and y. What are the differences between the following lines?
a <- 1:100
b <- a^2
plot(a, b, type = "l")
plot(b, a, type = "l")
plot(x = a, y = b, type = "l")
plot(y = b, x = a, type = "l")
The argument type of the function plot lets you choose the type of graph you want. Try it without this argument.
plot(a, b, type = "l")
plot(b, a, type = "l")
The shape of the plot changes: since we did not provide the argument names, the order is important.
plot(x = a, y = b, type = "l")
plot(y = b, x = a, type = "l")
Same as plot(a, b, type = "l"). The argument names are provided, so the order is not important.
R packages
Packages group functions and/or datasets that share a similar theme, e.g. statistics, spatial analysis, plotting.
Anyone can develop packages and make them available to others.
Many packages are available through the Comprehensive R Archive Network (CRAN), and now many more on GitHub.
Guess how many packages are available (not only within CRAN)?
To install packages on your computer, use the function install.packages().
install.packages("package_name")
Installing a package is necessary to use it, but there is one more step: loading it.
You can load a package into your workspace using the library() function.
library(package_name)
R package: ggplot2
Let us install a popular visualization package called ggplot2.
install.packages("ggplot2")
Installing package into '/home/labo/R/x86_64-redhat-linux-gnu-library/3.3'
(as 'lib' is unspecified)
Now we will use the function qplot from the package
qplot(1:10, 1:10)
Did you get this error?
## Error: could not find function "qplot"
We need to gain access to the functions that are available within the installed package. To do this, load ggplot2 using the library() function.
library(ggplot2)
Now we can draw the graph
qplot(1:10, 1:10)
The ggplot2 package will be covered in Workshop #3: Introduction to ggplot2.
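Two related base R tools are sketched here as an aside, not as part of the workshop material. requireNamespace() reports whether a package is installed, and the package::function form calls a function without running library() first. The stats package is used because it ships with R, and the object names are assumptions made for this example.

```r
# Check whether a package is installed, without attaching it.
# requireNamespace() returns TRUE or FALSE instead of stopping with an error.
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
has_ggplot2

# Call a function by prefixing it with its package name and two colons
middle_value <- stats::median(c(1, 5, 9))
middle_value
```

If has_ggplot2 is FALSE, running install.packages("ggplot2") as described above is the next step.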
WOW! R is so great! So many functions to do what I want!
But... how do I find them?
To find a function that does something specific in your installed packages, you can use ?? followed by a search term.
Let us say we want to create a sequence of odd numbers between 0 and 10 as we did earlier. We can search in our packages for all the functions with the word "sequence" in them:
??sequence
OK! So let us use the seq function!
But wait... how does it work? What arguments does it need?
To find information about a particular function, use ?
?seq
The help page is headed function_name {package_name}.
Description: a short description of what the function does.
When name = value is present, a default value is provided if the argument is missing. The argument becomes optional.
Create a sequence of even numbers from 0 to 10 using the seq function.
Create an unsorted vector of your favourite numbers, then sort your vector in reverse order.
seq(from = 0, to = 10, by = 2)
# [1]  0  2  4  6  8 10
seq(0, 10, 2)
# [1]  0  2  4  6  8 10
numbers <- c(2, 4, 22, 6, 26)
sort(numbers, decreasing = TRUE)
# [1] 26 22  6  4  2
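Default values can also be inspected from the console without opening the help page. A sketch using args(), applied to the sort() function from the exercise; the object names `ascending` and `descending` are assumptions made for this example.

```r
numbers <- c(2, 4, 22, 6, 26)

# args() prints the arguments of a function together with their defaults
args(sort)

# 'decreasing' has a default value (FALSE), so it can be left out
ascending <- sort(numbers)
ascending      # 2 4 6 22 26

# Providing the argument overrides the default
descending <- sort(numbers, decreasing = TRUE)
descending     # 26 22 6 4 2
```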
Usually, your best source of information will be your favorite search engine!
Here are some tips on how to use it efficiently:
Put R at the beginning of your search;
Find the appropriate functions to perform the following operations:
sqrt()
mean()
cbind()
ls()
Lots of cheat sheets are available online.
Open them directly from RStudio: Help → Cheatsheets