STU33011 Lab 1
Lab sessions will focus on using R to perform the
techniques that we have discussed in lectures. In this first session, we
will
- look at R;
- learn how it is used; and
- discover what options are available for finding additional help.
R is a free software environment especially designed for
statistical computing and graphics. It is open source and is regularly
updated with new packages that can perform recently developed
statistical techniques (see e.g. http://dirk.eddelbuettel.com/cranberries/).
If R has been installed on your PC then access can be
gained by selecting the Start menu, going to
All Programs, and selecting the R file from
the folder of the same name. If R has not already been
installed, then it is available for free download from http://www.r-project.org/. Click on the “CRAN” option,
choose a “mirror” (for example, Dublin, or Bristol), and then select the
appropriate version for your operating system. Then follow the
installer's prompts.
RStudio is an integrated development environment (IDE)
for the R programming language. It is more user-friendly
than the basic R GUI (graphical user interface) and
provides greater functionality. It can be downloaded for free at
rstudio.com.
1 Manuals
Useful on-line manuals are available at:
- Introduction to R: http://cran.r-project.org/doc/manuals/R-intro.pdf.
- R reference card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf.
- R mailing list help archive: http://tolstoy.newcastle.edu.au/R/about.html.
- Statistics questions on stack exchange (R is a common tag): https://stats.stackexchange.com/
- Additional R packages: http://cran.r-project.org/web/packages/ (More on this later).
- For more details on R Markdown, see https://bookdown.org/yihui/rmarkdown/ or http://rmarkdown.rstudio.com.
- Cheatsheets for RStudio, R Markdown, Importing Data, and other topics are available at https://www.rstudio.com/resources/cheatsheets/.
2 Using the R Console
Typing a command after the > prompt and pressing the
enter/return key will cause R to
evaluate the command and return the results of its computations.
In the R Console, enter:
x <- sqrt(25) + 2
This simple command tells R to calculate \(\sqrt{25}+2\) and to store the solution
under the name x. In RStudio, the object
x should be visible in the Environment tab of
the top right pane. The value of x can also be inspected by
entering the following:
x
## [1] 7
Exercise
- Tell R to save the value \(e^{2.5}\) under the variable name y.
- Using the previously saved values of x and y, save the value \(e^{2.5}(\sqrt{25}+2)\) under the variable name z (note that * denotes the multiplication symbol).
3 Help Commands
In order to access the internal help files in R, use
either the ? command or the help.search
function. Enter the following into R and observe the
result:
?exp
help.search("exponential")
The first of these commands brings up the “Logarithms and
Exponentials” help file, and explains how to use the exp
function. In general, given an R command or function
x, entering ?x will bring up its help
file.
The second command provides a list of help files in which the term
exponential can be found in the concept or title. This can
be very useful when searching for the appropriate R command
for performing a given exercise.
In RStudio, help files can also be accessed by selecting
the Help tab in the bottom right pane on screen and using
the search box.
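A couple of related base R helpers can be tried alongside the commands above. The following is only a small sketch: help() is the function form of ?, and apropos() searches object names rather than help files. The object name hits is a placeholder chosen for this illustration.

```r
# help() is the function form of ?, so this opens the same page as ?exp
help("exp")

# apropos() lists objects on the search path whose names match a pattern;
# here, every name that starts with "exp"
hits <- apropos("^exp")
hits
```

apropos is handy when you remember part of a function's name, whereas help.search is better when you only know the topic.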
Exercise
- Find the R command for determining the logarithm of 10 to base 10.
- Find the R command for determining the logarithm of 10 to base 2.
4 Vectors and Matrices
Probably the most commonly used command in R is the
c command, which combines all the arguments it has been
given into an ordered vector. For example:
s <- c(1, 2, 3, 4)
s
## [1] 1 2 3 4
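Once a vector has been created, a few basic operations are worth trying. The sketch below uses a hypothetical vector v (not part of the lab) to show how to count elements, pick one out by position, and apply arithmetic to every element at once.

```r
v <- c(10, 20, 30)   # hypothetical example vector

length(v)            # number of elements: 3
v[2]                 # second element: 20
v * 2                # arithmetic is applied element by element: 20 40 60
```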
Exercise
- Create a vector t with ordered elements 5, 6 and 7, respectively.
- Using the previously saved values of s and t, create a vector u with ordered elements 1, 2, 3, 4, 5, 6 and 7, respectively.
To construct a matrix, the matrix command is used. Check
this command’s help file to make sure you understand what the following
code does:
A <- matrix(c(1, 2, 2, 5), nrow = 2, ncol = 2, byrow = TRUE)
A
##      [,1] [,2]
## [1,]    1    2
## [2,]    2    5
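Because A happens to be symmetric, the effect of the byrow argument is not visible in the output above. As a small illustration, the sketch below fills a matrix both ways using hypothetical values (vals, M_row and M_col are names made up for this example).

```r
vals <- c(1, 2, 3, 4)   # hypothetical values, chosen so the two fillings differ

# byrow = TRUE fills the matrix one row at a time: rows are (1, 2) and (3, 4)
M_row <- matrix(vals, nrow = 2, ncol = 2, byrow = TRUE)

# the default (byrow = FALSE) fills one column at a time: rows are (1, 3) and (2, 4)
M_col <- matrix(vals, nrow = 2, ncol = 2)

M_row
M_col
```

The two results are transposes of each other, which can be checked with t(M_col).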
Exercise
- Create a matrix B such that: \[B = \left(\begin{array}{cc} 1 & 2\\ 2 & 3\\ 4 & 5 \end{array}\right)\]
5 Re-using Code
One quick method of re-running previously typed code is to use the \(\uparrow\) and \(\downarrow\) keys in the R console. Pressing these allows the user to cycle through previously entered commands, which R keeps in its command history. A previous command can then either be run again as it is, or be quickly altered as needed (e.g. to fix a typo).
Exercise
- Use the \(\uparrow\) key to edit the line that was previously used to save matrix A so that the term in the second row and second column is a 2 instead of a 5.
An alternative (and generally better) approach is to write your code
as an R script.
To open an R script in RStudio, select
File, then New File, and
R Script. Alternatively, use the keyboard shortcut
Ctrl+Shift+N, or click the
New File icon denoted by a blank page with a green circle
containing a white plus sign, which is located below
File.
Whereas a line of code in the R Console can be run by
pressing the Enter key, to run code directly from an
R script press Ctrl+Enter or
click the Run (‘Run the current line or selection’) button
in the top right corner of the R script tab. This will run
the line on which the cursor is currently located or any code which is
currently highlighted.
R scripts can be saved by clicking the floppy disk save
icon in the top left corner of the R script tab or by using
the keyboard shortcut Ctrl+S.
Exercise
- Open and save an R script that tells R how to create the matrix B that you have previously defined. Run the command from the script file directly.
6 Additional Packages
The base distribution of R comes with many commonly used
add-on packages. These implement additional R commands that
are often very useful.
In RStudio, a package can be loaded by selecting the Packages tab of the
bottom right pane and clicking the box to the left of its name.
Alternatively, just use the library function, e.g.
library("cluster").
Exercise
- Load the cluster package and go through the help file for its clusplot function.
There are many more additional packages available from the CRAN
(comprehensive R archive network) website (http://cran.r-project.org/web/packages/). The easiest
way to install a package is to use the install.packages
function.
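One common pattern is to install a package only when it is missing. The sketch below is one possible way to do this, not the only one: ensure_package is a hypothetical helper name invented for this illustration, while requireNamespace and install.packages are standard R functions. Installing from CRAN needs an internet connection, so the call for a CRAN package is left commented out.

```r
# Hypothetical helper: install a package from CRAN only if R cannot find it,
# then report (invisibly) whether it is now available.
ensure_package <- function(pkg){
  if(!requireNamespace(pkg, quietly = TRUE)){
    install.packages(pkg)   # downloads from CRAN; needs internet access
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
ensure_package("stats")

# For a CRAN package, uncomment the next line:
# ensure_package("mclust")
```

Note that installing a package and loading it are separate steps: after installation, library is still needed to make the package's functions available in the current session.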
Another option is to follow the links on CRAN and directly download
the package of choice. In RStudio, the downloaded package
can be installed by clicking the Install icon at the top of
the Packages tab in the bottom right pane.
Exercise
- Download the mclust package and install it into R. (Before R allows you to load this package, it may request you download and install additional packages. These are called dependencies.)
- Load the package and examine the help file for its Mclust function.
7 Creating Functions
Many of the functions we have been using were created by an R
contributor, as R is open source. For example, the mclust
package and its functions were written by Chris Fraley, Adrian Raftery,
Brendan Murphy, Michael Fop, and Luca Scrucca. It is easy to write your
own functions in R. Enter the following:
square <- function(x){ x^2 }
square(3)
## [1] 9
square(s)
## [1] 1 4 9 16
square(A)
##      [,1] [,2]
## [1,]    1    4
## [2,]    4   25
The first piece of code creates a function called square
that returns the squared value of its argument. Note how this function
treats the vector s and the matrix A:
square acts on the individual elements of
its argument. In general, functions are created by assigning a name to
the command function followed by the arguments of the
function within parentheses (), followed by the commands
performed by the function within braces {}.
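Functions can take more than one argument, and an argument can be given a default value. As a sketch, the hypothetical function raise_to below (a name made up for this example) generalises square: its second argument p defaults to 2, and the function returns the value of the last expression it evaluates.

```r
# Hypothetical generalisation of square: raise x to the power p (default 2)
raise_to <- function(x, p = 2){
  x^p
}

raise_to(3)            # p is not given, so the default 2 is used: 9
raise_to(3, p = 3)     # 27
raise_to(c(1, 2, 3))   # acts element by element, like square: 1 4 9
```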
8 if Statements and for Loops
Although the speed of such calculations can be poor compared to
lower-level languages, R can be used to write for loops and
if statements. For example:
for(i in 1:10){
  if(i==1){
    print( paste("The first number is", i) )
  }
  if(i>1){
    print( paste("The next number is", i) )
  }
}
## [1] "The first number is 1"
## [1] "The next number is 2"
## [1] "The next number is 3"
## [1] "The next number is 4"
## [1] "The next number is 5"
## [1] "The next number is 6"
## [1] "The next number is 7"
## [1] "The next number is 8"
## [1] "The next number is 9"
## [1] "The next number is 10"
The syntax here is straightforward. A for loop is performed by
entering the command for followed by the index for the loop
and its range within parentheses (), followed in braces
{} by the commands to be performed during each iteration of
the index:
for(i in 1:10){
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
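A loop is often used to build up a result one step at a time. The sketch below (with a hypothetical variable total) adds up the numbers 1 to 10 in a loop, and then shows the built-in sum function doing the same job in one call. For simple tasks like this, such built-in vectorised functions are usually the more idiomatic choice in R.

```r
total <- 0              # hypothetical running total
for(i in 1:10){
  total <- total + i    # add the current index to the running total
}
total                   # 55

sum(1:10)               # same result without an explicit loop: 55
```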
The if statement is followed by a logically testable
condition given in parentheses (), followed in braces
{} by the commands to be performed if the condition
is true:
if(2>0){ print("yes") }
## [1] "yes"
if(2==0){ print("yes") }
If a different command is to be performed when the initial condition is
not met, an if statement can be followed by an
else clause:
toy1 <- -2
if(toy1 > 0){
  print("yes")
} else {
  print("no")
}
## [1] "no"
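When there are more than two cases, if and else can be chained using else if. The following sketch wraps such a chain in a hypothetical function, describe_sign (a name invented for this illustration), which classifies a single number.

```r
# Hypothetical example: a three-way branch using else if
describe_sign <- function(v){
  if(v > 0){
    "positive"
  } else if(v < 0){
    "negative"
  } else {
    "zero"
  }
}

describe_sign(-2)   # "negative"
describe_sign(0)    # "zero"
```

The conditions are checked in order, and only the commands for the first condition that is true are run.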
Exercise
- Write a function that takes as its argument a vector and that
returns the sum of the elements if the vector is of length 10, returns
the square of each element if its length is less than 10, or provides
the list of differences between adjacent elements if its length is
greater than 10 (hint: the length function will be useful here. You
could also make use of the square function you have already defined, if you wanted.)
9 Importing/Exporting Data
R can import common data formats such as text files,
Excel files, SPSS files, etc.
Exercise
- Go to https://www.scss.tcd.ie/~arwhite/Teaching/STU33011.html and download the data file “music.csv”, saving it to a suitable directory.
This file is a comma-delimited (CSV) text file, which can be opened in Excel and can be read
into R via the following (where the dots represent the file
path): music <- read.csv("C:\\...\\music.csv").
Another approach, instead of using the full file path, is to change
the R working directory to the folder which contains the
file to be imported and then only use the file name itself.
In RStudio, select Session, then
Set Working Directory. If the file to be imported is in
the same folder as the current R script, then select
To Source File Location. Otherwise, select
Choose Directory... and browse to find the correct
folder.
Alternatively, you can use the setwd function to change
the working directory. The getwd function allows you to
check the current working directory.
Then you can enter:
music <- read.csv("music.csv")
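The working directory functions mentioned above can be tried out safely as follows. This is only a sketch: it uses tempdir(), R's temporary folder for the current session, as a stand-in for the folder holding your data, and old_wd is a placeholder name for the directory you started in.

```r
old_wd <- getwd()    # remember the current working directory
setwd(tempdir())     # switch to R's per-session temporary folder
getwd()              # now reports the temporary folder
setwd(old_wd)        # switch back to where we started
```

In practice, the argument to setwd would be the path of the folder containing music.csv.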
Note that you can also import data directly using the URL, e.g.
music <- read.csv("https://www.scss.tcd.ie/~arwhite/Teaching/STU33011/music.csv")
Additionally, in RStudio, you can click
Import Dataset in the Environment tab of the
top right pane or browse through the computer’s files in the
Files tab of the bottom right pane, click on the file, and
choose Import Dataset....
We have loaded a data frame named music into
R. In RStudio, this data frame should be
visible in the Environment tab in the top right pane. Once
the file has been read in, the first 10 rows of music can be checked by
entering:
head(music, 10)
##                 X Artist Type     LVar      LAve  LMax   LFEner     LFreq
## 1   Dancing Queen   Abba Rock 17600756 -90.00687 29921 105.9210  59.57379
## 2      Knowing Me   Abba Rock  9543021 -75.76672 27626 102.8362  58.48031
## 3   Take a Chance   Abba Rock  9049482 -98.06292 26372 102.3249 124.59397
## 4       Mamma Mia   Abba Rock  7557437 -90.47106 28898 101.6165  48.76513
## 5     Lay All You   Abba Rock  6282286 -88.95263 27940 100.3008  74.02039
## 6   Super Trouper   Abba Rock  4665867 -69.02084 25531 100.2485  81.40140
## 7  I Have A Dream   Abba Rock  3369670 -71.68288 14699 104.5969 305.18689
## 8      The Winner   Abba Rock  1135862 -67.81905  8928 104.3492 277.66056
## 9           Money   Abba Rock  6146943 -76.28075 22962 102.2407 165.15799
## 10            SOS   Abba Rock  3482882 -74.13000 15517 104.3624 146.73700
music consists of 8 columns and 62 observations. To
determine the dimensions of music run the command:
dim(music)
## [1] 62 8
To save the music data frame, i.e. to write the data
frame to a new file, the following could be used:
write.table(music, file = "C:\\...\\music2.csv", sep = ",")
Or if the working directory has been changed then you can simply enter:
write.table(music, file = "music2.csv", sep = ",")
In the above, the music argument provides the name of
the data frame to be saved, the second (file) argument
specifies the file path to which you wish the data to be saved, and the
final (sep) argument tells R to save the file
in a comma delimited format.
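To check that a file has been written as intended, it can help to read it straight back in. The sketch below does this with a small made-up data frame (toy, tmp and toy_back are hypothetical names, and the values are invented purely for illustration). It uses write.csv, a convenience wrapper around write.table for comma-delimited files, with row.names = FALSE so that only the data columns are written.

```r
# Hypothetical toy data, invented for this illustration
toy <- data.frame(name = c("a", "b", "c"), value = c(1.5, 2.5, 3.5))

tmp <- tempfile(fileext = ".csv")                # temporary file path
write.csv(toy, file = tmp, row.names = FALSE)    # write without row names

toy_back <- read.csv(tmp)                        # read the file back in
dim(toy_back)                                    # 3 rows and 2 columns
```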
Exercise
- Save an R script that tells R how to import the music dataset. Check your script by running the command from the script file directly.
10 R Markdown
This document was created using an R Markdown file.
R Markdown files allow us to create text documents which
include blocks of R code, as well as the output and plots
that they produce. We can also use LaTeX to write mathematical formulas
in R Markdown files. There are a lot of online resources
for R Markdown. See here for example: https://shiny.rstudio.com/articles/rm-cheatsheet.html
To open an R Markdown document, click on the
New File icon or navigate to New File via
File and then select R Markdown.... Click on
the Knit icon above the R Markdown file text
to produce the output document.