shoot for 1hr
We’re going to teach you some of the fundamentals of the R language, as well as some best practices for organizing code for scientific projects that will make your life easier.
RStudio keyboard shortcuts (Tools > Keyboard Shortcuts Help):
* Alt+- inserts <- at cursor
* Control+Shift+M inserts %>% at cursor (we’ll get into this on day 2)
* Control+Enter = Run current line/selection
* Control+1 = Move cursor to source
* Control+2 = Move cursor to console
Much of your time in R will be spent in the R interactive console.
This is where you will run all of your code, and can be a useful environment to try out ideas before adding them to an R script file.
* much time spent in console working out code 
* console `>` with blinking cursor, much like command line
* "read, evaluate, print, loop" REPL - many languages adopt this paradigm (bash, stata, python)
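* you type commands, R tries to execute them, and then returns a result

1 + 100
## [1] 101

* a `+` prompt instead of `>` means R is waiting for you to complete the command; ESC or Control+C will escape

Using R as a calculator: R uses the usual order of operations, from highest to lowest precedence:

* Parentheses: ( )
* Exponents: ^ or **
* Divide: /
* Multiply: *
* Add: +
* Subtract: -

Compare 3+5*2 v. (3+5) * 2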
3+5*2
## [1] 13

vs. the grouped version below. Use parentheses to group operations in order to force the order of evaluation if it differs from the default, or to make clear what you intend.

(3+5) * 2
## [1] 16

(3 + (5 * (2 ^ 2))) # hard to read
## [1] 23
3 + 5 * 2 ^ 2       # clear, if you remember the rules
## [1] 23
3 + 5 * (2 ^ 2)     # if you forget some rules, this might help
## [1] 23

Really small or large numbers are printed in scientific notation, where e-XX is shorthand for "multiplied by 10^-XX":

2/10000
## [1] 2e-04

So 2e-4 is shorthand for 2 * 10^(-4).
You can write numbers in scientific notation too:
5e3 # Note the lack of minus here
## [1] 5000

To call a function, we simply type its name, followed by opening and closing parentheses.
Anything we type inside the parentheses is called the function’s arguments.
sin(1) # trig functions
## [1] 0.841471
log(1) # natural log
## [1] 0
log10(10) # base-10 log
## [1] 1
exp(0.5) # e^(1/2)
## [1] 1.648721

Notice the use of the # after each call. Any idea what this does? The text after # doesn’t get evaluated because it’s a comment. Use comments to document code or leave notes for yourself, e.g. #TODO fix code
Use RStudio’s autocompletion feature if you can remember the beginning of a function name.
? before a function name brings up its help page in the RStudio help panel; we’ll look at help later on.
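As a small sketch of what "arguments" means in practice: many functions take more than one argument, and arguments can be matched by position or by name. The calls below use only base R; the values and variable names are chosen purely for illustration.

```r
# Arguments can be matched by position or by name.
by_position <- log(100, 10)         # x = 100, base = 10
by_name     <- log(100, base = 10)  # same call, with the base named
dedicated   <- log10(100)           # the dedicated base-10 function

c(by_position, by_name, dedicated)  # all three agree

# round() takes the number and the number of decimal places to keep.
rounded <- round(exp(0.5), digits = 2)
rounded
```

Naming arguments is optional here, but it tends to make calls easier to read once a function takes several of them.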
1 == 1 # equality (note two equals signs, read as "is equal to")
## [1] TRUE
1 != 2  # inequality (read as "is not equal to")
## [1] TRUE
1 <  2  # less than
## [1] TRUE
1 <= 1  # less than or equal to
## [1] TRUE
1 > 0  # greater than
## [1] TRUE
1 >= -9 # greater than or equal to
## [1] TRUE

# Tip: don't use == to compare numbers unless they are integers; computers represent decimals with only a certain degree of precision
# check out ?all.equal for comparing things involving doubles

0.1+0.05==0.15
## [1] FALSE
all.equal(0.1+0.05, 0.15)
## [1] TRUE

Store values in variables using the assignment operator <-, like x <- 1/40

x <- 1/40

Notice that assignment does not print a value.
Look in the Environment tab in RStudio.

x
## [1] 0.025

This is the decimal approximation of the fraction, called a floating point number.
Our variable can be used in place of a number in calculations, e.g. log(x):

log(x)
## [1] -3.688879

Variables can also be reassigned, e.g. x<-100:

x <- 100
x
## [1] 100

x used to contain the value 0.025 and now it has the value 100.
x <- x + 1 # notice how RStudio updates its description of x on the top right tab
y <- x * 2
x
## [1] 101
y
## [1] 202

The right-hand side of an assignment can be any valid R expression, and it is evaluated before the assignment takes place.
You can use = for assignment, but it is less common among R users.
Be consistent with operator usage; <- is more common and recommended.
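A minimal sketch of the assignment rules above. The variable names (total, doubled, also_total) are made up for illustration. Wrapping an assignment in parentheses is one handy way to assign and print in a single step.

```r
# The right-hand side is evaluated first, then the result is stored.
total <- 100
total <- total + 1      # uses the old value (100), then overwrites it
total

# Wrapping an assignment in parentheses assigns and also prints the value.
(doubled <- total * 2)

# = also assigns at the top level, but <- is the more common style.
also_total = 101
identical(total, also_total)
```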
Challenge 1 (5 mins)
1:5
## [1] 1 2 3 4 5
2^(1:5)
## [1]  2  4  8 16 32
x <- 1:5
2^x
## [1]  2  4  8 16 32

Challenge 2 & 3 (5 mins)
ls() lists all variables and functions stored in the R global environment

ls()
## [1] "x" "y"

ls alone will print out the code for that function (this works for any R function)

ls
## function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE, 
##     pattern, sorted = TRUE) 
## {
##     if (!missing(name)) {
##         pos <- tryCatch(name, error = function(e) e)
##         if (inherits(pos, "error")) {
##             name <- substitute(name)
##             if (!is.character(name)) 
##                 name <- deparse(name)
##             warning(gettextf("%s converted to character string", 
##                 sQuote(name)), domain = NA)
##             pos <- name
##         }
##     }
##     all.names <- .Internal(ls(envir, all.names, sorted))
##     if (!missing(pattern)) {
##         if ((ll <- length(grep("[", pattern, fixed = TRUE))) && 
##             ll != length(grep("]", pattern, fixed = TRUE))) {
##             if (pattern == "[") {
##                 pattern <- "\\["
##                 warning("replaced regular expression pattern '[' by  '\\\\['")
##             }
##             else if (length(grep("[^\\\\]\\[<-", pattern))) {
##                 pattern <- sub("\\[<-", "\\\\\\[<-", pattern)
##                 warning("replaced '[<-' by '\\\\[<-' in regular expression pattern")
##             }
##         }
##         grep(pattern, all.names, value = TRUE)
##     }
##     else all.names
## }
## <bytecode: 0x7fed8d828040>
## <environment: namespace:base>

When using ls, the parentheses are important: they tell R to call the function ls.
To remove objects, use rm(x):

rm(x)

You can use rm to delete objects you no longer need.
If you have a lot of objects and want to delete them all, use rm(list=ls()):

rm(list=ls())

In this case we are using the ls() function inside another function. rm() takes a list argument, so we are listing all objects and then deleting them with rm().
Arguments need =, not <-; rm(list <- ls()) causes an error.
run once: rm(list <- ls())
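To try the difference between = and <- without wiping your own workspace, here is a sketch that works inside a throwaway environment. The scratch environment and the object names a and b are assumptions made for the demonstration; the envir argument of ls() and rm() is what points them at it.

```r
# A scratch environment, so the demonstration leaves the workspace alone.
scratch <- new.env()
assign("a", 1, envir = scratch)
assign("b", 2, envir = scratch)

before <- ls(envir = scratch)                    # names of the objects in scratch
rm(list = ls(envir = scratch), envir = scratch)  # named argument: removes them all
after <- ls(envir = scratch)                     # nothing left

# With <- the expression is no longer a named argument, and rm() refuses it.
msg <- tryCatch(
  rm(list <- ls(envir = scratch), envir = scratch),
  error = function(e) conditionMessage(e)
)
msg  # the error message, captured as a string
```

Catching the error with tryCatch() is only there so the script keeps running; typed at the console, the same call simply stops with that message.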
Challenge 4
installed.packages() # list installed packages
install.packages(c("packagename1", "packagename2")) # install one or many packages (pass several names as a character vector)
update.packages() # update installed packages
remove.packages("packagename") # remove a package
library(packagename) # make a package available in the current session
Challenge 5
The scientific process is naturally incremental, and many projects start life as random notes, some code, then a manuscript, and eventually everything is a bit mixed together.
A good project layout will ultimately make your life easier:
- It will help ensure the integrity of your data;
- It makes it simpler to share your code with someone else (a lab-mate, collaborator, or supervisor);
- It allows you to easily upload your code with your manuscript submission;
- It makes it easier to pick the project back up after a break.
We’ll look a little more at data management near the end of the quarter.
http://swcarpentry.github.io/r-novice-gapminder/fig/bad_layout.png
Creating a project in RStudio
We’re going to create a new project in RStudio:
- Click the “File” menu button, then “New Project”.
- Click “New Directory”.
- Click “Empty Project”.
- Type in the name of the directory to store your project, e.g. “swc_ucla”.
- If available, select the checkbox for “Create a git repository.” (We’ll come back to this tomorrow)
- Click the “Create Project” button.
Although there is no “best” way to lay out a project, there are some general principles to adhere to that will make project management easier:
Data is typically time consuming and/or expensive to collect.
Working with data interactively (e.g., in Excel), where it can be modified, means you are never sure of where the data came from, or how it has been modified since collection.
It is therefore a good idea to treat your data as “read-only”.
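One way to back up the "read-only" habit with the file system is to remove write permission from raw data files. The sketch below is an assumption about how you might do that, not part of the lesson: the file name raw_counts.csv and its contents are hypothetical, everything is written to a temporary directory, and the permission mode behaves as described on Unix-alikes (on Windows only the read-only flag is affected).

```r
# Create a throwaway data folder with one small, made-up raw data file.
data_dir <- tempfile("data_")
dir.create(data_dir)
raw_file <- file.path(data_dir, "raw_counts.csv")
write.csv(data.frame(id = 1:3, count = c(10, 12, 9)), raw_file, row.names = FALSE)

# Drop write permission: read-only for owner, group, and others.
changed <- Sys.chmod(raw_file, mode = "0444")

# Reading still works; analysis scripts load the file but never edit it.
raw <- read.csv(raw_file)
nrow(raw)
```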
In many cases raw data needs significant preprocessing before it can be analyzed. This task is sometimes called “data munging”.
It is useful to store these cleaning scripts in a separate folder, and create a second “read-only” data folder to hold the “cleaned” data sets.
There are lots of different ways to manage the output your scripts generate. It’s useful to have an output folder with different sub-directories for each separate analysis.
This makes it easier later, as many of my analyses are exploratory and don’t end up being used in the final project, and some of the analyses get shared between projects.
gives the following recommendations for project organization:
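- Put text documents associated with the project in the doc directory.
- Put raw data and metadata in the data directory, and files generated during cleanup and analysis in a results directory.
- Put source code for the project’s scripts and programs in the src directory, and programs brought in from elsewhere or compiled locally in the bin directory.
- Name all files to reflect their content or function.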
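The directory names in these recommendations can be set up from R itself. This is only a sketch: it builds the skeleton under a temporary directory and reuses the example project name swc_ucla; for a real project you would replace project_root with your own project path.

```r
# Build the recommended skeleton under a temporary location.
project_root <- file.path(tempdir(), "swc_ucla")
project_dirs <- c("doc", "data", "results", "src", "bin")

for (d in project_dirs) {
  # recursive = TRUE also creates project_root itself if it is missing
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

created <- list.files(project_root)
created
```

Creating the folders from a script means the same layout can be reproduced for every new project.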
mention TIER Protocol - covered later on in the Data Management section
Challenge 1
To be able to access R help files for functions and operators.
?function_name or help(function_name)
?"+" (help for an operator; put the operator in quotes)
??function_name will do a fuzzy search for function help (if you don’t know the exact name)
** hope to be at 2:00pm**