Course intro (5-10 mins)

shoot for 1hr

SETUP (5 min)

SETUP DATA

INTRO TO RSTUDIO (10 min)

BASIC LAYOUT: RStudio
  • When you open RStudio, you will be greeted by three panels:
    • The interactive R console (entire left)
    • Environment/History (tabbed in upper right; older versions label the Environment tab "Workspace")
    • Files/Plots/Packages/Help (tabbed in lower right)
  • Walk through each panel on screen
KEYBOARD SHORTCUTS - Running segments of your code
  • Shortcuts: Tools > Keyboard Shortcuts Help

  • Alt+- inserts <- at the cursor
  • Control+Shift+M inserts %>% at the cursor (we’ll get into this on day 2)
  • Control+Enter runs the current line or selection
  • Control+1 moves the cursor to the source editor
  • Control+2 moves the cursor to the console

Intro to R (15 min)

Much of your time in R will be spent in the R interactive console.

This is where you will run all of your code, and can be a useful environment to try out ideas before adding them to an R script file.

R as calculator in the script window - simple use

* much time is spent in the console working out code
* the console shows a `>` prompt with a blinking cursor, much like the command line
* this is a "read-evaluate-print loop" (REPL); many languages adopt this paradigm (bash, Stata, Python)
* type commands at the prompt; R tries to execute them and then returns a result
1 + 100
## [1] 101

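Using R as a calculator: R uses the standard order of operations, from highest to lowest precedence:

  1. Parentheses: ( )
  2. Exponents: ^ or **
  3. Multiply and divide: * and / (same level, evaluated left to right)
  4. Add and subtract: + and - (same level, evaluated left to right)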
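A quick check of how operators on the same precedence level group, in R's own notation. The specific numbers are arbitrary examples chosen only to make the grouping visible.

```r
# * and / share one precedence level and are evaluated left to right,
# so this is (8 / 2) * 4, not 8 / (2 * 4)
8 / 2 * 4
## [1] 16
# + and - also share a level and are evaluated left to right: (10 - 4) + 2
10 - 4 + 2
## [1] 8
# ^ is the exception: it groups right to left, so this is 2^(3^2)
2^3^2
## [1] 512
```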

3+5*2 v. (3+5) * 2

3+5*2
## [1] 13

Use parentheses to group operations, either to force an order of evaluation that differs from the default or to make clear what you intend.

(3+5) * 2
## [1] 16
(3 + (5 * (2 ^ 2))) # hard to read
## [1] 23
3 + 5 * 2 ^ 2       # clear, if you remember the rules
## [1] 23
3 + 5 * (2 ^ 2)     # if you forget some rules, this might help
## [1] 23
2/10000
## [1] 2e-04
5e3 # Note the lack of minus here: 5 * 10^3
## [1] 5000
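The two results above use scientific notation: XeY means X times 10 to the power Y. A small sketch showing that R treats both spellings as the same number, and that it chooses scientific notation on its own when printing small values (default print settings assumed):

```r
# XeY is shorthand for X * 10^Y; both spellings give the same number
2e-04 == 0.0002
## [1] TRUE
5e3 == 5000
## [1] TRUE
# With default settings R prints 0.0002 in scientific notation
0.0002
## [1] 2e-04
```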

Math functions - R has many built-in functions

sin(1) #trig functions
## [1] 0.841471
log(1) # natural log
## [1] 0
log10(10) #base-10 log
## [1] 1
exp(0.5) # e^(1/2)
## [1] 1.648721
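A few more built-in functions learners often reach for early on. This list is an illustrative selection, not an exhaustive one; note that `log()` takes an optional `base` argument.

```r
sqrt(16)                    # square root
## [1] 4
abs(-3)                     # absolute value
## [1] 3
log(8, base = 2)            # log with a chosen base
## [1] 3
round(3.14159, digits = 2)  # round to 2 decimal places
## [1] 3.14
```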
Commenting
  • notice the # after the code above; any idea what it does? Everything after # on a line is a comment and doesn’t get evaluated. Use comments to document your code or leave notes for yourself, e.g. #TODO fix code

  • Don’t worry about remembering functions; use Google
  • use RStudio’s autocompletion (press Tab) if you can remember the beginning of a function name

  • Typing ? before a function name brings up its help page in the RStudio Help panel
  • we’ll look at help later on
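Two base R helpers for when you half-remember a function: `apropos()` (from the utils package, attached by default) lists objects on the search path whose names match a regular expression, and `args()` shows a function's arguments without opening its help page. The pattern `"^log"` is just an example; the exact list returned depends on which packages are attached.

```r
# Find functions whose names start with "log" (log, log10, log2, ...)
apropos("^log")
# Show the arguments log() accepts: x and base
args(log)
```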

Comparing things

1 == 1 # equality (note two equals signs, read as "is equal to")
## [1] TRUE
1 != 2  # inequality (read as "is not equal to")
## [1] TRUE
1 <  2  # less than
## [1] TRUE
1 <= 1  # less than or equal to
## [1] TRUE
1 > 0  # greater than
## [1] TRUE
1 >= -9 # greater than or equal to
## [1] TRUE
# Tip: don't use == to compare numbers unless they are integers; computers represent decimals with limited precision
# check out ?all.equal for comparing things involving doubles

0.1+0.05==0.15
## [1] FALSE
all.equal(0.1+0.05, 0.15)
## [1] TRUE
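One caveat worth showing: `all.equal()` returns `TRUE` when the values match within its tolerance, but a character string describing the difference when they don't. A minimal sketch of the usual pattern, wrapping it in `isTRUE()` so the result is always a single `TRUE` or `FALSE` that is safe inside `if ()`:

```r
# Not equal: all.equal() returns a message string, not FALSE
all.equal(0.1 + 0.05, 0.16)
# isTRUE() turns the result into a plain TRUE/FALSE
isTRUE(all.equal(0.1 + 0.05, 0.15))
## [1] TRUE
isTRUE(all.equal(0.1 + 0.05, 0.16))
## [1] FALSE
```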

Variables & assignment

x <- 1/40
x
## [1] 0.025
log(x)
## [1] -3.688879
x <- 100
x
## [1] 100

x used to contain the value 0.025 and now it has the value 100.

x <- x + 1 #notice how RStudio updates its description of x in the Environment tab (top right)
y <- x * 2
x
## [1] 101
y
## [1] 202
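A common follow-up question is whether `y` changes when `x` changes. It doesn't: `y <- x * 2` stores the value computed at that moment, not a link to `x`. A short sketch (the value 5 is arbitrary):

```r
x <- 101
y <- x * 2   # y gets the value 202 now
x <- 5       # reassigning x afterwards...
y            # ...leaves y unchanged
## [1] 202
```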

Challenge 1 (5 mins)

Vectorization

1:5
## [1] 1 2 3 4 5
2^(1:5)
## [1]  2  4  8 16 32
x <- 1:5
2^x
## [1]  2  4  8 16 32
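Vectorization extends beyond `^`: arithmetic between vectors works element by element, comparisons return one `TRUE`/`FALSE` per element, and some functions summarize the whole vector. A short sketch with illustrative values:

```r
x <- 1:5
x * 10                     # each element multiplied by 10
## [1] 10 20 30 40 50
x + c(10, 20, 30, 40, 50)  # element-by-element addition of two vectors
## [1] 11 22 33 44 55
x > 2                      # comparisons are vectorized too
## [1] FALSE FALSE  TRUE  TRUE  TRUE
sum(x)                     # sum() reduces the whole vector to one number
## [1] 15
```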

Challenge 2 & 3 (5 mins)

Managing your environment

ls()
## [1] "x" "y"
ls # no parentheses: R prints the function's source code instead of calling it
## function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE, 
##     pattern, sorted = TRUE) 
## {
##     if (!missing(name)) {
##         pos <- tryCatch(name, error = function(e) e)
##         if (inherits(pos, "error")) {
##             name <- substitute(name)
##             if (!is.character(name)) 
##                 name <- deparse(name)
##             warning(gettextf("%s converted to character string", 
##                 sQuote(name)), domain = NA)
##             pos <- name
##         }
##     }
##     all.names <- .Internal(ls(envir, all.names, sorted))
##     if (!missing(pattern)) {
##         if ((ll <- length(grep("[", pattern, fixed = TRUE))) && 
##             ll != length(grep("]", pattern, fixed = TRUE))) {
##             if (pattern == "[") {
##                 pattern <- "\\["
##                 warning("replaced regular expression pattern '[' by  '\\\\['")
##             }
##             else if (length(grep("[^\\\\]\\[<-", pattern))) {
##                 pattern <- sub("\\[<-", "\\\\\\[<-", pattern)
##                 warning("replaced '[<-' by '\\\\[<-' in regular expression pattern")
##             }
##         }
##         grep(pattern, all.names, value = TRUE)
##     }
##     else all.names
## }
## <bytecode: 0x7fed8d828040>
## <environment: namespace:base>
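rm(x)            # remove a single object
rm(list=ls())    # remove every object in the environment

Run once to show the pitfall: rm(list <- ls()) throws an error, because with <- the value is not passed to rm()'s list argument. Use = when naming function arguments.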
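A self-contained version of the demo above. The object names `a` and `b` are arbitrary, and `tryCatch()` is used only so the expected error from the `<-` pitfall is captured instead of stopping the script.

```r
a <- 1
b <- 2
rm(a)          # remove one object
exists("a")
## [1] FALSE
# Pitfall: `list <- ls()` lands in rm()'s ... rather than its list argument,
# so rm() stops with an error; capture the message to print it
msg <- tryCatch(rm(list <- ls()), error = function(e) conditionMessage(e))
msg
rm(list = ls())  # = passes ls() to the list argument: everything is removed
ls()
## character(0)
```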

Challenge 4

Installing and Using R packages

installed.packages() # list installed packages

install.packages(c("packagename1", "packagename2")) # install one or many packages (pass a character vector)

update.packages() # update installed packages

remove.packages("packagename") # uninstall a package

library(packagename) # load a package so its functions are available
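Workshop setup scripts often install a package only if it is missing. A minimal sketch under that assumption: `install_if_missing()` is a hypothetical helper, not part of base R, and `"stats"` is used in the example only because it ships with R, so nothing is downloaded.

```r
# install_if_missing() is a hypothetical helper, not a base R function
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!is_installed]
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)  # report which packages (if any) were installed
}

# "stats" ships with R, so nothing is installed here
install_if_missing("stats")
library(stats)
```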

Challenge 5

Project management with RStudio (10 min)

The scientific process is naturally incremental, and many projects start life as random notes, some code, then a manuscript, and eventually everything is a bit mixed together.

A good project layout will ultimately make your life easier:

  • It will help ensure the integrity of your data
  • It makes it simpler to share your code with someone else (a lab-mate, collaborator, or supervisor)
  • It allows you to easily upload your code with your manuscript submission
  • It makes it easier to pick the project back up after a break

Creating a project in RStudio

We’re going to create a new project in RStudio:

  1. Click the “File” menu button, then “New Project”.
  2. Click “New Directory”.
  3. Click “Empty Project”.
  4. Type in the name of the directory to store your project, e.g. “swc_ucla”.
  5. If available, select the checkbox for “Create a git repository.” (We’ll come back to this tomorrow)
  6. Click the “Create Project” button.

Best practices for project organization

Although there is no “best” way to lay out a project, there are some general principles to adhere to that will make project management easier:

  • Treat data as read-only. This is probably the most important goal of setting up a project.

Data are typically time-consuming and/or expensive to collect.

Working with them interactively (e.g., in Excel), where they can be modified, means you are never sure where the data came from or how they have been modified since collection.

It is therefore a good idea to treat your data as “read-only”.

  • Data cleaning. In many cases your data will be “dirty”: it will need significant preprocessing to get into a format R (or any other programming language) will find useful.

This task is sometimes called “data munging”.

It is useful to store these scripts in a separate folder, and to create a second “read-only” data folder to hold the “cleaned” data sets.
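A minimal sketch of the raw/cleaned split described above. The folder names, file names, and the tiny CSV are hypothetical stand-ins so the example runs on its own in a temporary directory; the "cleaning" step (dropping rows with a missing value) is only an illustration.

```r
# Hypothetical folders; in a real project these sit in the project root
raw_dir   <- file.path(tempdir(), "data")
clean_dir <- file.path(tempdir(), "data_clean")
dir.create(raw_dir, showWarnings = FALSE)
dir.create(clean_dir, showWarnings = FALSE)

# Toy raw file standing in for collected data (illustration only)
raw_file <- file.path(raw_dir, "raw.csv")
writeLines(c("id,weight", "1,10.5", "2,NA", "3,12"), raw_file)

# Cleaning script: read the raw data and drop rows with a missing weight
raw_df   <- read.csv(raw_file)
clean_df <- raw_df[!is.na(raw_df$weight), ]

# Write to the separate cleaned-data folder; the raw file is never overwritten
clean_file <- file.path(clean_dir, "clean.csv")
write.csv(clean_df, clean_file, row.names = FALSE)
```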

  • Treat generated output as disposable. Anything generated by your scripts should be treated as disposable: it should all be able to be regenerated from your scripts.

There are lots of different ways to manage this output. It’s useful to have an output folder with different sub-directories for each separate analysis.

This makes things easier later, since many analyses are exploratory and don’t end up being used in the final project, and some analyses get shared between projects.

Good enough practices for scientific computing

https://github.com/swcarpentry/good-enough-practices-in-scientific-computing/blob/gh-pages/good-enough-practices-for-scientific-computing.pdf

The paper gives the following recommendations for project organization:

  • Put each project in its own directory, which is named after the project.
  • Put text documents associated with the project in the doc directory.
  • Put raw data and metadata in the data directory, and files generated during cleanup and analysis in a results directory.
  • Put source for the project’s scripts and programs in the src directory, and programs brought in from elsewhere or compiled locally in the bin directory.
  • Name all files to reflect their content or function.

  • mention TIER Protocol - covered later on in the Data Management section
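The folder recommendations above can be set up from R itself. A sketch, assuming the directory names from the list (doc, data, results, src, bin); the root here is a hypothetical folder under `tempdir()` so the example is safe to run, whereas in class you would run it from the RStudio project directory.

```r
# Hypothetical project root; in practice, the RStudio project directory
root <- file.path(tempdir(), "swc_ucla")
project_dirs <- c("doc", "data", "results", "src", "bin")

for (d in project_dirs) {
  # recursive = TRUE also creates root if it doesn't exist yet
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

list.files(root)  # show the folders that now exist
```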

Challenge 1

Seeking help (5 min)

To be able to access R help files for functions and operators.

**Hope to be at 2:00 pm**