How can I manipulate a data frame?
cats <- read.csv(file="data/feline-data.csv")
## Warning in read.table(file = file, header = header, sep = sep, quote =
## quote, : incomplete final line found by readTableHeader on 'data/feline-
## data.csv'
cats
## coat weight likes_string
## 1 calico 2.1 1
## 2 black 5.0 0
## 3 tabby 3.2 1
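The warning above usually means that the last line of the CSV file does not end with a newline character. The sketch below is one way to recreate the file with a trailing newline. It assumes the file contents match the printed table, and it writes to a temporary path (`csv_path` and `cats_demo` are illustrative names, not part of the lesson) so that nothing in `data/` is overwritten.

```r
# Recreate the cat data in a temporary file. writeLines() terminates every
# line, including the last one, with a newline.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("coat,weight,likes_string",
             "calico,2.1,1",
             "black,5.0,0",
             "tabby,3.2,1"),
           csv_path)

# Read the file back in the same way as above.
cats_demo <- read.csv(file = csv_path)
cats_demo
```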
# Note: age has not been defined yet at this point, so this call fails and cats is left unchanged
cats <- cbind(cats, age)
cats
## coat weight likes_string
## 1 calico 2.1 1
## 2 black 5.0 0
## 3 tabby 3.2 1
age <- c(4,5,8)
cats <- cbind(cats, age)
cats
## coat weight likes_string age
## 1 calico 2.1 1 4
## 2 black 5.0 0 5
## 3 tabby 3.2 1 8
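A new column needs one value per row; a vector whose length does not match the number of rows is generally an error. The sketch below uses a small stand-in data frame (`cats_demo` is an illustrative name) to check the length before combining, and also shows the `$` assignment form as an alternative to `cbind`.

```r
# Small stand-in for the cats data frame.
cats_demo <- data.frame(coat = c("calico", "black", "tabby"),
                        weight = c(2.1, 5.0, 3.2))
age <- c(4, 5, 8)

# Guard: stop early unless there is exactly one value per row.
stopifnot(length(age) == nrow(cats_demo))
cats_demo <- cbind(cats_demo, age)

# Alternative form: assign a single new column by name.
cats_demo$likes_string <- c(1, 0, 1)
str(cats_demo)
```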
Now how about adding rows? We saw last time that the rows of a data.frame are lists, so a new row can be supplied as a list:
newRow <- list("tortoiseshell", 3.3, TRUE, 9)
cats <- rbind(cats, newRow)
## Warning in `[<-.factor`(`*tmp*`, ri, value = structure(c(2L, 1L, 3L,
## NA), .Label = c("black", : invalid factor level, NA generated
levels(cats$coat)
## [1] "black" "calico" "tabby"
levels(cats$coat) <- c(levels(cats$coat), 'tortoiseshell')
cats <- rbind(cats, list("tortoiseshell", 3.3, TRUE, 9))
str(cats)
## 'data.frame': 5 obs. of 4 variables:
## $ coat : Factor w/ 4 levels "black","calico",..: 2 1 3 NA 4
## $ weight : num 2.1 5 3.2 3.3 3.3
## $ likes_string: int 1 0 1 1 1
## $ age : num 4 5 8 9 9
cats$coat <- as.character(cats$coat)
str(cats)
## 'data.frame': 5 obs. of 4 variables:
## $ coat : chr "calico" "black" "tabby" NA ...
## $ weight : num 2.1 5 3.2 3.3 3.3
## $ likes_string: int 1 0 1 1 1
## $ age : num 4 5 8 9 9
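The order of operations matters here: the new level has to exist before a row that uses it is appended. The sketch below shows this on a small stand-in data frame (`cats_demo` is an illustrative name) whose `coat` column is explicitly created as a factor.

```r
# Stand-in data frame with coat stored as a factor.
cats_demo <- data.frame(coat = factor(c("calico", "black", "tabby")),
                        weight = c(2.1, 5.0, 3.2))

# Register the new level first, then append a row that uses it.
levels(cats_demo$coat) <- c(levels(cats_demo$coat), "tortoiseshell")
cats_demo <- rbind(cats_demo, list("tortoiseshell", 3.3))

levels(cats_demo$coat)
anyNA(cats_demo$coat)  # checks whether any NA was generated
```

Converting the column with `as.character` before adding rows, as done above, avoids the need to manage levels at all.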
cats
## coat weight likes_string age
## 1 calico 2.1 1 4
## 2 black 5.0 0 5
## 3 tabby 3.2 1 8
## 4 <NA> 3.3 1 9
## 5 tortoiseshell 3.3 1 9
cats[-4,]
## coat weight likes_string age
## 1 calico 2.1 1 4
## 2 black 5.0 0 5
## 3 tabby 3.2 1 8
## 5 tortoiseshell 3.3 1 9
na.omit(cats)
## coat weight likes_string age
## 1 calico 2.1 1 4
## 2 black 5.0 0 5
## 3 tabby 3.2 1 8
## 5 tortoiseshell 3.3 1 9
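Neither `cats[-4,]` nor `na.omit(cats)` changes `cats` until the result is assigned back. The sketch below compares three ways of dropping a row on a small stand-in data frame (all names are illustrative).

```r
# Stand-in data frame with one incomplete row.
cats_demo <- data.frame(coat = c("calico", "black", NA, "tortoiseshell"),
                        weight = c(2.1, 5.0, 3.3, 3.3))

by_index <- cats_demo[-3, ]                      # drop row 3 by position
by_test  <- cats_demo[!is.na(cats_demo$coat), ]  # drop rows where coat is NA
by_omit  <- na.omit(cats_demo)                   # drop rows with an NA in any column

# The original data frame is untouched until a result is assigned back to it.
nrow(cats_demo)
nrow(by_omit)
```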
cats <- na.omit(cats)
cats <- rbind(cats, cats)
cats
## coat weight likes_string age
## 1 calico 2.1 1 4
## 2 black 5.0 0 5
## 3 tabby 3.2 1 8
## 5 tortoiseshell 3.3 1 9
## 11 calico 2.1 1 4
## 21 black 5.0 0 5
## 31 tabby 3.2 1 8
## 51 tortoiseshell 3.3 1 9
rownames(cats) <- NULL
cats
## coat weight likes_string age
## 1 calico 2.1 1 4
## 2 black 5.0 0 5
## 3 tabby 3.2 1 8
## 4 tortoiseshell 3.3 1 9
## 5 calico 2.1 1 4
## 6 black 5.0 0 5
## 7 tabby 3.2 1 8
## 8 tortoiseshell 3.3 1 9
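Row names are kept when rows are removed and are made unique when data frames are stacked, which is why the labels above stop being consecutive. The sketch below repeats the reset on a small stand-in data frame (`cats_demo` and `doubled` are illustrative names).

```r
# Stand-in data frame; dropping row 3 leaves a gap in the row names.
cats_demo <- data.frame(coat = c("calico", "black", "tabby"),
                        weight = c(2.1, 5.0, 3.2))
cats_demo <- cats_demo[-3, ]

doubled <- rbind(cats_demo, cats_demo)
rownames(doubled)          # row names carried over from the inputs

rownames(doubled) <- NULL  # reset to consecutive numbering
rownames(doubled)
```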
Challenge 1: http://swcarpentry.github.io/r-novice-gapminder/05-data-structures-part2/#challenge-1
So far we have worked with data.frames using our cat data. Now let's read in the gapminder dataset:
gapminder <- read.csv("data/gapminder-FiveYearData.csv")
Files can also be downloaded into a local folder with download.file, and read.csv can then be executed to read the downloaded file, such as:
download.file("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder-FiveYearData.csv", destfile = "data/gapminder-FiveYearData.csv")
gapminder <- read.csv("data/gapminder-FiveYearData.csv")
Alternatively, read.csv can read a file directly from a web address:
gapminder <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder-FiveYearData.csv")
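Re-running a script that calls `download.file` fetches the file again every time. The sketch below is one way to avoid that. `fetch_csv` is a hypothetical helper, not part of the lesson or of base R, that downloads only when the local copy is missing and then reads it.

```r
# Hypothetical helper (an assumption for illustration): download a CSV only
# when the local copy does not exist yet, then read it with read.csv().
fetch_csv <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # Create the target folder if needed before downloading.
    dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
    download.file(url, destfile = destfile)
  }
  read.csv(destfile)
}

# Example call (commented out because it needs network access):
# gapminder <- fetch_csv(
#   "https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder-FiveYearData.csv",
#   "data/gapminder-FiveYearData.csv")
```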
str(gapminder)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
typeof(gapminder$year)
## [1] "integer"
typeof(gapminder$lifeExp)
## [1] "double"
typeof(gapminder$country)
## [1] "integer"
str(gapminder$country)
## Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
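`typeof` reports "integer" for `country` because a factor is stored as integer codes plus a set of level labels. The sketch below shows this on a few toy rows (`demo` is an illustrative name). `stringsAsFactors = TRUE` is passed explicitly, on the assumption that the output above came from a version of R earlier than 4.0.0, where that was the default; from 4.0.0 onward `read.csv` keeps strings as character vectors unless asked otherwise.

```r
# Toy data read from an inline string. stringsAsFactors = TRUE is an
# assumption used to reproduce the factor columns seen above.
demo <- read.csv(text = "country,year\nAfghanistan,1952\nAlbania,1952\nAfghanistan,1957",
                 stringsAsFactors = TRUE)

class(demo$country)       # the class of the column
typeof(demo$country)      # the underlying storage type
as.integer(demo$country)  # one integer code per row
levels(demo$country)      # the labels that the codes refer to
```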
length(gapminder)
## [1] 6
typeof(gapminder)
## [1] "list"
nrow(gapminder)
## [1] 1704
ncol(gapminder)
## [1] 6
dim(gapminder)
## [1] 1704 6
colnames(gapminder)
## [1] "country" "year" "pop" "continent" "lifeExp" "gdpPercap"
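`length(gapminder)` returns 6 rather than 1704 because a data frame is a list of columns, so its length is the number of columns. The sketch below checks these relationships on a small stand-in data frame (`demo` is an illustrative name).

```r
# Stand-in data frame with two rows and three columns.
demo <- data.frame(country = c("Afghanistan", "Afghanistan"),
                   year = c(1952L, 1957L),
                   lifeExp = c(28.801, 30.332))

length(demo)                # number of columns, since a data frame is a list
length(demo) == ncol(demo)  # the two always agree
dim(demo)                   # rows first, then columns
sapply(demo, typeof)        # storage type of each column
```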
At this stage, check whether the data types and structure that R reports match what we expect. If not, we need to sort out the problems now, before they turn into unpleasant surprises down the road.
Once we are happy that the data types and structures seem reasonable, it's time to start digging into our data properly:
head(gapminder)
## country year pop continent lifeExp gdpPercap
## 1 Afghanistan 1952 8425333 Asia 28.801 779.4453
## 2 Afghanistan 1957 9240934 Asia 30.332 820.8530
## 3 Afghanistan 1962 10267083 Asia 31.997 853.1007
## 4 Afghanistan 1967 11537966 Asia 34.020 836.1971
## 5 Afghanistan 1972 13079460 Asia 36.088 739.9811
## 6 Afghanistan 1977 14880372 Asia 38.438 786.1134
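The first six rows all belong to one country, so it can help to look at other parts of the data as well. The sketch below uses a small stand-in data frame (`demo` is an illustrative name) to show the `n` argument of `head`, its counterpart `tail`, and one way to pull a few rows at random.

```r
# Stand-in data frame with 20 rows.
demo <- data.frame(id = 1:20, value = (1:20)^2)

head(demo, n = 3)              # first three rows
tail(demo, n = 3)              # last three rows
demo[sample(nrow(demo), 5), ]  # five rows chosen at random
```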