Introduction to Text Mining in R

November 16, 2016

Introduction

This workshop introduces text analysis techniques in R, an open source programming language. Some familiarity with R is required to attend. If you are new to R, you can attend the Intro to R course we provide or take this online tutorial from DataCamp: https://www.datacamp.com/courses/free-introduction-to-r

Topics

Dates

November 16, 2016 (9:00 AM - 12:00 PM)

Location

Biomedical Library Classroom 4

Things not covered

Instructors

Audience

All graduate students and researchers.

Setup

This lesson assumes you have R and RStudio installed on your computer.

R can be downloaded here.

RStudio is an environment for developing in R. It can be downloaded here. You will need the Desktop version for your computer.

We will use the following R packages. If you can, install them before class:

Required R Packages

  1. tm # text mining in R
  2. RTextTools # a machine learning package for text classification
  3. qdap # quantitative discourse analysis
  4. qdapDictionaries # for sentiment analysis, etc
  5. entropy # tools applying Information Theory
  6. dplyr # data preparation and pipes %>%
  7. ggplot2 # for plotting
  8. SnowballC # for stemming
  9. matrixStats # for stats
  10. data.table # for easier data manipulation
  11. scales # to help us plot
  12. lsa # latent semantic analysis
  13. cluster # for clustering analysis
  14. fpc # flexible procedures for clustering
  15. mallet # a wrapper around the Java machine learning tool MALLET
  16. wordcloud # to visualize wordclouds
  17. rJava # dependency for mallet
  18. Any dependencies of the packages above.
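
One way to install the packages listed above before class is sketched below. It is a minimal example, not an official setup script: the names required_pkgs, missing_pkgs, and install_missing are made up for illustration, and it assumes every package is still available from your configured CRAN mirror, which may not hold for all of them on current versions of R. rJava and mallet also need a working Java installation.

```r
# Packages from the "Required R Packages" list (items 1-17).
required_pkgs <- c(
  "tm", "RTextTools", "qdap", "qdapDictionaries", "entropy", "dplyr",
  "ggplot2", "SnowballC", "matrixStats", "data.table", "scales", "lsa",
  "cluster", "fpc", "mallet", "wordcloud", "rJava"
)

# Names of listed packages that are not yet installed on this machine.
missing_pkgs <- setdiff(required_pkgs, rownames(installed.packages()))

# Hypothetical helper: installs only the missing packages.
# dependencies = TRUE asks R to pull in their dependencies as well (item 18).
# Nothing is downloaded until the function is called.
install_missing <- function(pkgs = missing_pkgs) {
  if (length(pkgs) > 0) {
    install.packages(pkgs, dependencies = TRUE)
  }
  invisible(pkgs)
}
```

Run install_missing() in the R console to start the installation. Afterwards, library(tm) and similar calls should load without errors; if a package fails to install, note the error message and bring it to class.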

Data

https://drive.google.com/open?id=0ByRar-ghNtRlNGpENWJmNGNlS2s

Resources

Credits