
Data Management 101 - School of Global Policy and Strategy

Feb 21 - Mar 5, 2018

6:30 pm - 9:20 pm

Instructors: Mary Linn Bergstrom, Reid Otsuji

Helpers: Kay Lyu

General Information

This is the website for the School of Global Policy and Strategy short course in data management and SQL. This course will introduce you to best practices in data management, preparation, and representation, including the basics of SQL. To earn a certificate of proficiency, you must attend all of the class meetings for the course, do the short coding assignments, and pass the quizzes. The course begins on Feb. 21.

Who: The course is aimed at GPS graduate students. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Robinson Auditorium, 9500 Gilman Drive, #0519, La Jolla, CA. Get directions with OpenStreetMap or Google Maps.

Requirements: Participants must bring a laptop with a few specific software packages installed (listed below).

Contact: Please mail rotsuji@ucsd.edu for more information.

Assessment: A certificate of proficiency will be given to students who:

  • Attend all classes
  • Pass the quizzes
  • Satisfactorily complete the weekly assignments

Data:

  • Download the SN7577 data and put it in a folder named sql on your desktop.
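If you would rather create that folder from the command line, the following is a minimal sketch. It assumes your desktop is located at $HOME/Desktop, which is common but not guaranteed (for example, some Windows setups redirect the desktop elsewhere); adjust the path if yours differs.

```shell
# Assumption: the desktop folder lives at "$HOME/Desktop"; change this if yours differs.
sql_dir="$HOME/Desktop/sql"

# -p creates any missing parent folders and does not complain if "sql" already exists.
mkdir -p "$sql_dir"

# Confirm that the folder is there.
ls -d "$sql_dir"
```

Once the folder exists, move the downloaded data into it with your file manager or with mv.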

Need help?: Email rotsuji@ucsd.edu or schedule an appointment


Schedule

Date      Class       Topics
Feb. 21   Lecture     Basic Data Management (Lecture), Open Science Framework and TIER Protocol
Feb. 26   Lecture     SQL for Social Science
Mar. 05   Lecture     Unix Shell
          Final Quiz  Posted in TritonED

Etherpad: https://public.etherpad-mozilla.org/p/2018-gps-dm.
We will use this Etherpad for chatting, taking notes, and sharing URLs and bits of code.


Syllabus

The Unix Shell

  • Files and directories
  • History and tab completion
  • Pipes and redirection
  • Looping over files
  • Creating and running shell scripts
  • Finding things
  • Reference...
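To give a feel for the topics listed above, here is a small illustrative sketch, not taken from the course materials. The file names (notes1.txt and so on) are hypothetical, and everything happens in a temporary directory so that none of your own files are touched.

```shell
# Work in a throwaway directory so nothing else on your machine is changed.
workdir="$(mktemp -d)"
cd "$workdir"

# Files and directories: create three small text files (hypothetical names).
printf 'a\nb\nc\n' > notes1.txt
printf 'a\nb\n' > notes2.txt
printf 'a\n' > notes3.txt

# Pipes and redirection: count lines per file, sort numerically, save the result.
wc -l notes*.txt | sort -n > line_counts.txt

# Looping over files: make a backup copy of each notes file.
for f in notes*.txt; do
    cp "$f" "backup-$f"
done

# Finding things: list the files that contain the letter "c",
# then count how many backup files exist.
grep -l 'c' notes*.txt
backup_count="$(find . -name 'backup-*.txt' | wc -l)"
echo "backups: $backup_count"
```

Saving these lines in a file and running it with bash is one way to try the "creating and running shell scripts" topic.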

Managing Data with SQL

  • Reading and sorting data
  • Filtering with where
  • Calculating new values on the fly
  • Handling missing values
  • Combining values using aggregation
  • Combining information from multiple tables using join
  • Creating, modifying, and deleting data
  • Programming with databases
  • Reference...

Resources


Data Management Best Practices

  • TIER Protocol - A defined protocol for structuring and managing the code, data, and output of your research
  • RDCP - A library unit that can help you with your research data needs
  • Workflow for Data Analysis using Stata by Scott Long - A book that has informed many efforts to produce reproducible workflows in Stata

Unix/Bash

Setup

To participate in this course, you will need access to the software described below. In addition, you will need an up-to-date web browser.

The Bash Shell

Bash is a commonly used shell that gives you the power to do simple tasks more quickly.

Windows

  1. Download the Git for Windows installer.
  2. Run the installer and follow the steps below:
    1. Click on "Next".
    2. Click on "Next".
    3. Click on "Next".
    4. Click on "Next".
    5. Click on "Next".
    6. Select "Use Git from the Windows Command Prompt" and click on "Next". If you forget to do this, programs that you need for the workshop will not work properly. If this happens, rerun the installer and select the appropriate option.
    7. Click on "Next". Keep "Checkout Windows-style, commit Unix-style line endings" selected.
    8. Select "Use Windows' default console window" and click on "Next".
    9. Click on "Next".
    10. Click on "Finish".

This will provide you with both Git and Bash in the Git Bash program.

Mac OS X

The default shell in all versions of Mac OS X is Bash, so no need to install anything. You access Bash from the Terminal (found in /Applications/Utilities). You may want to keep Terminal in your dock for this workshop.

Linux

The default shell is usually Bash, but if your machine is set up differently you can run it by opening a terminal and typing bash. There is no need to install anything.
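Whichever operating system you use, a quick way to check your setup before class is to run a few commands in your terminal (Git Bash on Windows). This is only a sketch; the exact output depends on your machine, and the variable name bash_path is just for illustration.

```shell
# Show the login shell recorded for your account (it may be bash or something else).
echo "Login shell: ${SHELL:-unknown}"

# Confirm that bash is installed and see where it lives.
bash_path="$(command -v bash)"
echo "bash found at: $bash_path"

# Print the first line of the version information.
bash --version | head -n 1
```

If the second command prints a path and the third prints a version line, Bash is available and you are ready for the workshop.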