Intro to Data Management - School of Global Policy and Strategy

Feb 16 - Mar 4, 2020

12:30 pm - 1:50 pm

Instructors: Mary Linn Bergstrom (Library), Reid Otsuji (Library), Chi Gao (GPS)

Helpers: Kimberly Thomas, Harry Zhao, Rick Mccosh, Stephanie Labou, TA - Chi Gao

General Information

This is the website for the School of Global Policy and Strategy short course in data management. This course will introduce you to best practices in data management, preparation, and representation, including the basics of data in spreadsheets, SQL, and the Unix Shell. To earn a certificate of proficiency, you must attend all of the class meetings for the course, do the short assignments, and pass the quizzes.

Who: The course is aimed at GPS graduate students. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Zoom meeting, 9500 Gilman Drive, #0519, La Jolla, CA. The Zoom link and access information are in Canvas.

Requirements: Participants must bring a laptop with a few specific software packages installed (listed below).

Contact: Please email rotsuji@ucsd.edu for more information.

Assessment: A certificate of proficiency will be given to students who:

  • Attend/view all classes
  • Pass the quizzes
  • Satisfactorily complete the weekly assignments

Data:

  • Download survey.db and put it on your desktop inside a folder named "sql".
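
To confirm the file ended up in the right place, something like the following sketch could be used. It assumes Python 3 is available (the course does not require it) and that your desktop folder is ~/Desktop; the function name find_survey_db is made up for this example.

```python
from pathlib import Path


def find_survey_db(desktop=None):
    """Return the path to survey.db if it is in <desktop>/sql, else None."""
    if desktop is None:
        # Assumption: the desktop folder is ~/Desktop, the usual default.
        desktop = Path.home() / "Desktop"
    candidate = Path(desktop) / "sql" / "survey.db"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_survey_db()
    print(found if found else "survey.db not found in Desktop/sql")
```

If the script reports that the file was not found, check the folder name (it must be exactly "sql") and that the download was not renamed by your browser.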


Credits: The course materials are adapted from the Software Carpentry lessons. Many thanks to the authors of those lessons.

Schedule

Date      Class              Topics
Feb. 16   Lecture            Basic Data Management (Lecture), Open Science Framework
Feb. 18   Lecture            Organization in Spreadsheets
          Assignment & Quiz  Posted in Canvas
Feb. 23   Lecture            Introduction to SQL
Feb. 25   Lecture            TBD - SQL Cont.
          Assignment & Quiz  Posted in Canvas
Mar. 02   Lecture            TBD - Introduction to the UNIX Shell
Mar. 04   Lecture            TBD - Intro to APIs
          Final Quiz         Posted in Canvas

HackMD: https://hackmd.io/i49hTFHZTf-z4qNVKQOMuQ
We will use this HackMD for collaborative notes, sharing URLs and bits of code.


Syllabus

Managing Data with SQL

  • Reading and sorting data
  • Filtering with where
  • Calculating new values on the fly
  • Handling missing values
  • Combining values using aggregation
  • Combining information from multiple tables using join
  • Creating, modifying, and deleting data
  • Programming with databases
  • Reference...
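
As a preview of several of these topics, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table names, column names, and data (person, reading, and so on) are invented for illustration; they are not the contents of survey.db, and the lessons themselves may present these ideas differently.

```python
import sqlite3

# In-memory database with two small, hypothetical tables (not survey.db).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE reading (person_id TEXT, quantity TEXT, value REAL);
    INSERT INTO person VALUES ('a1', 'Ana'), ('b2', 'Ben');
    INSERT INTO reading VALUES
        ('a1', 'temp', 20.0),
        ('a1', 'temp', 22.0),
        ('b2', 'temp', 18.0),
        ('b2', 'temp', NULL);
""")

# Reading and sorting: select names in alphabetical order.
names = [row[0] for row in conn.execute("SELECT name FROM person ORDER BY name")]

# Filtering with WHERE and calculating a new value on the fly
# (Celsius to Fahrenheit); rows with a missing value are excluded.
fahrenheit = conn.execute(
    "SELECT value * 9 / 5 + 32 FROM reading "
    "WHERE person_id = 'a1' AND value IS NOT NULL ORDER BY value"
).fetchall()

# Joining two tables and aggregating: average reading per person.
# AVG ignores NULL, so Ben's missing reading does not affect his average.
averages = conn.execute(
    "SELECT person.name, AVG(reading.value) "
    "FROM person JOIN reading ON person.id = reading.person_id "
    "GROUP BY person.name ORDER BY person.name"
).fetchall()

print(names)
print(fahrenheit)
print(averages)
conn.close()
```

Running SQL from a script like this is one form of "programming with databases"; the same SELECT statements can also be typed directly into an SQLite tool.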

Resources

Data Management Best Practices

  • TIER Protocol - A defined protocol for structuring and managing the code, data, and output of your research
  • RDCP - A Library unit that can help you with your research data needs
  • Workflow for Data Analysis using Stata by Scott Long - A book that has informed many efforts to produce reproducible workflows in Stata

Unix/Bash

Setup

To participate in this course, you will need access to the software described below. In addition, you will need an up-to-date web browser.

The Bash Shell

Bash is a commonly-used shell that gives you the power to do simple tasks more quickly.

Windows

  1. Download the Git for Windows installer.
  2. Run the installer and follow the steps below:
    1. Click on "Next".
    2. Click on "Next".
    3. Click on "Next".
    4. Click on "Next".
    5. Click on "Next".
    6. Select "Use Git from the Windows Command Prompt" and click on "Next". If you forget to do this, programs that you need for the workshop will not work properly. If this happens, rerun the installer and select the appropriate option.
    7. Click on "Next". Keep "Checkout Windows-style, commit Unix-style line endings" selected.
    8. Select "Use Windows' default console window" and click on "Next".
    9. Click on "Next".
    10. Click on "Finish".

This will provide you with both Git and Bash in the Git Bash program.

Mac OS X

Bash is included with Mac OS X, so there is no need to install anything. On macOS 10.15 (Catalina) and later the default shell is zsh, but you can start Bash by typing bash in the Terminal. You access the shell from the Terminal (found in /Applications/Utilities). You may want to keep Terminal in your dock for this workshop.

Linux

The default shell is usually Bash, but if your machine is set up differently you can run it by opening a terminal and typing bash. There is no need to install anything.

SQLite

SQL is a specialized programming language used with databases. We use a simple database manager called SQLite in our lessons. We will use the SQLite Manager plugin for Firefox. If you don't have Firefox installed, you need to install it first; then you will be able to add the plugin.

Windows

Download and install sqlite-dll-win64-x64-3310100.zip from the pre-compiled SQLite binaries for Windows.

Mac OS X

SQLite comes pre-installed on Mac OS X.

Linux

SQLite comes pre-installed on Linux.
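
To check that the tools described in this setup section can be found on your machine, a short script along these lines may help. It is only a sketch and assumes Python 3 is installed, which this course does not require; the function name check_setup is made up for this example.

```python
import shutil
import sqlite3


def check_setup(tools=("bash", "sqlite3")):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in check_setup().items():
    print(f"{tool}: {path or 'not found on PATH'}")

# Version of the SQLite library bundled with Python itself; this can differ
# from the version of the sqlite3 command-line program.
print("Python's SQLite library:", sqlite3.sqlite_version)
```

A "not found on PATH" result for sqlite3 on Windows usually means the downloaded files have not been placed in a folder that the command prompt searches.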