SIO - Software Carpentry Etherpad!

This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.
Users are expected to follow our code of conduct: http://software-carpentry.org/conduct.html
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

##############
Course Syllabus and setup instructions: https://ucsdlib.github.io/2016-09-19-UCSD-SIO/
This Etherpad: http://pad.software-carpentry.org/2016-09-19-UCSDSIO

*Instructors:
*Matt Critchlow - Git
*Tim Dennis - Python
*Reid Otsuji - Unix Shell

*Helpers:
* Day 1: Adi Ranganath, Tim Dennis, Matt Critchlow
* Day 2: Tim Dennis, Reid Otsuji, Ryan Johnson

Download Data here:
We will be saving the data folder on the desktop. Open Terminal or Git Bash and change directory to your desktop by typing:
$ cd desktop
$ git clone http://github.com/ucsdlib/sio-swc
The data files will download to a sio-swc folder on your desktop.

PLEASE SIGN IN HERE:
Tim Dennis, UCSD Library, timdennis@ucsd.edu
Reid Otsuji, UCSD Library, rotsuji@ucsd.edu
Matt Critchlow, UCSD Library, mcritchlow@ucsd.edu
Claire Mizumoto, Research IT, claire@ucsd.edu
Cyd Burrows, Research IT, cburrows@ucsd.edu
Sara Rivera, SIO, s6rivera@ucsd.edu
Jessica Blanton, SIO, jmblanton@ucsd.edu
Sascha Nicklisch, SIO, snicklisch@ucsd.edu
Chris Leber, SIO, cleber@ucsd.edu
Logan Peoples, SIO, lpeoples@ucsd.edu
Sarah R.
Smith, SIO, smith8272@gmail.com
Jack Pan, SIO, b4pan@ucsd.edu
Jon Tarn, jon.sy.tarn@gmail.com
Lena Keller, l1keller@ucsd.edu
Marisa Trego, marisa.trego@noaa.gov
Suzanne Roden, suzanne.roden@noaa.gov
Jens Muhle, jmuhle@ucsd.edu
Maryam Asgari Lamjiri, masgaril@ucsd.edu
Bia Villas Boas, avillasboas@ucsd.edu
Marcy Erb, UCSD, Biological Sciences, merb@ucsd.edu
Mike Raiko, UCSD CSE, mraiko@ucsd.edu
Martin Gassmann, SIO, mgassmann@ucsd.edu
Tessa Pierce, SIO, ntpierce@ucsd.edu

###############
To get set up:
cd desktop
git clone http://github.com/ucsdlib/sio-swc
###############

Day 1
Automating tasks with the Unix Shell

# the '#' sign starts a comment in the unix shell
$ whoami # tells you which user you are logged in as (very helpful on remote machines where you might have several user accounts)
$ pwd # "print working directory" = tells you what folder you're in
$ ls # lists files
$ ls -F # tells unix to distinguish files and directories by adding a '/' after directories
$ ls --help # get help info for the ls command

## in unix/mac/linux (doesn't work on gitbash):
$ man ls # display the manual for the ls command
# hit q afterwards to return to the command prompt
Google 'man ls unix' and you'll get to the man page: http://man7.org/linux/man-pages/man1/ls.1.html

$ cd # command to change directory
$ cd desktop # change into the desktop folder
$ cd sio-swc
$ cd data-shell
# how to get back to previous folders?
$ cd .. # move back up one folder (to the parent directory)
$ ls -F -a # use ls to display hidden files and directories ('-a' means "show all")

You can also enable colors in the shell permanently, e.g.:
Edit ~/.bash_profile or ~/.profile and add the following two lines:
export CLICOLOR=1
export LSCOLORS=ExFxCxDxBxegedabagacad
You can use this instead if you are using a black background:
export LSCOLORS=gxBxhxDxfxhxhxhxhxcxcx

'.' # is the current working directory
'..' # is the parent directory
$ cd # cd on its own (nothing after it) will take you home!
$ clear # will clear the screen
ctrl-l # will also clear the screen
# use your 'up' arrow key to see your previous commands. Press 'enter' to re-execute a command.
$ cd ~ # takes you home as well
'~' evaluates to the path to your home directory
Use the Tab key to autocomplete: start typing the first few letters of a command or file name and hit Tab. Using Tab also helps avoid typos!

$ mkdir thesis # will create a directory with the name thesis
## it is recommended not to have spaces in folder names in unix; if there are spaces in the name, you can quote the name
$ cd "my spaced name"
For example, mkdir "test directory" is possible, but then you always have to use "test directory", e.g. cd "test directory". Using test-directory or test_directory is better.

nano:
ctrl-o # writes the file to disk (save)
ctrl-x # exits nano
To get nano: https://github.com/swcarpentry/windows-installer/releases/tag/v0.3

$ rm # removes files, but on its own will not remove a directory
$ rm thesis/draft.txt
$ rmdir thesis # removes an empty folder
$ mv # moves (or renames) files or directories
$ cp # copies files or folders

The wc command (word count):
$ wc -l *.pdb > lengths.txt # '>' redirects the standard output to the file
$ sort -n lengths.txt # sorts numerically with -n
$ head # shows the first few lines of a file or input

Two or more commands can be chained ("piped") together using the "|" pipe. This is very powerful and avoids creating temporary files.
$ wc -l *.pdb | sort -n | head -1

The * and the ? are special characters (wildcards). For example, ls *.pdb lists all files which end with .pdb. * stands for any number of characters (including none), ? for exactly one character.
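The difference between * and ? can also be tried out in Python (covered later in these notes) without touching any real files. This is only a small sketch: it uses the standard-library fnmatch module, which follows shell-style wildcard rules, and the file names below are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, invented for this illustration.
filenames = ["cubane.pdb", "ethane.pdb", "p1.pdb", "p10.pdb", "notes.txt"]

# '*' matches any run of characters (including none).
all_pdb = [name for name in filenames if fnmatchcase(name, "*.pdb")]

# '?' matches exactly one character, so 'p?.pdb' matches p1.pdb but not p10.pdb.
one_char = [name for name in filenames if fnmatchcase(name, "p?.pdb")]

print("*.pdb  ->", all_pdb)
print("p?.pdb ->", one_char)
```

In the shell itself, the equivalent commands would be ls *.pdb and ls p?.pdb.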
$ wc -l *.pdb | sort -n | head -3 # line counts for all pdb files, sorted numerically, showing only the first 3

To get the second column of a file you can use the cut command:
$ cut -d "," -f 2 animals.txt | sort -r # grabs the second column in the file and sorts it in reverse order
awk is another unix tool that is well suited to operating on column-oriented (csv-type) data.

For loop in unix:
$ for filename in file.txt file2.txt
> do
>     head -n 3 $filename
> done

$ for filename in *.dat
> do
>     cp $filename original-$filename
> done

For more information about UNIX commands check out: www.explainshell.com

################## Python Day 1 Notes ##############
You can check your python version in the command shell:
python --version
To run the notebook:
jupyter notebook

Load numpy by typing:
import numpy
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
print(data)

weight_kg = 55
print(weight_kg)
print('weight in kilograms is now', weight_kg)
weight_kg * 2
weight_lb = weight_kg * 2.2
print("weight in lb", weight_lb)
print("weight in lb:", weight_kg * 2.2)
whos # notebook command: lists the variables defined so far

Challenge: What does the following program print out?
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)
answer: Hopper Grace

Make sure you have your data set up. Run:
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
print(data)
print(type(data))
print(data.shape)
print('first value in data', data[0,0])
print('4th value in data', data[0,3])
Python starts counting rows and columns at 0 (many other programming languages start at 1).
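Zero-based counting is easy to check on a tiny table. The sketch below uses a made-up nested list (not the inflammation data) so it runs without any files. Plain lists are indexed as table[row][column], while numpy arrays also accept the data[row, column] form used in these notes.

```python
# Made-up 3x4 table for illustration; each inner list is one row.
table = [
    [0, 1, 2, 3],
    [10, 11, 12, 13],
    [20, 21, 22, 23],
]

first = table[0][0]       # row 0, column 0: the very first value
fourth = table[0][3]      # row 0, column 3: the 4th value in the first row
third_row = table[2]      # index 2 is the third (and last) row
first_two = table[0][0:2] # slices stop before the end index: columns 0 and 1

print("first value:", first)
print("4th value:", fourth)
print("third row:", third_row)
print("first two values:", first_two)
```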
Python addresses arrays as [row, column]
# slice an array
data[0:4, 0:10] # rows 0-3, columns 0-9
data[:3, 36:] # rows 0-2, columns 36 to the end

Indexing exercise:
element = 'oxygen'
print('first three characters:', element[0:3])
print(element[:])
print(element[-1])
print(element[-2])
print(element[1:-1])
print(element[1:5])

# functions (methods) are called with parentheses, e.g. data.mean()
# attributes are accessed without parentheses, e.g. data.shape
doubledata = data * 2.0
print(doubledata)
tripledata = doubledata + data
print(tripledata)
print(data.mean())
print('maximum inflammation:', data.max())
print('minimum inflammation:', data.min())
print('standard deviation', data.std())

# visualizing data with matplotlib
# https://en.wikipedia.org/wiki/Matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
# make plot
plt.imshow(data)
# save plot
image = plt.imshow(data)
plt.savefig('heatmap.jpg')

# mean along axis 0 (down the rows): one average per column (day)
avg_inflm = data.mean(axis=0)
print(avg_inflm.shape)
# means of each column (along axis 0)
print(data.mean(axis=0))
# means of each row (along axis 1): one average per patient
print(data.mean(axis=1))
# see contents of file (as in shell)
!cat inflammation-01.csv
# plot inflammation levels (by day)
day_avg_plot = plt.plot(avg_inflm)
# create plot showing standard deviation (numpy.std) of the inflammation data for each day across all patients
std_plot = plt.plot(data.std(axis=0))
# remember this is an object-oriented package

# looping:
word = 'lead'
for char in word:
    print(char)

# loop challenge
for num in range(1,4):
    print(num)

# loop challenge 2: write a loop that takes a string and produces a new string with the characters in reverse order, so 'Newton' becomes 'notweN'
newstring = ''
oldstring = 'Newton'
length_old = len(oldstring)
for char_index in range(length_old):
    newstring = newstring + oldstring[length_old - char_index - 1]
print(newstring)

# Strings are immutable (individual elements are unchangeable)
name = 'Bell'
name[0] = 'b' # this raises a TypeError

# Unlike strings, lists are mutable: elements can be changed in place, which can cause surprises when two names refer to the same list (make a copy of your data!)
names = ["Newton", "Darwing", "Turing"] # typo in Darwin's name
print('names is originally:', names)
names[1] = 'Darwin' # correct the name
print('final value of names:', names)

odds = [1,3,5,7]
print(odds)
odds.append(11)
print(odds)
odds.reverse()
print(odds)
odds.remove(11)
print(odds)

# primes and odds refer to the same list, so both change
odds = [1,3,5,7]
primes = odds
primes += [2]
print('primes:', primes)
print('odds:', odds)

# list(odds) makes a copy, so odds stays unchanged
odds = [1,3,5,7]
primes = list(odds)
primes += [2]
print('primes:', primes)
print('odds:', odds)

# List challenge: Use a for-loop to convert the string "hello" into a list of letters:
# Hint: you can create an empty list like this:
# my_list = []
my_list = []
word = "hello"
for char in word:
    my_list.append(char)
print(my_list)

# non-continuous slicing:
# indexing: start:stop:step
primes = [2,3,5,7,11,13,17,19,23,29,31,37]
subset = primes[0:12:3]
print("subset", subset)

# slice challenge: using
# beatles = "In an octopus's garden in the shade"
# slice to "I notpssgre ntesae"
beatles = "In an octopus's garden in the shade"
beatles[0:len(beatles):2]
# or
beatles[::2]

_____________________ Working with files _____________________
import glob
import numpy
# glob.glob returns a list of files (better than os.listdir, uses wildcards)
print(glob.glob('inflammation*.csv'))

counter = 0
for filename in glob.glob('*.csv'):
    counter = counter + 1 # aka counter += 1
print("number of files:", counter)

counter = 0
for filename in glob.glob('infla*'):
    counter = counter + 1 # aka counter += 1
print("number of files:", counter)

# help documentation
help(numpy)

# making choices using conditional expressions
num = 37
if num > 100:
    print("greater")
else:
    print("not greater")
print('done')

# conditionals do not have to have "else" statements
num = 53
print('before conditional...')
if num > 100:
    print(num, 'is greater than 100')
print('...after conditional')

num = -3
if num > 0:
    print(num, "is positive")
elif num == 0:
    print(num, "is zero")
else:
    print(num, "is negative")

if (1 > 0) and (2 > 0):
    print('both parts are true')
else:
    print('at least one test is not true')

# Choices challenge
# Which of the following would be printed if you were to run this code? Why did you pick this answer?
# A
# B
# C
# B and C
x = 4
y = 5
if x > y:
    print('A')
elif x == y:
    print('B')
elif x