The goal of this course is to provide advanced training in computer usage (mainly the command-line interface, CLI) for large-scale data analysis. By the end of the course, students from biologically oriented backgrounds should be able to use the CLI to view, edit, manipulate and summarize large data files, and to extract biological information and insight from the high-throughput analyses that generated those files.
- The importance of the command-line interface (CLI) in large-scale research.
- Introduction to Unix-like systems: basic architecture, directory structure, shell.
- Remote system access and usage (graphical and command-line interfaces).
- User/group and permission model: who can access what, and where?
- File types and manipulation (copy, move, rename, append, data streams, directories, etc.).
- Main CLI bioinformatics tools and bioinformatics file formats.
- Basic programming (bash, R) for automation of CLI tasks, file manipulation and analyses.
1. Introduction to computing, computers and the Unix family of operating systems. The graphical (GUI) and command-line (CLI) user interfaces. Why use the CLI? Accessing the shell (bash), locally or remotely (ssh), and bash basics.
2. Finding and executing programs, moving around the directory tree (cd). Navigating and understanding the system (memory, disk space, etc.). Getting help with man, info, apropos, and internet search engines.
3. System structure. File types, directories. User and group permission model.
4. Standard streams and redirection. Piping.
5. Finding and manipulating files and directories (create, delete, move, copy, rename, append, concatenate, etc.). Compressing and decompressing data (tar, gz, zip, etc.). Basic regular expressions.
6. Describing and summarizing file content (wc, file). Changing file access (owner, group, permissions). Getting data into the system (wget, scp, ftp).
7. Exploring (head, tail, more, less, cat, zcat, uniq, etc.), subsetting (grep, cut), editing (nano, sed, paste), comparing (diff, comm), and sorting (sort) file content.
8. Compiling third-party programs.
9. Automating the CLI with basic bash scripting.
10. Exploring data using basic R.
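As an illustration of how topics 4 to 7 fit together, the following is a minimal sketch, not course material. It assumes a hypothetical tab-separated table (`features.tsv`, with made-up gene names, chromosomes and feature types) and combines redirection, piping, subsetting, sorting and summarizing in one short bash script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy table (tab-separated): gene, chromosome, feature type.
# File name and contents are invented for illustration only.
workdir=$(mktemp -d)
printf 'geneA\tchr1\texon\n'   >  "$workdir/features.tsv"   # ">" creates/overwrites
printf 'geneB\tchr2\texon\n'   >> "$workdir/features.tsv"   # ">>" appends
printf 'geneC\tchr1\tintron\n' >> "$workdir/features.tsv"
printf 'geneD\tchr1\texon\n'   >> "$workdir/features.tsv"

# Summarize: count the rows; the arithmetic expansion strips any padding
# that some wc implementations add around the number.
total=$(( $(wc -l < "$workdir/features.tsv") ))

# Subset the exon rows (grep), keep the chromosome column (cut),
# then sort and count occurrences per chromosome (sort | uniq -c).
# sed removes the leading spaces that uniq -c puts before each count.
exons_per_chr=$(grep -w 'exon' "$workdir/features.tsv" \
  | cut -f2 \
  | sort \
  | uniq -c \
  | sed 's/^ *//')

echo "Total rows: $total"
echo "Exon rows per chromosome (count, chromosome):"
echo "$exons_per_chr"

# Clean up the temporary directory.
rm -r "$workdir"
```

Each stage reads the previous stage's standard output, so the same pipeline shape applies to files far too large to open in a graphical editor.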
- "The Linux Command Line, a Complete Introduction", by William E. Shotts Jr. (2013) (freely available at http://linuxcommand.org)
- "Ubuntu Pocket Guide and Reference", by Keir Thomas (2009), chapter 5 (freely available at http://ubuntupocketguide.com)
- "Introduction to the Command Line", by the Free Software Foundation (2013) (freely available at http://en.flossmanuals.net/command-line/)
- "Bash Guide for Beginners", by Machtelt Garrels (2008) (freely available at http://ww.tldp.org/ldf/bash-beginners-guide/bash-begginers-guide.pdf)