Theory: Thursday 14:00-15:30, Room 3-114 in South building
Classes begin: Thursday, September 15, 2022
A good introduction: Wooley JC, Godzik A, Friedberg I (2010) A Primer on Metagenomics. PLoS Comput Biol 6(2): e1000667.
For metagenomics: The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet (The National Academies Press, 2007, FREE)
For general bioinformatics: Lengauer: Bioinformatics – from genomes to therapies.. Vol.1-3 (Wiley, 2007)
Isaev A. Introduction to mathematical methods in bioinformatics (Springer, 2006)
Handouts, course materials:
Public links and references: course materials:
For lecture 8: Chapter 5 of the Lengauer-book: David C. Kulp: Finding Protein-coding Genes
For lecture 9: Rabiner’s tutorial: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf , Chapter 3 of Isaev’s book.
In-house electronic handouts for our students (this is a password protected area, passwords are circulated in the class).
1, What is covered in this course, and why? Brief introduction. Genome sequencing techniques: Sanger, 454/Roche, Illumina/Solexa The FASTA and FASTQ formats. Introduction to metagenomics. Microbial diversity.
Part 1: Genome sequencing
2, Levels of sequence assembly: chromosomes, Scaffolds & contigs, SRA or trace. Re-sequencing vs. de novo sequencing. Assembly of short reads.
Strategies: Hashing. Hashing in parallel (example with the element distinctness problem).
3, The Burrows-Wheeler transform. Easy substring-search, computing its inverse. Sequence assembly with graphs. The Hamiltonian cycle reduction. The Eulerian path reduction. De Bruijn graphs
Part 2: Sequence analysis
4., Strings, basic problems related to string search: pattern matching, alignments.Brief introduction on data structures.
Two approaches for pattern matching: preprocessing the text or the searched pattern.Distance functions on strings: Hamming-distance, Levenshtein-distance, Levenshtein-distance with different costs.
Dynamic programming: main ideas and applicability to sequence comparisons. “Scoring functions” and their motivations. Brief introduction on scoring matrices (PAM, BLOSUM). Databases of amino acid sequences: UniProt = (SwissProt U TrEMBL); SwissProt and TrEMBL difference; RefSeq (also nucleotide sequences); corresponding websites; download hints
5., Pattern matching using pattern preprocessing: the Boyer-Moore algorithm. Pattern matching using text preprocessing: Suffix trees and suffix arrays. Quickly solvable tasks using Suffix trees.
Efficient construction of suffix arrays: radix quicksort, (sketch: Sadakane-algorithm, linear method). Live demonstration of comparing the different search methods.
6., Sequence alignment algorithms
The concept of local, global alignments. Gap penalty types. Alignment with different overlap requirements / different gap penalty types.
7., Multiple alignments, heuristic sequence alignment algorithms.
Basic idea: BLAST (in detail), improvement possibilities: PSI-BLAST (sketch). Phylogeny, evolution trees. The NCBI taxonomy tree. Different methods for constructing phylogenetic trees. (IG)
8, From genes to proteins: finding protein coding genes (GV) Transcription and translation. CDS, ORF, gene finding.
9, AI and Bioinformatics. Markov models. Hidden Markov models. The forward algorithm. Viterbi algorithm. The Viterbi learning algorithm.
Part 3: Structure prediction
10, Molecular structure primer; Molecular structure prediction
11, Drug-protein and protein-protein docking (GV)
Part 4: Interaction networks
12, Molecular networks: metabolic and physical interaction networks
Source of information: online databases (DIP, MINT, Intact, HPRD)
13, Molecular networks: gene regulatory and cell signaling networks
14, Protein function prediction and similarity
Part 5: Brain informatics
MRI data processing, brain graphs, brain graph analysis