Bioinformatics course

Theory:   Tuesday, 12:00, 3-114, South building

Practice: http://pitgroup.org/bioinfogyak/

A good introduction: Wooley JC, Godzik A, Friedberg I (2010) A Primer on Metagenomics. PLoS Comput Biol 6(2): e1000667.
Textbooks:

For metagenomics: The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet (The National Academies Press, 2007, FREE)

For general bioinformatics: Lengauer: Bioinformatics – from genomes to therapies.. Vol.1-3 (Wiley, 2007)

Isaev A. Introduction to mathematical methods in bioinformatics (Springer, 2006)

Metagenomics webservers:

MG-RAST WebMGA METAGENassist
Manuals: METAGENassist

Handouts, course materials:

Public links and references: course materials:

For lecture 8: Chapter 5 of the Lengauer-book: David C. Kulp: Finding Protein-coding Genes

For lecture 9: Rabiner’s tutorial: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf , Chapter 3 of Isaev’s book.

In-house electronic handouts for our students (this is a password protected area, passwords are circulated in the class).

Syllabus:
1, What is covered in this course, and why? Brief introduction. Genome sequencing techniques: Sanger, 454/Roche, Illumina/Solexa The FASTA and FASTQ formats. Introduction to metagenomics. Microbial diversity.

Part 1: Genome sequencing

2, Levels of sequence assembly: chromosomes, Scaffolds & contigs, SRA or trace. Re-sequencing vs. de novo sequencing. Assembly of short reads.

Strategies: Hashing. Hashing in parallel (example with the element distinctness problem).

3, The Burrows-Wheeler transform. Easy substring-search, computing its inverse. Sequence assembly with graphs. The Hamiltonian cycle reduction. The Eulerian path reduction. De Bruijn graphs

Part 2: Sequence analysis

4., Strings, basic problems related to string search: pattern matching, alignments.Brief introduction on data structures.

Two approaches for pattern matching: preprocessing the text or the searched pattern.Distance functions on strings: Hamming-distance, Levenshtein-distance, Levenshtein-distance with different costs.

Dynamic programming: main ideas and applicability to sequence comparisons. “Scoring functions” and their motivations. Brief introduction on scoring matrices (PAM, BLOSUM). Databases of amino acid sequences: UniProt = (SwissProt U TrEMBL); SwissProt and TrEMBL difference; RefSeq (also nucleotide sequences); corresponding websites; download hints

5., Pattern matching using pattern preprocessing: the Boyer-Moore algorithm. Pattern matching using text preprocessing: Suffix trees and suffix arrays. Quickly solvable tasks using Suffix trees.
Efficient construction of suffix arrays: radix quicksort, (sketch: Sadakane-algorithm, linear method). Live demonstration of comparing the different search methods.

6., Sequence alignment algorithms

The concept of local, global alignments. Gap penalty types. Alignment with different overlap requirements / different gap penalty types.

7., Multiple alignments, heuristic sequence alignment algorithms.

Basic idea: BLAST (in detail), improvement possibilities: PSI-BLAST (sketch). Phylogeny, evolution trees. The NCBI taxonomy tree. Different methods for constructing phylogenetic trees. (IG)

8, From genes to proteins: finding protein coding genes (GV) Transcription and translation. CDS, ORF, gene finding.

9, AI and Bioinformatics. Markov models. Hidden Markov models. The forward algorithm. Viterbi algorithm. The Viterbi learning algorithm.

Part 3: Structure prediction

 

10, Molecular structure primer; Molecular structure prediction

11, Drug-protein and protein-protein docking (GV)

Part 4: Interaction networks

12, Molecular networks: metabolic and physical interaction networks

Source of information: online databases (DIP, MINT, Intact, HPRD)

13, Molecular networks: gene regulatory and cell signaling networks

14, Protein function prediction and similarity

Part 5: Brain informatics

MRI data processing, brain graphs, brain graph analysis