Bioinformatics course

A good introduction: Wooley JC, Godzik A, Friedberg I (2010) A Primer on Metagenomics. PLoS Comput Biol 6(2): e1000667.


For metagenomics: The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet (The National Academies Press, 2007, FREE)

For general bioinformatics:  Lengauer: Bioinformatics - from genomes to therapies.. Vol.1-3 (Wiley, 2007)

Isaev A. Introduction to mathematical methods in bioinformatics (Springer, 2006)

Metagenomics webservers:

MG-RAST        WebMGA     METAGENassist
Manuals: METAGENassist    

Handouts, course materials:

Public links and references: course materials:

For lecture 8: Chapter 5 of the Lengauer-book: David C. Kulp: Finding Protein-coding Genes

For lecture 9: Rabiner's tutorial: ,  Chapter 3 of Isaev's book.

In-house electronic handouts for our students  (this is a password protected area, passwords are circulated in the class).

(this is the syllabus of the last year's course, now we will  emphasize next generation sequencing and metagenomics more, and  protein interaction networks less.)

Course in theory

Laboratory practice

1, What is covered in this course, and why?

Brief introduction. Genome sequencing techniques: Sanger, 454/Roche, Illumina/Solexa  The FASTA and FASTQ formats.  Introduction to metagenomics.  Microbial diversity.

1, Linux basics, commands, piping, archiving, updating.

Part 1: Genome sequencing

2,  Levels of sequence assembly: chromosomes, Scaffolds & contigs, SRA or trace.  Re-sequencing vs. de novo sequencing. Assembly of short reads.

Strategies: Hashing. Hashing in parallel (example with the element distinctness problem).

2, Shell scripts, compilation, ssh tunneling,

 3, The Burrows-Wheeler transform. Easy substring-search, computing its inverse. Sequence assembly with graphs. The Hamiltonian cycle reduction. The Eulerian path reduction. De Bruijn graphs

3, Python script language
The AMPHORA2 sequence phylotyping software

 2., Genome sequencing: Related problems in bioinformatics: Physical mapping. Physical mapping with backtrack algorithm.
Physical mapping using hybridization (algorithm sketch: PQ-trees)


4, Python continued

3., Shotgun sequencing. Motivation, calculation of sequence overlaps (later: with suffix trees). Greedy algorithm. 

5, Sequencing, sequence alignments

Part 2: Sequence analysis

4., Strings, basic problems related to string search: pattern matching, alignments.Brief introduction on data structures.

Two approaches for pattern matching: preprocessing the text or the searched pattern.Distance functions on strings: Hamming-distance, Levenshtein-distance, Levenshtein-distance with different costs.

Dynamic programming: main ideas and applicability to sequence comparisons. "Scoring functions" and their motivations. Brief introduction on scoring matrices (PAM, BLOSUM). Databases of amino acid sequences: UniProt = (SwissProt U TrEMBL); SwissProt and TrEMBL difference; RefSeq (also nucleotide sequences); corresponding websites; download hints 

6, Sequencing, sequence alignment

5., Pattern matching using pattern preprocessing: the Boyer-Moore algorithm. Pattern matching using text preprocessing: Suffix trees and suffix arrays. Quickly solvable tasks using Suffix trees.
Efficient construction of suffix arrays: radix quicksort, (sketch: Sadakane-algorithm, linear method). Live demonstration of comparing the different search methods.

7, Sequencing, sequence alignment

6., Sequence alignment algorithms

The concept of local, global, "glocal" alignments. Gap penalty types. Alignment with different overlap requirements / different gap penalty types. Motivation behind scoring functions: statistical considerations. Sequence alignment: statistical parameters and their interpretations. Live demonstration of the different alignment algorithms. 


7., Multiple alignments, heuristic sequence alignment algorithms.

Basic idea: BLAST (in detail), improvement possibilities: PSI-BLAST (sketch). Phylogeny, evolution trees. The NCBI taxonomy tree. Different methods for constructing phylogenetic trees. 



8,g From genes to proteins: finding protein coding genes  Transcription and translation. CDS, ORF, gene finding.


8,  System administration for beginners

9, AI and Bioinformatics. Markov models. Hidden Markov models. The forward algorithm. Viterbi algorithm. The Viterbi learning algorithm. 


Part 3: Structure prediction              

10, Molecular structure primer; Molecular structure prediction



11, Drug-protein and protein-protein docking 


Part 4: Interaction networks

12, Molecular networks: metabolic and physical interaction networks

Source of information: online databases (DIP, MINT, Intact, HPRD) 

12. PPI download and update

13, Molecular networks: gene regulatory and cell signaling networks

13, PPI generation


14, Protein function prediction and similarity


14, PPI analysis