Bioinformatics Course

Course in theory	Laboratory practice
1, What is covered in this course, and why? Brief introduction. Genome sequencing techniques: Sanger, 454/Roche, Illumina/Solexa. The FASTA and FASTQ formats. Introduction to metagenomics. Microbial diversity.	1, Linux basics, commands, piping, archiving, updating.
Part 1: Genome sequencing 2, Levels of sequence assembly: chromosomes, scaffolds & contigs, SRA or trace. Re-sequencing vs. de novo sequencing. Assembly of short reads. Strategies: hashing. Hashing in parallel (example with the element distinctness problem).	2, Shell scripts, compilation, SSH tunneling
3, The Burrows-Wheeler transform. Easy substring search, computing its inverse. Sequence assembly with graphs. The Hamiltonian cycle reduction. The Eulerian path reduction. De Bruijn graphs.	3, The Python scripting language. The AMPHORA2 sequence phylotyping software
2., Genome sequencing: related problems in bioinformatics. Physical mapping. Physical mapping with a backtracking algorithm. Physical mapping using hybridization (algorithm sketch: PQ-trees).	4, Python continued
3., Shotgun sequencing. Motivation, calculation of sequence overlaps (later: with suffix trees). Greedy algorithm.	5, Sequencing, sequence alignments
Part 2: Sequence analysis 4., Strings, basic problems related to string search: pattern matching, alignments. Brief introduction to data structures. Two approaches to pattern matching: preprocessing the text or the searched pattern. Distance functions on strings: Hamming distance, Levenshtein distance, Levenshtein distance with different costs. Dynamic programming: main ideas and applicability to sequence comparisons. "Scoring functions" and their motivations. Brief introduction to scoring matrices (PAM, BLOSUM). Databases of amino acid sequences: UniProt = (SwissProt ∪ TrEMBL); the difference between SwissProt and TrEMBL; RefSeq (also nucleotide sequences); corresponding websites; download hints.	6, Sequencing, sequence alignment
5., Pattern matching using pattern preprocessing: the Boyer-Moore algorithm. Pattern matching using text preprocessing: suffix trees and suffix arrays. Quickly solvable tasks using suffix trees. Efficient construction of suffix arrays: radix quicksort (sketch: Sadakane's algorithm, linear-time method). Live demonstration comparing the different search methods.	7, Sequencing, sequence alignment
6., Sequence alignment algorithms. The concept of local, global, and "glocal" alignments. Gap penalty types. Alignment with different overlap requirements / different gap penalty types. Motivation behind scoring functions: statistical considerations. Sequence alignment: statistical parameters and their interpretation. Live demonstration of the different alignment algorithms.	TBA
7., Multiple alignments, heuristic sequence alignment algorithms. Basic idea: BLAST (in detail); possible improvements: PSI-BLAST (sketch). Phylogeny, evolutionary trees. The NCBI taxonomy tree. Different methods for constructing phylogenetic trees.	TBA
8, From genes to proteins: finding protein-coding genes. Transcription and translation. CDS, ORF, gene finding.	8, System administration for beginners
9, AI and Bioinformatics. Markov models. Hidden Markov models. The forward algorithm. Viterbi algorithm. The Viterbi learning algorithm.
Part 3: Structure prediction 10, Molecular structure primer; Molecular structure prediction
11, Drug-protein and protein-protein docking
Part 4: Interaction networks 12, Molecular networks: metabolic and physical interaction networks. Sources of information: online databases (DIP, MINT, IntAct, HPRD).	12, PPI download and update
13, Molecular networks: gene regulatory and cell signaling networks	13, PPI generation
14, Protein function prediction and similarity	14, PPI analysis
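
As an illustration of the string distance functions and the dynamic programming idea listed under Part 2, the following is a minimal Python sketch, not course material. The function names `hamming` and `levenshtein` and the parameters `sub_cost` and `indel_cost` are assumptions made for this example; the cost parameters stand in for the "Levenshtein distance with different costs" variant.

```python
def hamming(a, b):
    """Number of mismatching positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b, sub_cost=1, indel_cost=1):
    """Edit distance between a and b, computed row by row with dynamic programming.

    sub_cost and indel_cost are illustrative parameters (assumed names) for the
    substitution cost and the insertion/deletion cost.
    """
    # prev[j] is the distance between the empty prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # curr[0] is the distance between a[:i] and the empty string.
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,                         # delete ca
                curr[j - 1] + indel_cost,                     # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept in memory, which is enough for the distance itself; recovering an actual alignment would require storing the full table or traceback pointers.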

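The Burrows-Wheeler transform and its inverse, listed under Part 1, can be sketched in a few lines of Python. This is a deliberately naive version that sorts all rotations, intended only to show the definition; practical tools build the transform from a suffix array instead. The function names and the `$` sentinel are assumptions made for this example.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (naive, for illustration)."""
    if sentinel in text:
        raise ValueError("the sentinel must not occur in the text")
    s = text + sentinel
    # Sort all cyclic rotations of s and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The rotation that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

The sentinel must compare smaller than every character of the text, which holds for `$` with ASCII letters.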